Cassandra: The Definitive Guide: Distributed Data at Web Scale [3 ed.] 1098115163, 9781098115166

Imagine what you could do if scalability wasn't a problem. With this hands-on guide, you'll learn how the Cass

English Pages 428 [697] Year 2020

Table of contents:
Foreword
Preface
Why Apache Cassandra?
Is This Book for You?
What’s in This Book?
New for the Third Edition
Conventions Used in This Book
Using Code Examples
O’Reilly Interactive Katacoda Scenarios
O’Reilly Online Learning
How to Contact Us
Acknowledgments
1. Beyond Relational Databases
What’s Wrong with Relational Databases?
A Quick Review of Relational Databases
Transactions, ACID-ity, and Two-Phase Commit
Schema
Sharding and Shared-Nothing Architecture
Web Scale
The Rise of NoSQL
Summary
2. Introducing Cassandra
The Cassandra Elevator Pitch
Cassandra in 50 Words or Less
Distributed and Decentralized
Elastic Scalability
High Availability and Fault Tolerance
Tuneable Consistency
Brewer’s CAP Theorem
Row-Oriented
High Performance
Where Did Cassandra Come From?
Is Cassandra a Good Fit for My Project?
Large Deployments
Lots of Writes, Statistics, and Analysis
Geographical Distribution
Hybrid Cloud and Multicloud Deployment
Getting Involved
Summary
3. Installing Cassandra
Installing the Apache Distribution
Extracting the Download
What’s in There?
Building from Source
Additional Build Targets
Running Cassandra
Setting the Environment
Starting the Server
Stopping Cassandra
Other Cassandra Distributions
Running the CQL Shell
Basic cqlsh Commands
cqlsh Help
Describing the Environment in cqlsh
Creating a Keyspace and Table in cqlsh
Writing and Reading Data in cqlsh
Running Cassandra in Docker
Summary
4. The Cassandra Query Language
The Relational Data Model
Cassandra’s Data Model
Clusters
Keyspaces
Tables
Columns
CQL Types
Numeric Data Types
Textual Data Types
Time and Identity Data Types
Other Simple Data Types
Collections
Tuples
User-Defined Types
Summary
5. Data Modeling
Conceptual Data Modeling
RDBMS Design
Design Differences Between RDBMS and Cassandra
Defining Application Queries
Logical Data Modeling
Hotel Logical Data Model
Reservation Logical Data Model
Physical Data Modeling
Hotel Physical Data Model
Reservation Physical Data Model
Evaluating and Refining
Calculating Partition Size
Calculating Size on Disk
Breaking Up Large Partitions
Defining Database Schema
Cassandra Data Modeling Tools
Summary
6. The Cassandra Architecture
Data Centers and Racks
Gossip and Failure Detection
Snitches
Rings and Tokens
Virtual Nodes
Partitioners
Replication Strategies
Consistency Levels
Queries and Coordinator Nodes
Hinted Handoff
Anti-Entropy, Repair, and Merkle Trees
Lightweight Transactions and Paxos
Memtables, SSTables, and Commit Logs
Bloom Filters
Caching
Compaction
Deletion and Tombstones
Managers and Services
Cassandra Daemon
Storage Engine
Storage Service
Storage Proxy
Messaging Service
Stream Manager
CQL Native Transport Server
System Keyspaces
Summary
7. Designing Applications with Cassandra
Hotel Application Design
Cassandra and Microservice Architecture
Microservice Architecture for a Hotel Application
Identifying Bounded Contexts
Identifying Services
Designing Microservice Persistence
Extending Designs
Secondary Indexes
Materialized Views
Reservation Service: A Sample Microservice
Design Choices for a Java Microservice
Deployment and Integration Considerations
Services, Keyspaces, and Clusters
Data Centers and Load Balancing
Interactions Between Microservices
Summary
8. Application Development with Drivers
DataStax Java Driver
Development Environment Configuration
Connecting to a Cluster
Statements
Simple Statements
Prepared Statements
Query Builder
Object Mapper
Asynchronous Execution
Driver Configuration
Metadata
Debugging and Monitoring
Other Cassandra Drivers
Summary
9. Writing and Reading Data
Writing
Write Consistency Levels
The Cassandra Write Path
Writing Files to Disk
Lightweight Transactions
Batches
Reading
Read Consistency Levels
The Cassandra Read Path
Read Repair
Range Queries, Ordering and Filtering
Paging
Deleting
Summary
10. Configuring and Deploying Cassandra
Cassandra Cluster Manager
Creating a Cluster
Adding Nodes to a Cluster
Dynamic Ring Participation
Node Configuration
Seed Nodes
Snitches
Partitioners
Tokens and Virtual Nodes
Network Interfaces
Data Storage
Startup and JVM Settings
Planning a Cluster Deployment
Cluster Topology and Replication Strategies
Sizing Your Cluster
Selecting Instances
Storage
Network
Cloud Deployment
Amazon Web Services
Google Cloud Platform
Microsoft Azure
Summary
11. Monitoring
Monitoring Cassandra with JMX
Cassandra’s MBeans
Database MBeans
Cluster-Related MBeans
Internal MBeans
Monitoring with nodetool
Getting Cluster Information
Getting Statistics
Virtual Tables
System Virtual Schema
System Views
Metrics
Logging
Examining Log Files
Full Query Logging
Summary
12. Maintenance
Health Check
Common Maintenance Tasks
Flush
Cleanup
Repair
Rebuilding Indexes
Moving Tokens
Adding Nodes
Adding Nodes to an Existing Data Center
Adding a Data Center to a Cluster
Handling Node Failure
Repairing Failed Nodes
Replacing Nodes
Removing Nodes
Upgrading Cassandra
Backup and Recovery
Taking a Snapshot
Clearing a Snapshot
Enabling Incremental Backup
Restoring from Snapshot
SSTable Utilities
Maintenance Tools
Netflix Priam
DataStax OpsCenter
Cassandra Sidecars
Cassandra Kubernetes Operators
Summary
13. Performance Tuning
Managing Performance
Setting Performance Goals
Benchmarking and Stress Testing
Monitoring Performance
Analyzing Performance Issues
Tracing
Tuning Methodology
Caching
Key Cache
Row Cache
Chunk Cache
Counter Cache
Saved Cache Settings
Memtables
Commit Logs
SSTables
Hinted Handoff
Compaction
Concurrency and Threading
Networking and Timeouts
JVM Settings
Memory
Garbage Collection
Summary
14. Security
Authentication and Authorization
Password Authenticator
Using CassandraAuthorizer
Role-Based Access Control
Encryption
SSL, TLS, and Certificates
Node-to-Node Encryption
Client-to-Node Encryption
JMX Security
Securing JMX Access
Security MBeans
Audit Logging
Summary
15. Migrating and Integrating
Knowing When to Migrate
Adapting the Data Model
Translating Entities
Translating Relationships
Adapting the Application
Refactoring Data Access
Maintaining Consistency
Migrating Stored Procedures
Planning the Deployment
Migrating Data
Zero-Downtime Migration
Bulk Loading
Common Integrations
Managing Data Flow with Apache Kafka
Searching with Apache Lucene, SOLR, and Elasticsearch
Analyzing Data with Apache Spark
Summary
Index