Understanding Database Choices
Key Factors for Database Choice
- Structure of Data: Whether the data is structured or unstructured.
- Query Patterns: Type of queries performed, such as random reads, bulk reads, or sequential writes.
- Scale Requirements: The scale of data and query loads the system needs to handle.
1. Caching Solutions
- Purpose: Reduce latency and prevent frequent database or remote service calls.
- Common Use Cases:
- Storing database query results.
- Caching responses from remote services with high latency.
- How it Works: Key-Value storage where the key represents the query/request and the value is the response.
- Popular Choices:
- Redis: Battle-tested, widely used, and stable.
- Memcached: Lightweight and fast.
- etcd and Hazelcast: Alternative modern options.
2. File Storage Options
- Purpose: Store large binary files like images, videos, and documents.
- Common Use Cases:
- E-commerce product images/videos.
- Streaming services like Netflix storing videos.
- Solution: Blob Storage (e.g., Amazon S3).
- Key Considerations:
- Not a database; no complex queries on stored files.
- Use with a CDN to distribute content geographically for lower latency.
3. Text Search
- Purpose: Provide efficient text-based search capabilities.
- Common Use Cases:
- E-commerce platforms: Search by product title/description.
- Streaming platforms: Search by movie name, genres, or cast.
- Location services: Search by address or names with fuzzy search.
- Solution:
- ElasticSearch: Distributed search engine with full-text search.
- Apache Solr: Similar search engine with Lucene as the core.
- Fuzzy Search: Handles minor spelling mistakes using edit distance (e.g., "AIRPROT" -> "AIRPORT").
- Key Consideration:
- Not a primary database; no guarantees of durability.
- Primary data should reside in a different reliable data store.
4. Metrics Storage
- Purpose: Store and analyze application metrics like latency, CPU usage, etc.
- Characteristics:
- Append-only sequential writes.
- Bulk reads for specific time ranges (e.g., last few hours).
- Solution: Time Series Databases.
- InfluxDB
- OpenTSDB (Open Time Series Database)
5. Analytics Storage
- Purpose: Perform analytics on large datasets for insights and reporting.
- Common Use Cases:
- E-commerce: Analyze transaction trends, product performance.
- Revenue analysis by geography.
- Solution: Data Warehouses or Hadoop for large-scale analytics.
- Key Consideration:
- Not for real-time transactional systems.
- Primarily used for offline reporting and insights.
1. Structured vs. Unstructured Data
- Structured Data: Relational databases are suitable when data can be modeled in tables with rows and columns (e.g., user profiles).
- Unstructured Data: Non-relational databases are suitable for data that doesn’t fit into a tabular structure (e.g., catalog systems with varying attributes).
2. Relational Database (RDBMS) Scenarios
- ACID Guarantees: Required for applications like:
- Payment systems: Ensure atomicity and consistency in transactions (e.g., money transfer).
- Inventory management: Prevent overselling of products.
- Common Choices: MySQL, PostgreSQL, Oracle, SQL Server.
- No ACID Requirements: Can still use relational or non-relational databases (e.g., storing simple user information).
3. Document Database (NoSQL) Scenarios
- Purpose: Handles wide variety of attributes and queries efficiently.
- Example Use Cases:
- E-commerce catalogs: Products with varying attributes (e.g., size, color, volume).
- Random attribute queries: Optimized querying for JSON-like data structures.
- Common Choices: MongoDB, Couchbase.
4. Columnar Database Scenarios
- Purpose: Manages ever-increasing data with few query types but high volume.
- Example Use Case: Uber driver location pings (large data growth with fixed query patterns).
- Common Choices: Cassandra, HBase.
5. Combination of Databases
- Real-World Scenarios: Often require multiple databases for specific needs:
- Inventory Management: Use RDBMS for active inventory and transition completed orders to Cassandra for permanent storage.
- Analytics and Reporting: Use Document DB (e.g., MongoDB) for queryable data and combine it with RDBMS and Columnar DB for specific use cases.
- Example: E-commerce system:
- Use RDBMS for transactional data.
- Use Cassandra for ever-growing historical orders.
- Use MongoDB for complex querying (e.g., products bought in the last 5 days).
6. Low-Scale Systems
- If the system has a small dataset, few attributes, and limited queries, any database can be used.
- Such cases are rare in system design interviews but could appear in simpler applications.
Conclusion : Key Takeaways
- Understand the structure of data (structured vs. unstructured).
- Identify if ACID guarantees are required for transactions.
- Choose databases based on the type and volume of queries.
- Real-world systems often require a combination of databases to meet functional and non-functional requirements.