Database


  1. Why Choose Databases Over File Storage?
  2. Why Databases Are Essential for Businesses?
  3. What is a Relational Database?
  4. What is a Non-Relational Database?
  5. What is Data Replication?
  6. What is Data Partitioning/Sharding?
  7. Clustered and Non-Clustered Indexes
  8. Comparison of Clustered vs. Non-Clustered Index

Why Choose Databases Over File Storage?

It is possible to build a software application without a database by storing its data in plain files.
However, this approach has significant limitations:

These limitations make databases a better choice for applications that require reliable, scalable, and efficient data management.

There are two basic types of databases: relational and non-relational.

Why Databases Are Essential for Businesses?
  1. Handling Large Data: Databases manage large volumes of data efficiently, which is hard to do with flat files or spreadsheets.
  2. Accurate Data Retrieval: Databases use constraints so that the data you read back is accurate and consistent.
  3. Easy Updates: Updating data is straightforward with Data Manipulation Language (DML) statements such as INSERT, UPDATE, and DELETE.
  4. Security: Databases protect data by allowing access only to authorized users.
  5. Data Integrity: Databases reject writes that would violate declared constraints, such as primary keys, foreign keys, and NOT NULL rules.
  6. Availability: Databases can be replicated across servers, so data stays available even if one server fails.
  7. Scalability: Databases can be partitioned so that large amounts of data are spread across multiple nodes.

What is a Relational Database?

A relational database is a type of database that organizes data into tables (relations) which can be linked—or related—based on common data attributes. Each table consists of rows (records) and columns (attributes), and relationships between tables are established using keys.
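To make the idea of tables linked by keys concrete, here is a minimal sketch using Python's built-in sqlite3 module. The customers and orders tables and their columns are invented for illustration; they are not taken from any particular system.

```python
import sqlite3

# In-memory database, so the example leaves nothing behind on disk.
conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        item        TEXT NOT NULL
    );
""")

conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Alice')")
conn.execute("INSERT INTO orders (id, customer_id, item) VALUES (10, 1, 'Keyboard')")

# The foreign key orders.customer_id relates each order to one customer.
rows = conn.execute("""
    SELECT customers.name, orders.item
    FROM orders
    JOIN customers ON orders.customer_id = customers.id
""").fetchall()


def insert_orphan_order():
    """Try to insert an order for a customer that does not exist."""
    try:
        conn.execute(
            "INSERT INTO orders (id, customer_id, item) VALUES (11, 99, 'Mouse')"
        )
    except sqlite3.IntegrityError:
        return False  # rejected by the foreign key constraint
    return True
```

Here rows holds the joined customer name and item, and insert_orphan_order() returns False because the foreign key constraint rejects an order that points at a missing customer.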


Why Relational Databases?
ACID Properties

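Relational databases ensure data reliability through the ACID properties:
  1. Atomicity: A transaction either completes in full or has no effect at all.
  2. Consistency: A transaction moves the database from one valid state to another, respecting all declared constraints.
  3. Isolation: Concurrent transactions do not see each other's intermediate results.
  4. Durability: Once a transaction is committed, its changes survive crashes and restarts.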
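Atomicity is the easiest of these properties to demonstrate in a few lines. The sketch below uses Python's built-in sqlite3 module; the accounts table, the transfer function, and the non-negative balance rule are assumptions made for this example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst. Return True if committed, False if rolled back."""
    try:
        # Used as a context manager, the connection commits when the block
        # succeeds and rolls back when an exception escapes it.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # If this debit violates the CHECK constraint, the credit above
            # is undone as well: both updates apply, or neither does.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

Calling transfer(conn, 'alice', 'bob', 30) commits both updates. Calling it with an amount larger than Alice's balance fails on the debit, and the rollback also removes the credit already applied to Bob.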


Examples
Drawbacks

What is a Non-Relational Database?

A non-relational database, often referred to as NoSQL (Not Only SQL), is a type of database that stores and manages data in formats other than the traditional tabular structure used in relational databases. Non-relational databases are designed to handle large volumes of unstructured, semi-structured, or rapidly changing data, providing more flexibility and scalability compared to traditional relational databases.
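As a rough illustration of the flexibility described above, the following toy in-memory document store keeps records of different shapes side by side. DocumentCollection and its methods are hypothetical names invented for this sketch; they do not correspond to the client API of any real NoSQL product.

```python
import json


class DocumentCollection:
    """Toy in-memory document store (hypothetical, for illustration only)."""

    def __init__(self):
        self._docs = {}

    def put(self, doc_id, document):
        # Documents are stored as JSON text: no table schema is declared,
        # so each document may carry its own set of fields.
        self._docs[doc_id] = json.dumps(document)

    def get(self, doc_id):
        raw = self._docs.get(doc_id)
        return None if raw is None else json.loads(raw)

    def find(self, field, value):
        """Return every document whose top-level field equals value."""
        docs = (json.loads(raw) for raw in self._docs.values())
        return [doc for doc in docs if doc.get(field) == value]


users = DocumentCollection()
# Two documents in the same collection with different structures.
users.put("u1", {"name": "Alice", "email": "alice@example.com"})
users.put("u2", {"name": "Bob", "tags": ["admin", "ops"], "address": {"city": "Pune"}})
```

The second document adds a list and a nested object without any schema change, which a fixed-column table would not allow without altering the table.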


Key Characteristics of Non-Relational Databases:
Examples of Non-Relational Databases:

Advantages of Non-Relational Databases:
Drawbacks of Non-Relational Databases:

What is Data Replication?
Replication:
Synchronous vs Asynchronous Replication
Data Replication Models
Replication models describe how data is copied and maintained across the nodes of a system. Common models include:
Single Leader (Primary-Secondary) Replication
  1. Primary-Secondary Replication
  2. Replication Methods
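The difference between synchronous and asynchronous replication under a single leader can be sketched with a few in-memory objects. This is a simplified model under stated assumptions, not how any specific database implements replication: Replica, Leader, write, and flush are names invented here, and there is no network, failure handling, or replication log.

```python
from collections import deque


class Replica:
    """A node holding a copy of the data as a plain dictionary."""

    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value


class Leader(Replica):
    """The only node that accepts writes; it forwards them to followers."""

    def __init__(self, sync_followers, async_followers):
        super().__init__()
        self.sync_followers = sync_followers
        self.async_followers = async_followers
        self.pending = deque()  # writes not yet shipped to async followers

    def write(self, key, value):
        self.apply(key, value)
        # Synchronous: the write is applied on these followers before
        # write() returns, so they are never behind the leader.
        for follower in self.sync_followers:
            follower.apply(key, value)
        # Asynchronous: the write is only queued; followers catch up later.
        self.pending.append((key, value))

    def flush(self):
        """Ship queued writes to the asynchronous followers, in order."""
        while self.pending:
            key, value = self.pending.popleft()
            for follower in self.async_followers:
                follower.apply(key, value)


sync_follower, async_follower = Replica(), Replica()
leader = Leader([sync_follower], [async_follower])
leader.write("x", 1)
lagging_copy = dict(async_follower.data)  # snapshot taken before the flush
leader.flush()
```

Right after the write, the synchronous follower already has the value while the asynchronous one is still empty. That gap is the replication lag that asynchronous setups accept in exchange for faster writes.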

Multi-Leader Replication
Leaderless (Peer-to-Peer) Replication

What is Data Partitioning/Sharding?
What is Database Sharding?
Vertical Sharding
Horizontal Sharding
Key-Range Based Sharding
Hash-Based Sharding
Consistent Hashing
Rebalance the Partitions
Partitioning and Secondary Indexes
Request Routing
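Routing a request comes down to mapping a key to the shard that owns it. The sketch below shows one possible way to do that for the key-range and hash-based strategies named above. The function names, the boundary list, and the choice of SHA-256 are assumptions for illustration; real systems differ in how they pick hash functions and boundaries.

```python
import bisect
import hashlib


def range_shard(key, upper_bounds):
    """Key-range sharding.

    upper_bounds is a sorted list; upper_bounds[i] is the exclusive upper
    bound of shard i, and the last shard takes every remaining key.
    """
    return bisect.bisect_right(upper_bounds, key)


def hash_shard(key, num_shards):
    """Hash-based sharding.

    A stable hash is used instead of Python's built-in hash(), which is
    randomized per process for strings and would route the same key to
    different shards after a restart.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Three range shards: keys below "h", keys from "h" up to "q", and the rest.
bounds = ["h", "q"]
alice_shard = range_shard("alice", bounds)
mallory_shard = range_shard("mallory", bounds)
zed_shard = range_shard("zed", bounds)
```

Range sharding keeps neighbouring keys together, which suits range scans but can create hot spots. Hashing spreads keys more evenly at the cost of scattering adjacent keys. Note that with the simple modulo scheme shown here, changing num_shards remaps most keys, which is the problem consistent hashing is designed to reduce.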

Clustered and Non-Clustered Indexes

Indexes in relational databases improve the speed of data retrieval. The two most common types of indexes are Clustered Indexes and Non-Clustered Indexes. Let's explore the differences between them with examples.
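One way to see an index at work is to ask the database how it plans to run a query before and after the index exists. The sketch below does this with Python's built-in sqlite3 module; the users table and the idx_users_email index are made up for the example, and the exact wording of the plan text varies between SQLite versions. In SQLite, a table with an INTEGER PRIMARY KEY is stored in key order, which is similar in spirit to a clustered index, while CREATE INDEX builds a separate structure, similar to a non-clustered index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL, name TEXT)"
)
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)


def query_plan(conn, sql, params=()):
    """Return SQLite's description of how it would execute the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


lookup = "SELECT name FROM users WHERE email = ?"
params = ("user42@example.com",)

# Without an index on email, SQLite has to scan the whole table.
plan_before = query_plan(conn, lookup, params)

# A secondary index on email: a separate structure pointing back to rows.
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_after = query_plan(conn, lookup, params)

result = conn.execute(lookup, params).fetchall()
```

Printing plan_before and plan_after shows the planner switching from a full table scan to a search that uses idx_users_email.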

1. Clustered Index in Relational Database
2. Clustered Index in Non-Relational Database
3. Non-Clustered Index in Relational Database
4. Non-Clustered Index in Non-Relational Database

Comparison of Clustered vs. Non-Clustered Index
| Feature | Clustered Index | Non-Clustered Index |
| --- | --- | --- |
| Data Storage | Physically reorders the table's rows to match the index. | Stores a separate index structure with a pointer to the actual data rows. |
| Number of Indexes per Table | Only one per table (since the data can only be ordered one way). | Multiple non-clustered indexes can exist on a table. |
| Performance | Faster for range queries (e.g., searching for a range of IDs). | Useful for quick lookups based on specific columns. |
| Primary Usage | Often used for primary keys. | Used for frequently queried columns that aren't the primary key. |