Why Choose Databases Over File Storage?

Yes, it's possible to create a software application without using databases by storing data in files.

SS However, this approach has significant limitations:

These limitations make databases a better choice for applications that require reliable, scalable, and efficient data management.

There are two basic types of databases:

Why Databases Are Essential for Businesses?

  1. Handling Large Data: Databases manage large volumes of data effectively, which is challenging with other tools.
  2. Accurate Data Retrieval: Databases use constraints to ensure you get accurate and consistent data whenever needed.
  3. Easy Updates: Updating data is straightforward with databases using Data Manipulation Language (DML).
  4. Security: Databases protect data by allowing access only to authorized users.
  5. Data Integrity: Databases maintain data accuracy and consistency through various constraints.
  6. Availability: Databases can be replicated across servers, ensuring data is always available and up-to-date.
  7. Scalability: Databases can be partitioned to manage large amounts of data across multiple nodes, enhancing scalability.


What is a Relational Database?

A relational database is a type of database that organizes data into tables (relations) which can be linked—or related—based on common data attributes. Each table consists of rows (records) and columns (attributes), and relationships between tables are established using keys.


Why Relational Databases?

ACID Properties

Relational databases ensure data reliability through the ACID properties:


Examples
  • MySQL: A popular open-source relational database used for web applications.
  • PostgreSQL: An open-source database known for its advanced features and standards compliance.
  • Oracle Database: A commercial database offering robust features for enterprise environments.
Drawbacks

SS

What is a Non-Relational Database?

A non-relational database, often referred to as NoSQL (Not Only SQL), is a type of database that stores and manages data in formats other than the traditional tabular structure used in relational databases. Non-relational databases are designed to handle large volumes of unstructured, semi-structured, or rapidly changing data, providing more flexibility and scalability compared to traditional relational databases.


Key Characteristics of Non-Relational Databases:
Examples of Non-Relational Databases:
SS
Advantages of Non-Relational Databases:

Drawbacks of Non-Relational Databases:

Clustered and Non-Clustered Indexes

Indexes in relational databases improve the speed of data retrieval. The two most common types of indexes are Clustered Indexes and Non-Clustered Indexes. Let's explore the differences between them with examples.

1. Clustered Index in Relational Database

A Clustered Index sorts and stores the data rows of the table based on the indexed column(s). Since the data is physically sorted in the order of the index, each table can only have one clustered index.


                CREATE TABLE Employees (
                EmployeeID INT PRIMARY KEY,
                Name VARCHAR(100),
                Department VARCHAR(50)
                );
            
    
                -- Creating a Clustered Index on the EmployeeID column (usually created by default for PRIMARY KEY)
                CREATE CLUSTERED INDEX idx_EmployeeID ON Employees (EmployeeID);
            

In this case, the table's data rows will be sorted by the EmployeeID, and the index itself holds the data. The data is physically reordered to match the index.


2. Clustered Index in Non-Relational Database

A clustered index determines the physical order of the data in the database. In a clustered index, the data is stored directly within the index structure. This type of index ensures that records are physically ordered based on the indexed field, which improves the performance of range queries.

In MongoDB, the _id field is automatically indexed as a clustered index. This ensures that documents are stored in the order of their _id values. When querying based on _id, MongoDB can quickly locate and retrieve the relevant document due to the sorted nature of the index.

3. Non-Clustered Index in Relational Database

A Non-Clustered Index does not affect the physical ordering of the data in the table. Instead, it creates a separate structure that holds the index and includes a pointer (reference) to the data rows. You can create multiple non-clustered indexes on a table.


                -- Creating a Non-Clustered Index on the Name column
                CREATE NONCLUSTERED INDEX idx_Name ON Employees (Name);
            

4. Non-Clustered Index in Relational Database

A non-clustered index is separate from the actual data storage. It contains a sorted list of key values along with pointers (references) to the actual data. The data itself is not stored in any specific order, but the index helps in quickly locating the data associated with the indexed field.

In MongoDB, you can create a non-clustered index on any field. For instance, if you create an index on the username field in a collection, the index stores sorted username values and pointers to the respective documents. This improves search performance on the username field without affecting the document's physical storage order.

In this example, the Name column is indexed in a separate structure. The actual data is still stored in the order of the clustered index (or the order in which it was inserted), but this index helps with faster lookups based on the Name.


Comparison of Clustered vs. Non-Clustered Index
Feature Clustered Index Non-Clustered Index
Data Storage Physically reorders the table's rows to match the index. Stores a separate index structure with a pointer to the actual data rows.
Number of Indexes per Table Only one per table (since the data can only be ordered one way). Multiple non-clustered indexes can exist on a table.
Performance Faster for range queries (e.g., searching for a range of IDs). Useful for quick lookups based on specific columns.
Primary Usage Often used for primary keys. Used for frequently queried columns that aren't the primary key.