Advantages and Disadvantages of Indexing in SQL
Overview
An index is a schema object that contains keys from various columns of our data table. B-Trees are used to store the index keys. It helps us find the rows or the values to be searched faster. They have various advantages like increased performance in searching for records, sorting records, grouping records, or maintaining a unique column. Some of the disadvantages include increased disk space, slower data modification, and updating records in the clustered index.
Advantages of Indexes in SQL
Let's consider the following table named Employee to discuss the advantages of Index in SQL.
EmpId | Last_Name | First_Name | Title | Salary | Location |
---|---|---|---|---|---|
1 | Bharti | Nisha | Technical Staff | 80000 | Bangalore |
2 | Jha | Durgesh | Data Analyst | 55000 | Hyderabad |
3 | Sharma | Esha | Consultant | 70000 | Mumbai |
4 | Ahmed | Sarfaraz | Associate Consultant | 65000 | Chennai |
5 | Bharti | Amish | Software Developer | 82000 | Gurgaon |
6 | Patra | Suman | Business Analyst | 90000 | Pune |
Searching For Records
Indexes in SQL help us find a record or a list of records by matching the conditions of the WHERE clause. It can help queries to search for a specific value or inside a range of values. It makes searching faster, which ultimately leads to the enhancement of the performance of the query. Statements like SELECT, UPDATE and DELETE take the most benefit of indexes to increase search execution. Let's have a look at each of them through examples.
Let's start with the SELECT statement.
In the above example, we can see the SQL SELECT statement benefitting from the index in the Salary column. The index sorts the salaries in ascending order. The SELECT statement uses these indexes to identify and separate the records with a salary between 65000 and 81000.
The following query uses the DELETE statement in SQL.
The above DELETE statement deletes the records of the employees with a salary equal to 70000. Indexes help the query in finding the row faster, improving the search performance of the query.
The same logic repeats for the UPDATE statement.
The above SQL query updates the records of the employees whose current salary is 55000. The updated value of salary is set to 80000. The rows with a salary of 55000 are found using the index.
Sorting Records
We use indexes in sorting datasets. The database finds the index to avoid sorting during the query execution. The sorted order is specified by the keywords ASC and DESC for ascending and descending, respectively. The ORDER BY clause specifies single or multiple fields for limiting the sorting of the dataset. Let's have a closer look using an example.
The above SQL Query returns the details of the employees with their salaries sorted in ascending order.
In the absence of an index, the database scans the Employee table. It further processes the query by sorting the rows. However, the index on the Salary provides a presorted list of salaries to the database. It then scans the index from first to last to get the rows in sorted order. The same index works well for sorting the query in reverse order.
The above SQL query returns the details of the employees with their salaries sorted in descending order. It happens by browsing the index in reverse order.
Grouping Records
The index helps group records with indexed columns used with the GROUP BY clause. SQL queries with the GROUP BY clause sort the columns included in the clause to find the result. Let's have a better understanding of it using an example.
The above SQL query returns the count of the number of employees with each salary amount. It happens by grouping the employees according to their salary such that all employees having the same salary remain together.
The Index IDX_Salary is used to get the salaries in sorted order. Sorting according to salaries brings the employees with the same salary together. It helps the database count the number of employees of each salary faster as they appear consecutively in the resulting database. Thus, indexing speeds up the grouping of records by a GROUP BY clause.
Maintaining a Unique Column
We often work on primary keys in SQL. Unique indexes apply to primary key columns that require unique values. There are different ways to create a unique index. One of the ways is to mark a column as the primary key. A primary key column has a unique index on the column automatically created on it. Another way is by checking the CREATE UNIQUE checkbox that gives a unique index to the specified column.
Let's create a unique index on the EmpId column.
The above SQL Query gives a unique index to the EmpId column of the Employee table. It prevents any duplicate values in the EmpId column by using indexes. Indexes are the best tools for the database to create columns with unique values. During data manipulation by adding or modifying the rows, it is first searched and compared with all the existing records. It ensures that the new value stays unique and doesn't duplicate existing values. Thus, the index speeds up the performance and improves the search speed.
Disadvantages of Indexes in SQL
Every component of programming comes with its own set of pros and cons. Indexes are also not an exception. Despite improving the search performances in searching and sorting of records, it has a different set of drawbacks. Let's talk about each one of them in detail.
Additional Disk Space
The clustered index used in SQL doesn't take any extra space as it stores the physical order of the table records in the database. Since there is only one physical order of the table, there is only one clustered index.
But it is different for non-clustered indexes that need extra disk space. Non-clustered indexes are separate structures from the data rows in the table. It reorders one or more columns as the logical order is different from the physical order.
The physical order of data is the order in which it gets stored in the database.
Example: Let's consider the following table.
id | last_name | first_name | gallery_id | agency_fee |
---|---|---|---|---|
1 | Brown | Denis | 2 | 2250 |
2 | White | Kate | 3 | 3120 |
3 | Black | Sarah | 2 | 1640 |
4 | Smith | Helen | 1 | 4500 |
5 | Stewart | Tom | 3 | 2130 |
In the above table, the physical order of the last name will be Brown, White, Black, Smith, and Stewart. In the logical order, rows are ordered according to the index. Let's order the data according to the index of the last name. The logical order of the last name looks like this - Black, Brown, Smith, Stewart, White.
The space depends on various factors used in the index, like the size of the table, the number of columns, and the type of column. When processing a database with numerous users, the disk space is considered cheaper than the application performance. The following code is used to view the space required by a table.
The above code uses the sp_spaceused with the table name that returns the space occupied by the table data and the indexes associated with the table. It displays the number of rows, data size, index size, reserved disk space, and the amount of used and unused disk space used by the table.
The result is of the form:
name | rows | reserved | data | index_size | unused |
---|---|---|---|---|---|
Employee | 112653 | 3022664 KB | 1646616 KB | 1375640 KB | 408 KB |
The resulting data varies on the columns, the number of indexes, and the data types of the table data.
Slowing of Data Modification
Indexes have a poor response on the performance of the data modification statements like INSERT, UPDATE, or DELETE. Every time a query asks to modify the data in the table, the database updates itself with the new index where the data changes. As discussed earlier, indexes help us locate the records faster, leading to faster sorting and searching performances. Thus, having too many indexes may help us find the records quicker but poorly impact the speed of data modification. So, we need to have a good number of index that balances the system performance.
A Disadvantage to Clustered Indexes
Any update of record or change in the value of the indexed column in a clustered index might need the repositioning of the rows to maintain the sorted order. The UPDATE Query can also be viewed as a DELETE Query followed by an INSERT statement, which lowers the performance. The clustered index of a table is, by default, created on a primary or foreign key column. It is because clustered index stores the physical order of the table that can be only one and unique like a primary key.
Conclusion
- Using an index in SQL has many advantages like optimized search performance, faster sorting and grouping of records, and easier maintenance of unique columns.
- It also comes with different drawbacks and downsides, like extra use of disk space and poor performance in data modification while using INSERT, UPDATE, and DELETE statements.
- No extra disk space is required for the clustered index, but non clustered index takes extra space as it stores separately from the table.