Monday, July 8, 2013

How and When To Use Database Indexing

Indexing is crucial when it comes to Database Design. You should have a clear perception on plus and minus points of indexing.

What is Indexing 
Indexing is some strategy used to increase the performance of data retrieval operations of a database. Yes, indexing can increase only the data retrieval performance.

Lets think that we have a table named 'user' which contains thousands or millions of records. Assume that it contains a column named 'user_name'.

As you already know simple query like below will take lot of time.
SELECT * FROM user WHERE user_name = 'john';

Why this simple query takes such a long time. Answer is simple. This query search for all the records with user_name equal to 'john'.
Whether or not multiple records are available, a full table scan is required to check for all the matching records.
So each and every record is examined to check whether user_name is 'john'.

How indexing helps?
First of all note that we use indexing for columns (Ex:- user_name).
Second, for now(for learning purpose) assume index as a separate column. So think that when you create an index for user_name column, a duplicate column

of user_name column is generated with the name you give. The data in this column are sorted. You create indexes using a query like this.
CREATE INDEX user_name_index ON user (user_name);

In above query 'user_name_index' is the name of the index. 'user_name' is the name of the column.

You can create an index using multiple columns as below.
CREATE INDEX user_name_index ON user (first_name, last_name);

After you created the index for user_name column, if you execute the previous SELECT query again you will see that it doesn't take time as much as before.

How does this happen?
If you have indexed the column when you executing the query the database doesn't query from your 'user' table, instead it queries from the index.
Querying from the index is faster than querying from the entire table. Index is sorted(Not all. It depend of the data structure used. We will discuss it later), and it is easy to find all the records with a certain 'user_name' because all of the matching records are placed next to each other(because sorted).
Note that it is your task to create indexes, but deciding whether to use the index is a task of database itself. According to the query database decides whether to use the index or column name.

Is index a separate column in the table?
No, really it is a separate data structure which contains the column value(here 'user_name' column value) as the key and a pointer to the memory address of
the relevant table row. So after a certain index is queried the relevant data row can be quickly retrieved.

Disadvantages of Using Indexes
1. Index is a separate data structure and it contains the column data you selects. So it takes disk space to store them.

2. Every time you INSER, UPDATE or DELETE records Index also will be updated. This takes some time. So INSER, UPDATE and DELETE operations can be slower.

Data Structures
As you know now an index is a data structure. There are several data structures usually used for indexes such as B-tree, R-Tree, Hash, Bitmap etc.
However out of these, B-tree and Hash indexes are the widely used types.

  B-Tree
   This is the most commonly use data structure
   Stores data in an ordered manner. So retrieval of data is faster.
   can be used with LIKE and ORDER BY operations.

  Hash Index
   Hash table is used
   There is no way to determine whether one hash is greater than another. So if you want to
     retrieve data greater or less than a certain value, hashes can't help you.
     Similarly you can't select a range with hash indexes.
   equal and not-equal(= and <>) operations are very faster.
   can't use with LIKE and ORDER BY operations.

No comments:

Post a Comment