Recently I was called into a shop that had a 4 terabyte database, who claimed they didn’t have anywhere near that much data. Upon investigation, they were correct. They had about 100 gig of data, and about 3.9 T of indexes.
Is this too much? Well, in order to make that kind of judgement call there are several factors to consider:
- Are the indexes interfering with writes (i.e. is the number of indexes having a negative factor on insert / update performance)?
- Are the indexes actually being used?
- Are any indexes redundant?
The first one (write performance) can be identified by checking length of the write queues; almost any monitoring tool can get you this information.
The second can be identified with a script you can find on the web which counts the number of times an index has been accessed since the last time the server was booted. Of course, you want to the server to have been up long enough that the system tables are aware of weekend / month end processing. If you can’t find one readily, you can use the one I have on my blog:
The final one is the tool I used to reduce the database size for that customer from 4T to 300M without even doing “what’s being used” analysis… just by eliminating redundant indexes.
Many people find a script on the web (I have one in the blog mentioned above) which will look at the system tables and recommend missing indexes. The problem is, it will give you 6 similar indexes rather than one combined index. You have to do this yourself. If you automatically add all the indexes the server suggests… well, you can end up with 4T of indexes for 100G of data.
So how do we know what’s redundant?
Hopefully, you are old enough to remember a phone book. A phone book is an index on last name, first name, middle name. But what other indexes might that be used for? A lookup on just last name, first name could use the same phone book (and often does), as would a lookup just to find last name.
So if you have three indexes:
- Last name, first name, middle name
- Last name, first name
- Last name
You can visually look at these and KNOW that the last 2 are redundant because the first index can do the work that either of the second two could do.
As you start looking at data management view (DMV) information, you often get index recommendations along these lines:
- Create index idx on TableA (Column1)
- Create index idx on TableA (Column1) include (ColumnA, ColumnB)
- Create index idx on TableA (Column1, Column2) include (ColumnC, ColumnD)
Sometimes you’ll get 8 or 10 of these.
So, index 1 can easily be eliminated because index 2 covers it.
Index 2 can be combined into index 3 like this:
- Create index idx on TableA (Column1, Column2) include (ColumnA, ColumnB , ColumnC, ColumnD)
So… in summary, you can pull index recommendations from the DMVs, and they are good recommendations, but review them with an eye for detail before automatically applying them.