A dumb tool for finding superflous indexes in Rails

Posted on Apr 8, 2021

I have been reading a bunch around database indexes recently. MySQL (and maybe other databases, but I am focussing on MySQL) has a neat feature with multi-column indexes. To borrow heavily from their docs:

MySQL can use multiple-column indexes for queries that test all the columns in the index, or queries that test just the first column, the first two columns, the first three columns, and so on. If you specify the columns in the right order in the index definition, a single composite index can speed up several kinds of queries on the same table. … Suppose that a table has the following specification:

1
2
3
4
5
6
7
CREATE TABLE test (
    id         INT NOT NULL,
    last_name  CHAR(30) NOT NULL,
    first_name CHAR(30) NOT NULL,
    PRIMARY KEY (id),
    INDEX name (last_name,first_name)
);

The name index is an index over the last_name and first_name columns. The index can be used for lookups in queries that specify values in a known range for combinations of last_name and first_name values. It can also be used for queries that specify just a last_name value because that column is a leftmost prefix of the index (as described later in this section). Therefore, the name index is used for lookups in the following queries:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
SELECT * FROM test WHERE last_name='Jones';

SELECT * FROM test
  WHERE last_name='Jones' AND first_name='John';

SELECT * FROM test
  WHERE last_name='Jones'
  AND (first_name='John' OR first_name='Jon');

SELECT * FROM test
  WHERE last_name='Jones'
  AND first_name >='M' AND first_name < 'N';

Nice.

My daily work is in Rails with MySQL so I thought I would write a quick script to see if we had excess indexes that could be covered by other multi-column indexes. The script leaves a lot to be desired, but can yield some interesting places to start looking.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
# A dumb script to scan your Rails schema file
# and detect indexes which may be unnecessary.
# Made with reckless abandon by Tom Gamon
# http://tomgamon.com

# ruby index_scan.rb db/schema/rb

schema = ARGF.read

creates_table = /create_table.*?end/ms
table_name = /create_table "(.*?)".*?end/ms
index_name = /t\.index (.*?), n/
column = /\w+/

# Loop over the table definitions
schema.scan(creates_table).each do |table|
  # Pull out the columns used in the indexes
  indexes = table.scan(index_name).map do |index|
    index.first.scan(column).join("_")
  end
  
  # Sort so we deal with the shortest indexes first
  indexes.sort_by!(&:length)
  
  covered_indexes = indexes.map do |index|
    # Check to see if our index is covered by another index
    covering_index = indexes.difference([index])&.find{|x| x.start_with?(index)}
    if covering_index
      [index, covering_index]
    end
  end

  covered_indexes.compact!

  unless covered_indexes.empty?
    puts
    puts table.scan(table_name)
    covered_indexes.each do |index, covering_index|
      puts "Index on #{index} covered by #{covering_index}"
    end
  end
end

For the following schema file:

1
2
3
4
5
6
7
8
9
ActiveRecord::Schema.define(version: 2021_04_06_230529) do
  create_table "users", id: :integer, force: :cascade do |t|
    t.string "first_name", null: false
    t.string "last_name", null: false

    t.index ["last_name"], name: "last_name_idx"
    t.index ["last_name", "first_name"], name: "last_name_first_name_idx"
  end
end

We can grab it from the relevant gist and this would be the output.

1
2
3
4
5
$ curl https://gist.githubusercontent.com/thrgamon/6de17bc8bc2eb2cf912ea6fe0207de5f/raw/index_scan.rb > scan.rb
$ ruby scan.rb db/schema.rb

users
Index on last_name covered by last_name_first_name

We can then dig in and see if we can trim some of the fat. Check it out and let me know if you find it useful! Who knows, maybe some day I can turn it into something a little more polished.