PostgreSQL Index Only Scan 奮闘記その5

どうも。村上です。

今回がきっと最終回の5回目です。

では、最後となる「Index Only Scan 奮闘記」を。

Index Only Scan の explain

前回でもIndex Only Scanの実行計画をみました。
今回は前回より複雑なIndex Only Scanのexplainをみましょう。

テーブルはこんな感じです。

# \d users
                                        Table "public.users"
      Column      |            Type             |                     Modifiers
------------------+-----------------------------+----------------------------------------------------
 id               | integer                     | not null default nextval('users_id_seq'::regclass)
 login            | character varying(255)      | not null
 mail_address     | character varying(255)      | not null
 family_name      | character varying(255)      | not null
 first_name       | character varying(255)      | not null
 family_name_kana | character varying(255)      |
 first_name_kana  | character varying(255)      |
 birthday         | date                        |
 gender           | integer                     |
 post_number      | character varying(255)      |
 prefecture       | character varying(255)      |
 city             | character varying(255)      |
 address          | character varying(255)      |
 building         | character varying(255)      |
 telephone        | character varying(255)      |
 created_at       | timestamp without time zone |
 updated_at       | timestamp without time zone |
Indexes:
    "users_pkey" PRIMARY KEY, btree (id)
    "users_login_idx" btree (login)
    "users_name_idx" btree (family_name, first_name)

# \d users

Table "public.users"

Column | Type | Modifiers

------------------+-----------------------------+----------------------------------------------------

id | integer | not null default nextval('users_id_seq'::regclass)

mail_address | character varying(255) | not null

family_name | character varying(255) | not null

first_name | character varying(255) | not null

family_name_kana | character varying(255) |

first_name_kana | character varying(255) |

birthday | date |

gender | integer |

post_number | character varying(255) |

prefecture | character varying(255) |

city | character varying(255) |

address | character varying(255) |

building | character varying(255) |

telephone | character varying(255) |

created_at | timestamp without time zone |

updated_at | timestamp without time zone |

Indexes:

"users_pkey" PRIMARY KEY, btree (id)

"users_login_idx" btree (login)

"users_name_idx" btree (family_name, first_name)

Primary Keyでselect

=# explain select id from users;
                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 Index Only Scan using users_pkey on users  (cost=0.00..259699.22 rows=10000000 width=4)

=# explain select id from users;

QUERY PLAN

-----------------------------------------------------------------------------------------

Index Only Scan using users_pkey on users (cost=0.00..259699.22 rows=10000000 width=4)

複合Index

=# explain select family_name, first_name from users;
                                          QUERY PLAN
----------------------------------------------------------------------------------------------
 Index Only Scan using users_name_idx on users  (cost=0.00..307880.04 rows=10000000 width=12)

=# explain select family_name, first_name from users;

QUERY PLAN

----------------------------------------------------------------------------------------------

Index Only Scan using users_name_idx on users (cost=0.00..307880.04 rows=10000000 width=12)

複合Indexの片方

=# explain select first_name from users;
                                         QUERY PLAN
---------------------------------------------------------------------------------------------
 Index Only Scan using users_name_idx on users  (cost=0.00..307880.04 rows=10000000 width=6)

=# explain select first_name from users;

QUERY PLAN

---------------------------------------------------------------------------------------------

Index Only Scan using users_name_idx on users (cost=0.00..307880.04 rows=10000000 width=6)

GROUP BY

=# explain select login from users group by login;
                                             QUERY PLAN
-----------------------------------------------------------------------------------------------------
 Group  (cost=0.00..371911.94 rows=10000000 width=19)
   ->  Index Only Scan using users_login_idx on users  (cost=0.00..346911.94 rows=10000000 width=19)

=# explain select login from users group by login;

QUERY PLAN

-----------------------------------------------------------------------------------------------------

Group (cost=0.00..371911.94 rows=10000000 width=19)

-> Index Only Scan using users_login_idx on users (cost=0.00..346911.94 rows=10000000 width=19)

副問い合わせ

=# explain select family_name from users where id in (select id from users);
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Merge Semi Join  (cost=0.00..954471.44 rows=10000000 width=6)
   Merge Cond: (public.users.id = public.users.id)
   ->  Index Scan using users_pkey on users  (cost=0.00..544772.22 rows=10000000 width=10)
   ->  Index Only Scan using users_pkey on users  (cost=0.00..259699.22 rows=10000000 width=4)

=# explain select family_name from users where id in (select id from users);

QUERY PLAN

-----------------------------------------------------------------------------------------------

Merge Semi Join (cost=0.00..954471.44 rows=10000000 width=6)

Merge Cond: (public.users.id = public.users.id)

-> Index Scan using users_pkey on users (cost=0.00..544772.22 rows=10000000 width=10)

-> Index Only Scan using users_pkey on users (cost=0.00..259699.22 rows=10000000 width=4)

すべてについてIndex Only Scanが使用されましたね。

GOOD

Index Only Scan が効かない！？

usersテーブルの項目を減らしたsimple_usersを作成しました。
データ件数はusersと同じ1000万件です。

=# \d simple_users;
                                      Table "public.simple_users"
    Column    |            Type             |                         Modifiers
--------------+-----------------------------+-----------------------------------------------------------
 id           | integer                     | not null default nextval('simple_users_id_seq'::regclass)
 login        | character varying(255)      | not null
 mail_address | character varying(255)      | not null
 family_name  | character varying(255)      | not null
 first_name   | character varying(255)      | not null
 created_at   | timestamp without time zone |
 updated_at   | timestamp without time zone |
Indexes:
    "simple_users_pkey" PRIMARY KEY, btree (id)
    "simple_users_login_idx" btree (login)
    "simple_users_name_idx" btree (family_name, first_name)

=# \d simple_users;

Table "public.simple_users"

Column | Type | Modifiers

--------------+-----------------------------+-----------------------------------------------------------

id | integer | not null default nextval('simple_users_id_seq'::regclass)

mail_address | character varying(255) | not null

family_name | character varying(255) | not null

first_name | character varying(255) | not null

created_at | timestamp without time zone |

updated_at | timestamp without time zone |

Indexes:

"simple_users_pkey" PRIMARY KEY, btree (id)

"simple_users_login_idx" btree (login)

"simple_users_name_idx" btree (family_name, first_name)

このテーブルに対して、Explainしてみます。

=# explain select id from simple_users;
                              QUERY PLAN
-----------------------------------------------------------------------
 Seq Scan on simple_users  (cost=0.00..246700.38 rows=9999838 width=4)

=# explain select login from simple_users;
                               QUERY PLAN
------------------------------------------------------------------------
 Seq Scan on simple_users  (cost=0.00..246700.38 rows=9999838 width=19)

=# explain select family_name, first_name from simple_users;
                               QUERY PLAN
------------------------------------------------------------------------
 Seq Scan on simple_users  (cost=0.00..246700.38 rows=9999838 width=12)

=# explain select id from simple_users;

QUERY PLAN

-----------------------------------------------------------------------

Seq Scan on simple_users (cost=0.00..246700.38 rows=9999838 width=4)

=# explain select login from simple_users;

QUERY PLAN

------------------------------------------------------------------------

Seq Scan on simple_users (cost=0.00..246700.38 rows=9999838 width=19)

=# explain select family_name, first_name from simple_users;

QUERY PLAN

------------------------------------------------------------------------

Seq Scan on simple_users (cost=0.00..246700.38 rows=9999838 width=12)

あれ？？？
すべてSeq Scanに！？

vacuum analyzeのし忘れではありません。

Seq ScanをoffにしてExplainしてみると

=# set enable_seqscan = off;
SET
=# explain select login from simple_users;
                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Index Only Scan using simple_users_login_idx on simple_users  (cost=0.00..346909.51 rows=9999838 width=19)

=# set enable_seqscan = off;

SET

=# explain select login from simple_users;

QUERY PLAN

------------------------------------------------------------------------------------------------------------

Index Only Scan using simple_users_login_idx on simple_users (cost=0.00..346909.51 rows=9999838 width=19)

Index Only Scanが使われるようです。

じゃあ、オプティマイザーがIndex Only Scan より Seq Scanの方が良いと判断したのかな？
コストはSeq Scanの方がよさそうなので、

explain analyzeで速度を見てみましょう。

=# explain analyze select login from simple_users;
                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Seq Scan on simple_users  (cost=0.00..246700.38 rows=9999838 width=19) (actual time=6.561..5064.454 rows=10000000 loops=1)
 Total runtime: 5885.929 ms

=# set enable_seqscan = off;

=# explain analyze select login from simple_users;
                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using simple_users_login_idx on simple_users  (cost=0.00..346909.51 rows=9999838 width=19) (actual time=0.065..4020.799 rows=10000000 loops=1)
   Heap Fetches: 0
 Total runtime: 4869.963 ms

=# explain analyze select login from simple_users;

QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------

Seq Scan on simple_users (cost=0.00..246700.38 rows=9999838 width=19) (actual time=6.561..5064.454 rows=10000000 loops=1)

Total runtime: 5885.929 ms

=# set enable_seqscan = off;

=# explain analyze select login from simple_users;

QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------

Index Only Scan using simple_users_login_idx on simple_users (cost=0.00..346909.51 rows=9999838 width=19) (actual time=0.065..4020.799 rows=10000000 loops=1)

Heap Fetches: 0

Total runtime: 4869.963 ms

確かにそれほど速度差はないかな？？

でもselect count(*)で見ると

=# explain analyze select count(*) from simple_users;
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=271699.97..271699.98 rows=1 width=0) (actual time=5827.386..5827.386 rows=1 loops=1)
   ->  Seq Scan on simple_users  (cost=0.00..246700.38 rows=9999838 width=0) (actual time=2.346..4504.506 rows=10000000 loops=1)
 Total runtime: 5827.434 ms

=# set enable_seqscan = off;

=# explain analyze select count(*) from simple_users;
                                                         QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=284692.38..284692.39 rows=1 width=0) (actual time=3110.858..3110.858 rows=1 loops=1)
   ->  Index Only Scan using simple_users_pkey on simple_users  (cost=0.00..259692.79 rows=9999838 width=0) (actual time=0.038..1876.960 rows=10000000 loops=1)
         Heap Fetches: 0
 Total runtime: 3110.943 ms

=# explain analyze select count(*) from simple_users;

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------

Aggregate (cost=271699.97..271699.98 rows=1 width=0) (actual time=5827.386..5827.386 rows=1 loops=1)

-> Seq Scan on simple_users (cost=0.00..246700.38 rows=9999838 width=0) (actual time=2.346..4504.506 rows=10000000 loops=1)

Total runtime: 5827.434 ms

=# set enable_seqscan = off;

=# explain analyze select count(*) from simple_users;

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------

Aggregate (cost=284692.38..284692.39 rows=1 width=0) (actual time=3110.858..3110.858 rows=1 loops=1)

-> Index Only Scan using simple_users_pkey on simple_users (cost=0.00..259692.79 rows=9999838 width=0) (actual time=0.038..1876.960 rows=10000000 loops=1)

Heap Fetches: 0

Total runtime: 3110.943 ms

そこそこ速度差はある。。。

ということは列のサイズによってコスト計算がことなり、Index Only Scanが使用されたり、されなかったりする？？？

ちなみに「Heap Fetches」はテーブルにアクセスした件数です。
今は0件なので、テーブルアクセスなしです。

=# update simple_users set mail_address = 'hoge_var@sample.ne.jp' where id in (select id from simple_users limit 100);
UPDATE 100

=# explain analyze select count(*) from simple_users;
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=284704.76..284704.77 rows=1 width=0) (actual time=3049.392..3049.392 rows=1 loops=1)
   ->  Index Only Scan using simple_users_pkey on simple_users  (cost=0.00..259704.83 rows=9999974 width=0) (actual time=0.024..1841.731 rows=10000000 loops=1)
         Heap Fetches: 275
 Total runtime: 3049.433 ms

=# update simple_users set mail_address = '[email protected]' where id in (select id from simple_users limit 100);

UPDATE 100

=# explain analyze select count(*) from simple_users;

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------

Aggregate (cost=284704.76..284704.77 rows=1 width=0) (actual time=3049.392..3049.392 rows=1 loops=1)

-> Index Only Scan using simple_users_pkey on simple_users (cost=0.00..259704.83 rows=9999974 width=0) (actual time=0.024..1841.731 rows=10000000 loops=1)

Heap Fetches: 275

Total runtime: 3049.433 ms

となります。

=# vacuum analyze simple_users ;
VACUUM
=# explain analyze select count(*) from simple_users;
                                                                           QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=284694.25..284694.26 rows=1 width=0) (actual time=3231.098..3231.098 rows=1 loops=1)
   ->  Index Only Scan using simple_users_pkey on simple_users  (cost=0.00..259694.96 rows=9999716 width=0) (actual time=0.026..1975.628 rows=10000000 loops=1)
         Heap Fetches: 0
 Total runtime: 3231.187 ms

=# vacuum analyze simple_users ;

VACUUM

=# explain analyze select count(*) from simple_users;

QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------

Aggregate (cost=284694.25..284694.26 rows=1 width=0) (actual time=3231.098..3231.098 rows=1 loops=1)

-> Index Only Scan using simple_users_pkey on simple_users (cost=0.00..259694.96 rows=9999716 width=0) (actual time=0.026..1975.628 rows=10000000 loops=1)

Heap Fetches: 0

Total runtime: 3231.187 ms

vacuum analyzeすると0件になります。

件数を減らすと。。。

不思議なことに件数を減らすとIndex Only Scanが使用されます。

=# explain select count(*) from simple_users where id < 5000000;
                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=155359.00..155359.01 rows=1 width=0)
   ->  Index Only Scan using simple_users_pkey on simple_users  (cost=0.00..142818.53 rows=5016189 width=0)
         Index Cond: (id < 5000000)

=# explain select count(*) from simple_users where id < 5000000;

QUERY PLAN

------------------------------------------------------------------------------------------------------------

Aggregate (cost=155359.00..155359.01 rows=1 width=0)

-> Index Only Scan using simple_users_pkey on simple_users (cost=0.00..142818.53 rows=5016189 width=0)

Index Cond: (id < 5000000)

でも件数を増やすと。。。

=# explain select count(*) from simple_users where id < 9800000;
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Aggregate  (cost=296188.20..296188.21 rows=1 width=0)
   ->  Seq Scan on simple_users  (cost=0.00..271700.45 rows=9795098 width=0)
         Filter: (id < 9800000)

=# explain select count(*) from simple_users where id < 9800000;

QUERY PLAN

-----------------------------------------------------------------------------

Aggregate (cost=296188.20..296188.21 rows=1 width=0)

-> Seq Scan on simple_users (cost=0.00..271700.45 rows=9795098 width=0)

Filter: (id < 9800000)

Index Only Scanが効かない。。。
ってか、Index が効かない？？
件数が多い時ほど効いて欲しいんですが。。。

奮闘記は続く

ということで、Index Only Scanを使うには

対象テーブルにIndexを作成する
vacuum analyzeをする
Indexキーの列だけを参照するselectをする
それなりに列のがあるテーブルじゃないと効きません(?)

です。

最後は謎。。。
上記以外にものいろいろ調べましたが、結局謎。。。

おそらくオプティマイザーの仕様がそうなっているんでしょう。
次のバージョンアップはきっと解消されるんでしょう。

その時に、この奮闘記を再び始めます！！

最後になりましたが、この連載を読んでいただいてありがとうございます。
次も連載系のネタがあればやらせていただきますが、しばらくは単発系のネタをするつもりです。