加入收藏 | 设为首页 | 会员中心 | 我要投稿 湖南网 (https://www.hunanwang.cn/)- 科技、建站、经验、云计算、5G、大数据,站长网!
当前位置: 首页 > 编程 > 正文

数据库设计 – 空间索引可以帮助“范围 – 按限制排序”查询

发布时间:2021-01-14 06:38:57 所属栏目:编程 来源:网络整理
导读:提出这个题目,出格是Postgres,由于它对R树/空间索引有很好的支持. 我们有下表,个中包括单词及其频率的树布局(嵌套集模子): lexikon-------_id integer PRIMARY KEYword textfrequency integerlset integer UNIQUE KEYrset integer UNIQUE KEY 和查询: SELEC

提出这个题目,出格是Postgres,由于它对R树/空间索引有很好的支持.

我们有下表,个中包括单词及其频率的树布局(嵌套集模子):

lexikon
-------
_id   integer  PRIMARY KEY
word  text
frequency integer
lset  integer  UNIQUE KEY
rset  integer  UNIQUE KEY

和查询:

SELECT word
FROM lexikon
WHERE lset BETWEEN @Low AND @High
ORDER BY frequency DESC
LIMIT @N

我以为(lset,frequency,word)上的包围索引是有效的,可是假如(@High,@ Low)范畴内的lset值太多,我认为它也许示意不佳.

当行使该索引的搜刮早期发生与范畴前提匹配的@N行时,(频率DESC)上的简朴索引偶然也大噶?鲢够的.

但好像机能很洪流平上取决于参数值.

有没有一种要领可以让它快速执行,无论范畴(@ Low,@ High)是宽照旧窄,无论顶级频率字是否荣幸地在(窄)选定范畴内?

R树/空间索引会有辅佐吗?

添加索引,重写查询,从头计划表,没有限定.

办理要领

通过先搜刮频率较高的行,您可以得到更好的机能.这可以通过“粒化”频率然后在措施上慢慢执行来实现,譬喻如下:

–testbed和lexikon假造数据:

begin;
set role dba;
create role stack;
grant stack to dba;
create schema authorization stack;
set role stack;
--
create table lexikon( _id serial,word text,frequency integer,lset integer,width_granule integer);
--
insert into lexikon(word,lset) 
select word,(1000000/row_number() over(order by random()))::integer as frequency,lset
from (select 'word'||generate_series(1,1000000) word,generate_series(1,1000000) lset) z;
--
update lexikon set width_granule=ln(frequency)::integer;
--
create index on lexikon(width_granule,lset);
create index on lexikon(lset);
-- the second index is not used with the function but is added to make the timings 'fair'

颗粒说明(首要用于信息和调解):

create table granule as 
select width_granule,count(*) as freq,min(frequency) as granule_start,max(frequency) as granule_end 
from lexikon group by width_granule;
--
select * from granule order by 1;
/*
 width_granule |  freq  | granule_start | granule_end
---------------+--------+---------------+-------------
             0 | 500000 |             1 |           1
             1 | 300000 |             2 |           4
             2 | 123077 |             5 |          12
             3 |  47512 |            13 |          33
             4 |  18422 |            34 |          90
             5 |   6908 |            91 |         244
             6 |   2580 |           245 |         665
             7 |    949 |           666 |        1808
             8 |    349 |          1811 |        4901
             9 |    129 |          4926 |       13333
            10 |     47 |         13513 |       35714
            11 |     17 |         37037 |       90909
            12 |      7 |        100000 |      250000
            13 |      2 |        333333 |      500000
            14 |      1 |       1000000 |     1000000
*/
alter table granule drop column freq;
--

起首扫描高频的成果:

create function f(p_lset_low in integer,p_lset_high in integer,p_limit in integer)
       returns setof lexikon language plpgsql set search_path to 'stack' as $$
declare
  m integer;
  n integer := 0;
  r record;
begin 
  for r in (select width_granule from granule order by width_granule desc) loop
    return query( select * 
                  from lexikon 
                  where width_granule=r.width_granule 
                        and lset>=p_lset_low and lset<=p_lset_high );
    get diagnostics m = row_count;
    n = n+m;
    exit when n>=p_limit;
  end loop;
end;$$;

功效(时刻应该用一小撮盐来举办,但每次查询运行两次以反抗任何缓存)

起首行使我们编写的函数:

timing on
--
select * from f(20000,30000,5) order by frequency desc limit 5;
/*
 _id |   word    | frequency | lset  | width_granule
-----+-----------+-----------+-------+---------------
 141 | word23237 |      7092 | 23237 |             9
 246 | word25112 |      4065 | 25112 |             8
 275 | word23825 |      3636 | 23825 |             8
 409 | word28660 |      2444 | 28660 |             8
 418 | word29923 |      2392 | 29923 |             8
Time: 80.452 ms
*/
select * from f(20000,5) order by frequency desc limit 5;
/*
 _id |   word    | frequency | lset  | width_granule
-----+-----------+-----------+-------+---------------
 141 | word23237 |      7092 | 23237 |             9
 246 | word25112 |      4065 | 25112 |             8
 275 | word23825 |      3636 | 23825 |             8
 409 | word28660 |      2444 | 28660 |             8
 418 | word29923 |      2392 | 29923 |             8
Time: 0.510 ms
*/

然后行使简朴的索引扫描:

select * from lexikon where lset between 20000 and 30000 order by frequency desc limit 5;
/*
 _id |   word    | frequency | lset  | width_granule
-----+-----------+-----------+-------+---------------
 141 | word23237 |      7092 | 23237 |             9
 246 | word25112 |      4065 | 25112 |             8
 275 | word23825 |      3636 | 23825 |             8
 409 | word28660 |      2444 | 28660 |             8
 418 | word29923 |      2392 | 29923 |             8
Time: 218.897 ms
*/
select * from lexikon where lset between 20000 and 30000 order by frequency desc limit 5;
/*
 _id |   word    | frequency | lset  | width_granule
-----+-----------+-----------+-------+---------------
 141 | word23237 |      7092 | 23237 |             9
 246 | word25112 |      4065 | 25112 |             8
 275 | word23825 |      3636 | 23825 |             8
 409 | word28660 |      2444 | 28660 |             8
 418 | word29923 |      2392 | 29923 |             8
Time: 51.250 ms
*/
timing off
--
rollback;

按照您的现实数据,您也许但愿改变颗粒的数目和用于将行放入个中的函数.频率的现实漫衍在这里是要害,由于极限条款的预期值和所寻求的lset范畴的巨细.

(编辑:湖南网)

【声明】本站内容均来自网络,其相关言论仅代表作者个人观点,不代表本站立场。若无意侵犯到您的权利,请及时与联系站长删除相关内容!

    热点阅读