数据库设计 – 空间索引可以帮助“范围 – 按限制排序”查询
提出这个题目,出格是Postgres,由于它对R树/空间索引有很好的支持. 我们有下表,个中包括单词及其频率的树布局(嵌套集模子): lexikon ------- _id integer PRIMARY KEY word text frequency integer lset integer UNIQUE KEY rset integer UNIQUE KEY 和查询: SELECT word FROM lexikon WHERE lset BETWEEN @Low AND @High ORDER BY frequency DESC LIMIT @N 我以为(lset,frequency,word)上的包围索引是有效的,可是假如(@High,@ Low)范畴内的lset值太多,我认为它也许示意不佳. 当行使该索引的搜刮早期发生与范畴前提匹配的@N行时,(频率DESC)上的简朴索引偶然也大噶?鲢够的. 但好像机能很洪流平上取决于参数值. 有没有一种要领可以让它快速执行,无论范畴(@ Low,@ High)是宽照旧窄,无论顶级频率字是否荣幸地在(窄)选定范畴内? R树/空间索引会有辅佐吗? 添加索引,重写查询,从头计划表,没有限定. 办理要领通过先搜刮频率较高的行,您可以得到更好的机能.这可以通过“粒化”频率然后在措施上慢慢执行来实现,譬喻如下:–testbed和lexikon假造数据: begin; set role dba; create role stack; grant stack to dba; create schema authorization stack; set role stack; -- create table lexikon( _id serial,word text,frequency integer,lset integer,width_granule integer); -- insert into lexikon(word,lset) select word,(1000000/row_number() over(order by random()))::integer as frequency,lset from (select 'word'||generate_series(1,1000000) word,generate_series(1,1000000) lset) z; -- update lexikon set width_granule=ln(frequency)::integer; -- create index on lexikon(width_granule,lset); create index on lexikon(lset); -- the second index is not used with the function but is added to make the timings 'fair' 颗粒说明(首要用于信息和调解): create table granule as select width_granule,count(*) as freq,min(frequency) as granule_start,max(frequency) as granule_end from lexikon group by width_granule; -- select * from granule order by 1; /* width_granule | freq | granule_start | granule_end ---------------+--------+---------------+------------- 0 | 500000 | 1 | 1 1 | 300000 | 2 | 4 2 | 123077 | 5 | 12 3 | 47512 | 13 | 33 4 | 18422 | 34 | 90 5 | 6908 | 91 | 244 6 | 2580 | 245 | 665 7 | 949 | 666 | 1808 8 | 349 | 1811 | 4901 9 | 129 | 4926 | 13333 10 | 47 | 13513 | 35714 11 | 17 | 37037 | 90909 12 | 7 | 100000 | 250000 13 | 2 | 333333 | 500000 14 | 1 | 1000000 | 1000000 */ alter table granule drop column freq; -- 起首扫描高频的成果: create function f(p_lset_low in integer,p_lset_high in integer,p_limit in integer) returns setof lexikon language plpgsql set search_path to 'stack' as $$ declare m integer; n integer := 0; r record; begin for r in (select width_granule from granule order by width_granule desc) loop return query( select * from lexikon where width_granule=r.width_granule and lset>=p_lset_low and lset<=p_lset_high ); get diagnostics m = row_count; n = n+m; exit when n>=p_limit; end loop; end;$$; 功效(时刻应该用一小撮盐来举办,但每次查询运行两次以反抗任何缓存) 起首行使我们编写的函数: timing on -- select * from f(20000,30000,5) order by frequency desc limit 5; /* _id | word | frequency | lset | width_granule -----+-----------+-----------+-------+--------------- 141 | word23237 | 7092 | 23237 | 9 246 | word25112 | 4065 | 25112 | 8 275 | word23825 | 3636 | 23825 | 8 409 | word28660 | 2444 | 28660 | 8 418 | word29923 | 2392 | 29923 | 8 Time: 80.452 ms */ select * from f(20000,5) order by frequency desc limit 5; /* _id | word | frequency | lset | width_granule -----+-----------+-----------+-------+--------------- 141 | word23237 | 7092 | 23237 | 9 246 | word25112 | 4065 | 25112 | 8 275 | word23825 | 3636 | 23825 | 8 409 | word28660 | 2444 | 28660 | 8 418 | word29923 | 2392 | 29923 | 8 Time: 0.510 ms */ 然后行使简朴的索引扫描: select * from lexikon where lset between 20000 and 30000 order by frequency desc limit 5; /* _id | word | frequency | lset | width_granule -----+-----------+-----------+-------+--------------- 141 | word23237 | 7092 | 23237 | 9 246 | word25112 | 4065 | 25112 | 8 275 | word23825 | 3636 | 23825 | 8 409 | word28660 | 2444 | 28660 | 8 418 | word29923 | 2392 | 29923 | 8 Time: 218.897 ms */ select * from lexikon where lset between 20000 and 30000 order by frequency desc limit 5; /* _id | word | frequency | lset | width_granule -----+-----------+-----------+-------+--------------- 141 | word23237 | 7092 | 23237 | 9 246 | word25112 | 4065 | 25112 | 8 275 | word23825 | 3636 | 23825 | 8 409 | word28660 | 2444 | 28660 | 8 418 | word29923 | 2392 | 29923 | 8 Time: 51.250 ms */ timing off -- rollback; 按照您的现实数据,您也许但愿改变颗粒的数目和用于将行放入个中的函数.频率的现实漫衍在这里是要害,由于极限条款的预期值和所寻求的lset范畴的巨细. (编辑:湖南网) 【声明】本站内容均来自网络,其相关言论仅代表作者个人观点,不代表本站立场。若无意侵犯到您的权利,请及时与联系站长删除相关内容! |