sql-server – Why does the LEN() function badly underestimate cardinality in SQL Server 2014?
I have a table with a string column and a predicate that looks for rows of a certain length. In SQL Server 2014, I see an estimate of 1 row regardless of the length I search for. This yields very poor plans, because there are actually thousands or even millions of matching rows, and SQL Server chooses to put this table on the outer side of a nested loop.

Is there an explanation for why SQL Server 2014 produces a cardinality estimate of 1.0003 while SQL Server 2012 estimates 31,622 rows? Is there a good workaround?

Here is a short reproduction of the issue:

    -- Create a table with 1MM rows of dummy data
    CREATE TABLE #customers (cust_nbr VARCHAR(10) NOT NULL)
    GO

    INSERT INTO #customers WITH (TABLOCK) (cust_nbr)
    SELECT TOP 1000000
        CONVERT(VARCHAR(10), ROW_NUMBER() OVER (ORDER BY (SELECT NULL))) AS cust_nbr
    FROM master..spt_values v1
    CROSS JOIN master..spt_values v2
    GO

    -- Looking for strings of a certain length.
    -- While both CEs yield fairly poor estimates, the 2012 CE is much
    -- more conservative (higher estimate) and therefore much more likely
    -- to yield an okay plan rather than a drastically underestimated loop join.
    -- 2012: 31,622 rows estimated, 900K rows actual
    -- 2014: 1 row estimated, 900K rows actual
    SELECT COUNT(*)
    FROM #customers
    WHERE LEN(cust_nbr) = 6
    OPTION (QUERYTRACEON 9481) -- Optionally, use 2012 CE
    GO

Here is a more complete script showing additional tests.

I have also read the whitepaper on the SQL Server 2014 Cardinality Estimator, but found nothing in it that clarifies this situation.
Answer

For the legacy CE, I see that the estimate is 3.16228% of the rows. That is the "magic number" heuristic used for column = literal predicates. There are other heuristics based on how the predicate is constructed, but with LEN wrapped around the column, the legacy CE result matches this guess framework. You can see examples of this in Joe Sack's post "Selectivity Guesses in absence of Statistics" and Ian Jose's post "Constant-Constant Comparison Estimation".

    -- Legacy CE: 31622.8 rows
    SELECT COUNT(*)
    FROM #customers
    WHERE LEN(cust_nbr) = 6
    OPTION (QUERYTRACEON 9481); -- Legacy CE
    GO

As for the new CE behavior, it looks like the column is now visible to the optimizer, which means statistics can be used. I went through the calculator output below, and you can use the associated auto-generated statistics as a pointer:

    -- New CE: 1.00007 rows
    SELECT COUNT(*)
    FROM #customers
    WHERE LEN(cust_nbr) = 6
    OPTION (QUERYTRACEON 2312); -- New CE
    GO

    -- View New CE behavior with 2363 (for a supported option, use XEvents)
    SELECT COUNT(*)
    FROM #customers
    WHERE LEN(cust_nbr) = 6
    OPTION (QUERYTRACEON 2312, QUERYTRACEON 2363, QUERYTRACEON 3604, RECOMPILE); -- New CE
    GO

    /*
    Loaded histogram for column QCOL: [tempdb].[dbo].[#customers].cust_nbr from stats with id 2

    Using ambient cardinality 1e+006 to combine distinct counts:
      999927

    Combined distinct count: 999927
    Selectivity: 1.00007e-006

    Stats collection generated:
      CStCollFilter(ID=2, CARD=1.00007)
          CStCollBaseTable(ID=1, CARD=1e+006 TBL: #customers)

    End selectivity computation
    */

    EXEC tempdb..sp_helpstats '#customers';

    -- Check out the AVG_RANGE_ROWS values (for example, plenty of ~1)
    DBCC SHOW_STATISTICS('tempdb..#customers', '_WA_Sys_00000001_B0368087');
    -- That is my statistics name; yours will differ

Unfortunately, the logic relies on an estimate of the number of distinct values in the column, and that estimate is not adjusted for the effect of the LEN function.

Possible workaround

You can get a trie-based estimate under both CE models by rewriting the LEN as a LIKE:

    SELECT COUNT_BIG(*)
    FROM #customers AS C
    WHERE C.cust_nbr LIKE REPLICATE('_', 6);

Information on the trace flags used:

> 2363: shows a lot of information, including the statistics being loaded.
> 3604: prints the output of DBCC commands to the Messages tab.
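The two estimates discussed in the answer can be reproduced with plain arithmetic. The sketch below is only an illustration in Python, not anything SQL Server runs: the function names are hypothetical, and the inputs (1,000,000 rows, the 3.16228% guess, 999,927 distinct values, 900,000 actual rows) are copied from the repro and the trace flag 2363 output. It assumes the new CE figure is simply the table cardinality divided by the combined distinct count, which is what the printed selectivity of 1.00007e-006 suggests.

```python
# Inputs copied from the question and the trace flag 2363 output.
TABLE_ROWS = 1_000_000       # rows in #customers
DISTINCT_VALUES = 999_927    # "Combined distinct count" from the trace output
LEGACY_GUESS = 0.0316228     # the 3.16228% guess quoted for the legacy CE
ACTUAL_ROWS = 900_000        # rows that really have LEN(cust_nbr) = 6


def legacy_guess_rows(table_rows, selectivity=LEGACY_GUESS):
    """Hypothetical helper: fixed-percentage guess, no statistics involved."""
    return table_rows * selectivity


def new_ce_rows(table_rows, distinct_values):
    """Hypothetical helper: assumes selectivity = 1 / distinct count."""
    return table_rows * (1.0 / distinct_values)


legacy_estimate = legacy_guess_rows(TABLE_ROWS)
new_estimate = new_ce_rows(TABLE_ROWS, DISTINCT_VALUES)

print(f"legacy CE estimate: {legacy_estimate:.1f} rows")
print(f"new CE estimate:    {new_estimate:.6g} rows")
print(f"legacy CE is low by a factor of about {ACTUAL_ROWS / legacy_estimate:.0f}")
print(f"new CE is low by a factor of about {ACTUAL_ROWS / new_estimate:.0f}")
```

Both estimates are far below the 900,000 actual rows, but the new CE figure is low by several more orders of magnitude, which is consistent with the nested loop plan described in the question.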