Sorted Hash Clusters have been around for several years, but I’ve not yet seen them being used, or even investigated in detail. This is a bit of a shame, really, because they seem to be engineered to address a couple of interesting performance patterns.
The basic concept is that data items that look alike are stored together (clustered) by applying a hashing function to generate a block address; but on top of that, if you query the data by “hashkey”, the results are returned in sorted order of a pre-defined “sortkey” without any need for sorting. (To make matters worse, the manuals describing what happens and how it works are wrong.)
Yesterday I had reason to take a closer look at them, and decided that perhaps the reason no one talks about them is that they simply aren’t safe. Here’s a trivial demonstration, which I’ve run on 10.2.0.5, 11.2.0.3, and 12.1.0.1:
execute dbms_random.seed(0)

-- Cluster hashed on hash_value; sort_value is declared as the sort key
create cluster sorted_hash_cluster (
        hash_value      number(6,0),
        sort_value      varchar2(2) sort
)
size 300
hashkeys 100
;

-- Table stored in the cluster, mapping its first two columns to the cluster key
create table sorted_hash_table (
        hash_value      number(6,0),
        sort_value      varchar2(2),
        v1              varchar2(10),
        padding         varchar2(30)
)
cluster sorted_hash_cluster (
        hash_value, sort_value
)
;

-- 5,000 rows: random hash_value in 0..98, random two-letter upper-case sort_value
begin
        for i in 1..5000 loop
                insert into sorted_hash_table values(
                        trunc(dbms_random.value(0,99)),
                        dbms_random.string('U',2),
                        lpad(i,10),
                        rpad('x',30,'x')
                );
                commit;
        end loop;
end;
/

begin
        dbms_stats.gather_table_stats(
                ownname => user,
                tabname =>'sorted_hash_table'
        );
end;
/

select count(*) from sorted_hash_table where hash_value = 92;
select count(*) from sorted_hash_table where hash_value = 92 and sort_value is null;
select count(*) from sorted_hash_table where hash_value = 92 and sort_value is not null;
select * from sorted_hash_table where hash_value = 92 and sort_value >= 'YR';
select * from sorted_hash_table where hash_value = 92 and sort_value > 'YR';
I think the last two queries are exactly the type of query for which the feature was invented. Just check the results, which are a cut-n-paste after setting echo on:
SQL> select count(*) from sorted_hash_table where hash_value = 92;

  COUNT(*)
----------
        60

1 row selected.

SQL> select count(*) from sorted_hash_table where hash_value = 92 and sort_value is null;

  COUNT(*)
----------
        60

1 row selected.

SQL> select count(*) from sorted_hash_table where hash_value = 92 and sort_value is not null;

  COUNT(*)
----------
        60

1 row selected.

SQL> select * from sorted_hash_table where hash_value = 92 and sort_value >= 'YR';

HASH_VALUE SO V1         PADDING
---------- -- ---------- ------------------------------
        92 YR       4773 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
        92 ZF        250 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
        92 ZJ       2046 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
        92 ZT         65 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

4 rows selected.

SQL>
SQL> select * from sorted_hash_table where hash_value = 92 and sort_value > 'YR';

no rows selected

So: Null is not null (the same 60 rows satisfy both predicates), and ‘ZF’ is not greater than ‘YR’; it’s only greater than or equal to ‘YR’!
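One way to sanity-check results like these is to copy the rows into an ordinary heap table and run the same predicates there. The following is only a sketch: the table name heap_copy is hypothetical, and it assumes that an unfiltered select from the clustered table returns the rows intact, which has not been verified here.

```sql
-- Hypothetical cross-check table (heap_copy is an assumed name, not part of the original test)
create table heap_copy as
select * from sorted_hash_table;

-- Same predicates as the demonstration, but against a plain heap table
select count(*) from heap_copy where hash_value = 92;
select count(*) from heap_copy where hash_value = 92 and sort_value is null;
select count(*) from heap_copy where hash_value = 92 and sort_value is not null;

-- Range predicates on the sort column
select * from heap_copy where hash_value = 92 and sort_value >= 'YR' order by sort_value;
select * from heap_copy where hash_value = 92 and sort_value >  'YR' order by sort_value;
```

If the copy is faithful, one would expect the "is null" and "is not null" counts to add up to the total for the hash key, and the strict ">" query to return every row from the ">=" query except those with sort_value = 'YR'. Any difference between the two tables points at the cluster access path rather than at the data.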
I’d be interested to see the test cases that the developer used for this feature that allowed it to ship at all.