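HBase stores data sorted by row key, so a table whose row keys increase monotonically concentrates writes on a single RegionServer and creates a local hotspot. Salting the row key mitigates this problem.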
create table H3 (id varchar not null primary key, cf1.a varchar, cf2.b varchar) SALT_BUCKETS=20;
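SALT_BUCKETS can only be specified when the table is created and cannot be changed afterwards. For example, alter table h1 set salt_buckets=10; fails with: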
Error: ERROR 1024 (42Y83): Salt bucket number may only be specified when creating a table. tableName=H1
Things to keep in mind after salting:
a. Sequential scan: the rows returned by a sequential scan may not be in natural sort order. If the scan uses a LIMIT clause, the rows returned may therefore differ from those returned by the same query on an unsalted table.
b. Split points: if no split points are specified for the table, the salted table is pre-split on salt byte boundaries to ensure load distribution among region servers even during the initial phase of the table. If users provide split points manually, they need to include a salt byte in the split points they provide.
c. Row key ordering: pre-splitting also ensures that all entries in a region start with the same salt byte and are therefore stored in sorted order. When doing a parallel scan across all region servers, the client can take advantage of this property to perform a merge sort on the client side. The resulting scan is then returned sequentially, as if it came from a normal table.
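Notes a and c can be illustrated with a small sketch in Python. This is not Phoenix code: the bucket contents, the function names raw_scan, merged_scan and limit, and the use of heapq.merge are all assumptions made for illustration. It only shows why reading the buckets one after another gives a different LIMIT result than merging the per-bucket sorted streams.

```python
import heapq
from itertools import islice

# Hypothetical contents of three salt buckets (salt byte already stripped).
# Within a bucket, rows are sorted by their original key.
buckets = [
    ["id03", "id04", "id09"],
    ["id01", "id05", "id07"],
    ["id02", "id06", "id08"],
]

def raw_scan(buckets):
    """Rows in physical order: all of bucket 0, then bucket 1, and so on."""
    return [key for bucket in buckets for key in bucket]

def merged_scan(buckets):
    """Merge the sorted per-bucket streams into one globally sorted stream."""
    return list(heapq.merge(*buckets))

def limit(rows, n):
    """Keep only the first n rows, like a LIMIT clause."""
    return list(islice(rows, n))

print(limit(raw_scan(buckets), 3))     # first three rows in physical order
print(limit(merged_scan(buckets), 3))  # first three rows in natural key order
```

In this sketch the physical-order scan with a limit of 3 returns only rows from the first bucket, while the merged scan returns the three smallest keys overall.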
Under the hood, salting rewrites the row key by adding a prefix:
new_row_key = (++index % BUCKETS_NUMBER) + original_key
The data is stored across BUCKETS_NUMBER buckets, and all rows in a bucket share the same prefix. At query time, the query runs against all buckets in parallel.
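A minimal sketch of the prefixing idea in Python follows. The formula above uses a running counter; this sketch instead derives the prefix from a hash of the original key, so that the same key always maps to the same bucket and can be located again on a point lookup. The choice of zlib.crc32 as the hash and the helper names salt_byte and salted_row_key are assumptions for illustration only, not the hash function or API that Phoenix itself uses.

```python
import zlib

BUCKETS_NUMBER = 20  # matches SALT_BUCKETS=20 in the CREATE TABLE statement

def salt_byte(original_key: bytes, buckets: int = BUCKETS_NUMBER) -> int:
    """Map a key to a bucket number in [0, buckets). The hash is illustrative."""
    return zlib.crc32(original_key) % buckets

def salted_row_key(original_key: bytes, buckets: int = BUCKETS_NUMBER) -> bytes:
    """Prepend the one-byte bucket prefix to the original key."""
    return bytes([salt_byte(original_key, buckets)]) + original_key

# Monotonically increasing keys end up spread over several buckets.
keys = [f"id{i:04d}".encode() for i in range(1000)]
used_buckets = {salt_byte(k) for k in keys}
print(len(used_buckets), "buckets used")
print(salted_row_key(b"id0001"))
```

Because the prefix depends only on the key, a reader can recompute it for a lookup by key, while a range scan still has to visit every bucket.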