Featured

Hashing

Hashing MIT Tutorial: https://www.youtube.com/watch?v=0M_kIqhwbFo&t=1912s

Hashing Table, key issue is how to handle collision. Different input keys will assign to the same bucket/slot. The specific slot can have a linked list to map different input keys. Linked List explanation: https://www.geeksforgeeks.org/linked-list-set-1-introduction/

Internet friend explanation of Hashing Chaining.

Not sure which part you’re asking about, but here’s an example.
Suppose you want to put three objects (in our case tuples) into a hash
table, where the first attribute is the key:

(‘AAA’, ‘Aardvark’),
(‘BBB’, ‘Bumblebee’),
(‘CCC’, ‘Cat’)

Suppose your hash table has 4 buckets.  We can use the lower 2 bits to
map hash values to bucket numbers.

hash(‘AAA’) = 0xfe7f3cba, hash(‘AAA’) & 0x3 = 2
hash(‘BBB’) = 0x87e3287b, hash(‘BBB’) & 0x3 = 3
hash(‘CCC’) = 0x194bcedf, hash(‘CCC’) & 0x3 = 3

‘BBB’ and ‘CCC’ collided: they both want to be in bucket 3.

To insert, a chaining hash table just add each object to the computed
bucket’s chain:

 +—+
 | 0 |
 +—+
 | 1 |
 +—+
 | 2 |->(‘AAA’, ‘Aardvark’)
 +—+
 | 3 |->(‘BBB’, ‘Bumblebee’)->(‘CCC’, ‘Cat’)
 +—+

When looking up key ‘CCC’ during the “probe” phase of a hash join,
we’ll again compute hash(‘CCC’) & 0x3 = 3, look in bucket 3, and then
compare the key of every tuple in that list with ‘CCC’ to see if we
can find any matches.  That’s called “lookup” in the text you quoted.

It also mentions deletion, which is pretty much just lookup following
by removing the matching entry from the list, but that’s a general
comment about hash tables.  It doesn’t apply to their use in hash
joins: there is never a need to remove individual keys, we just build,
probe and destroy the whole table.  Another difference between general
purpose hash tables such as you might find in a typical programming
language standard library and the hash tables used to implement hash
joins is the latter need to be able to tolerate duplicate keys, so the
‘scan’ of a bucket doesn’t give up as soon as it finds a match (unless
it’s a semi-join): it normally has to emit all of the matches.

PostgreSQL uses chaining for hash joins, but it also uses Robin Hood
hash tables in some other places, including hash-based GROUP BY.


So, from this, we can get the point of hashing. Hash function will means will convert/map all the input keys into digits. and also allocated to specific bucket/slot. Due to the collision issue, one bucket, there is very much common case it will map two input value, so people invented Hash Chaining.