Undergrad and colleagues accidentally shred 40-year hash table gospel
- Reference: 1739444412
- News link: https://www.theregister.co.uk/2025/02/13/hash_table_breakthrough/
- Source link:
That assumption refers to hash tables, and a conjecture based on work from the 1980s regarding the optimal way to store and query the data in them. The student, formerly of Rutgers University in New Jersey, came up with a new kind of hash table that is faster and uses fewer steps to find specific elements, all while being unaware of that conjecture.
As detailed by [1]Quanta Magazine , Andrew Krapivin, now a graduate at the University of Cambridge, is one of the co-authors on a paper, " [2]Optimal Bounds for Open Addressing Without Reordering ," published last month that sets out how his hash table can find elements faster than was previously considered possible.
[3]
Hash tables have been around since the 1950s, and are an example of a key-value store where a hash function is used to generate the index for the data value based on the key itself.
[4]
[5]
Previously, a historic paper authored by computer scientist Andrew Yao, " [6]Uniform hashing is optimal ," had asserted that the best way of finding an individual element or an empty location in a hash table is simply to access potential locations randomly, an approach known as uniform probing.
The new paper states that despite its simplicity, Yao's conjecture had never been settled.
[7]
There was a way to get around this involving the insertion algorithm carrying out a reordering process after insertion, i.e. to optimize the placement of elements within the hash table. But it wasn't clear if this was a necessary step in order to speed things up.
[8]Google exec sees enterprise quantum app on closer horizon
[9]Tiny Linux kernel tweak could cut datacenter power use by 30%, boffins say
[10]Abstract, theoretical computing qualifications are turning teens off
[11]BASIC co-creator Thomas Kurtz hits END at 96
The 2025 paper claims that even without reordering elements over time, it is possible to construct a hash table using Krapivin's method that achieves far better probe complexity – the average number of locations that need to be checked (probed) to find a specific key – than previous hash table methods.
The authors of the paper say that they refer to their insertion strategy as "elastic hashing" because of the way that the algorithm often probes much further down the table before snapping back to the position it ends up using.
According to Quanta, the paper demonstrates that for Krapivin's hash table method, the time required for worst-case queries and insertions is proportional to (log x) 2 , which is much faster than the previously assumed linear time complexity in x. Here, x is a number that describes how close the hash table is to being completely full, where x = 100 means the table is 99 percent full, and x = 1,000 means the table is 99.9 percent full.
Krapivin is said to have come up with this method after reading a paper titled " [12]Tiny Pointers ," co-authored by his professor at Rutgers, and exploring how to further miniaturize pointers so they used even less memory space. ®
Get our [13]Tech Resources
[1] https://www.quantamagazine.org/undergraduate-upends-a-40-year-old-data-science-conjecture-20250210/
[2] https://arxiv.org/abs/2501.02305
[3] https://pubads.g.doubleclick.net/gampad/jump?co=1&iu=/6978/reg_offbeat/science&sz=300x50%7C300x100%7C300x250%7C300x251%7C300x252%7C300x600%7C300x601&tile=2&c=2Z64lMkx1tDYrMVKhYc4wZQAAARM&t=ct%3Dns%26unitnum%3D2%26raptor%3Dcondor%26pos%3Dtop%26test%3D0
[4] https://pubads.g.doubleclick.net/gampad/jump?co=1&iu=/6978/reg_offbeat/science&sz=300x50%7C300x100%7C300x250%7C300x251%7C300x252%7C300x600%7C300x601&tile=4&c=44Z64lMkx1tDYrMVKhYc4wZQAAARM&t=ct%3Dns%26unitnum%3D4%26raptor%3Dfalcon%26pos%3Dmid%26test%3D0
[5] https://pubads.g.doubleclick.net/gampad/jump?co=1&iu=/6978/reg_offbeat/science&sz=300x50%7C300x100%7C300x250%7C300x251%7C300x252%7C300x600%7C300x601&tile=3&c=33Z64lMkx1tDYrMVKhYc4wZQAAARM&t=ct%3Dns%26unitnum%3D3%26raptor%3Deagle%26pos%3Dmid%26test%3D0
[6] https://dl.acm.org/doi/pdf/10.1145/3828.3836
[7] https://pubads.g.doubleclick.net/gampad/jump?co=1&iu=/6978/reg_offbeat/science&sz=300x50%7C300x100%7C300x250%7C300x251%7C300x252%7C300x600%7C300x601&tile=4&c=44Z64lMkx1tDYrMVKhYc4wZQAAARM&t=ct%3Dns%26unitnum%3D4%26raptor%3Dfalcon%26pos%3Dmid%26test%3D0
[8] https://www.theregister.com/2025/02/06/google_quantum_apps/
[9] https://www.theregister.com/2025/01/29/linux_kernel_tweak/
[10] https://www.theregister.com/2024/11/28/bcs_computer_science_gcse/
[11] https://www.theregister.com/2024/11/20/rip_thomas_kurtz/
[12] https://arxiv.org/abs/2111.12800
[13] https://whitepapers.theregister.com/
Re: Well Done.
Of course my interest is business sensitive, so I can't show my handle.
If my employer found out I'm interested in seeing how this filters through related areas of knowledge I'd be for the chop!
But the value I've given to all you readers, knowing that there is a poster here who thinks this is interesting, that's what I care about!!!!
Re: Well Done.
My post was a 'passing' comment, not intended to rival an in depth PHD level exposition !!!
Wouldn't it been much easier to read my post and say to yourself ... Meh !!!
You found my post of no value ... fine that is OK with me, yet you expended so much effort to tell everyone !!!???
Something regarding 'kettles' and 'black' comes to mind or does 'your' post have much more value than mine because 'you' posted it.
People who think that their every thought NEEDS to be posted ...
Including when they simply think 'Meh ... I don't care !!!' ... <=== This is what we ALL need to know.
My mistake ... how silly of me to think otherwise.
:)
Re: Well Done.
When I was at Uni, I discovered a new kind of hash, too.
Sadly, I have none of it left..
Re: Well Done.
Since at least one person didn't get it, I should explain that my subtle joke with the icon is one that I stole from a Private Eye cartoon..
Sherlock Holmes (rifling through his desk drawers): "Watson! My Opium and Hashish drawers are empty!"
Dr. Watson: "No 'shit', Sherlock?"
Reminds me of how best to fill a cabin
Certainly an interesting article and the details of the pointers are key, with databases the potential big benefactors. Of course, for any real world scenario, you also have to factor in any overhead associated with non-random strategy.
Re: Reminds me of how best to fill a cabin
Charle Clark,
Careful DON'T post your opinion the OP ^^^^ may object because it does not 'interest him or her' !!!
N.B.
It is a crime to post as AC yet the objection is from an AC ... Huh !!!???
:)
Re: Reminds me of how best to fill a cabin
Take your meds. You'll do better. You really will.
Dicts?
I wonder if this could be applied to speed up Python dicts and sets, as they use hashes?
Re: Dicts?
Hashes are everywhere. Whether or not this discovery is generally applicable, we shall see. I still have yet to read the paper but it does sound quite interesting.
Well Done.
Very clever.
It will be interesting to see how this filters through related areas of knowledge.
:)