
A HashSet is a high-performance, unordered collection of unique elements.
Delphi 12 introduces THashSet<T> and TObjectHashSet<T> classes in System.Generics.Collections.
Of course, Delphi has had other options for this kind of thing.
- TStringList with its fairly quick binary search.
- THashedStringList, which was created specifically for TMemIniFile, has improved read times and better scaling.
- TDictionary, which uses hashes for keys and has significantly faster read and write times than the other two. Embarcadero (eventually) replaced the THashedStringList in TMemIniFile with an internal-only wrapper around TDictionary.
Thomas Mueller did a great writeup on these, including some benchmarks.
One of the benefits to having a dedicated THashSet instead of using a TDictionary and ignoring things like values is clarity. Code that's easier to read is also easier (and cheaper) to maintain.
For example, in a discussion about how we might take advantage of Delphi 12's new GraphViz feature, someone from Embarcadero posted this interesting one-liner:
TDigraph = class(TObjectDictionary<string, THashSet<string>>);
It struck me as a nice clean way to represent the unit dependency graph generated by the compiler. The intent is clear and it's simpler than what I was doing. I really need to step up my game.
THashSet is one of the many incremental improvements in this release.
Disclaimer:
"This blog post is based on a pre-release version of the RAD Studio software and it has been written with specific permission by Embarcadero. No feature is committed until the product GA release."
FWIW the image you are using does not match the implementation of the hash table in Delphi. The RTL does not use a linked list or array to solve collisions but uses linear probing.
ReplyDelete@Stefan, Yeah, I know. I just "borrowed" something vaguely related.
ReplyDelete