In this chapter, we look at certain data structures and algorithms that are not available for you in the Framework Class Library (FCL) through Version 1.1. Examples are provided for algorithms like hash code creation and string balancing. The FCL does not support every data structure you might need, so this chapter provides solutions for priority and double queues, binary and n-ary trees, sets, and a multimap, as well as many other things.
You have created a class or structure
that will be used as a key in a Hashtable
. You
need to overload the GetHashCode
method in order
to return a good distribution of hash values to be used in a
Hashtable
(the Discussion section defines a good
distribution of hash values). You also need to choose the best hash
code algorithm to use in the GetHashCode
method of
your object.
The following procedures implement hash code algorithms and can be
used to override the GetHashCode
method. Included
in the discussion of each method are the pros and cons of using it,
as well as why you would want to use one instead of another.
In addition, it
is desirable, for performance reasons, to use the return value of the
GetHashCode
method to determine whether the data
contained within two objects is equal. Calling
GetHashCode
to return a hash value of two objects
and comparing their hash values can be faster than calling
Equals
, which
individually tests the equality of all pertinent data within two
objects. In fact, some developers even opt to compare hash code
values returned from GetHashCode
, within their
overloaded Equals
method.
This hash accepts a variable number of
integer values and XOR
s each value to obtain a
hash code. This simple algorithm has a good chance of producing an
adequate distribution and good performance. Remember to profile and
measure it to confirm that it works as well for your particular data
set. It fails when you need to integrate more than just numeric
values equal or smaller in size to an integer. Its code
is:
public int SimpleHash(params int[] values) { int hashCode = 0; if (values != null) { foreach (int val in values) { hashCode ^= val; } } return (hashCode); }
This hash allows you to integrate the
long
data type into a hash algorithm. It takes the
upper 32 bits of the long
value and folds them
over the lower 32 bits of this value. The actual process of folding
the two values is implemented by XOR
ing them and
using the result. Once again, this is a good performing algorithm
with good distribution properties, but, again, it fails when you need
to go beyond the long
data type. A sample
implementation is:
public int FoldingHash(params long[] values) { int hashCode = 0; if (values != null) { int tempLowerVal = 0; int tempUpperVal = 0; foreach (long val in values) { tempLowerVal = (int)(val & 0x000000007FFFFFFF); tempUpperVal = (int)((val >> 32) & 0xFFFFFFFF); hashCode^= tempLowerVal ^ tempUpperVal; } } return (hashCode); }
This hash obtains the hash codes from a
variable number of object
types. The only types
that should be passed in to this method are reference type fields
contained within your object. This method XOR
s all
the values returned by the GetHashCode
method of
each object. Its source code is:
public int ContainedObjHash(params object[] values) { int hashCode = 0; if (values != null) { foreach (int val in values) { hashCode ^= val.GetHashCode( ); } } return (hashCode); }
Potentially the best method of
obtaining a hash value for an object is to use the hashing classes
built in to the FCL. The CryptoHash
method returns
a hash value for some input using the MACTripleDES
class. This method returns a very good distribution for the hash
value, although you may pay for it in performance. If you do not
require a near perfect hash value and are looking for an excellent
distribution, consider using this approach to calculate a hash
value:
public int CryptoHash(string strValue) { int hashCode = 0; if (strValue != null) { byte[] encodedUnHashedString = Encoding.Unicode.GetBytes(strValue); // Replace the following Key with your own // key value byte[] Key = new byte[16] {1,122,3,11,65,7,9,45,42,98, 77,34,99,45,167,211}; MACTripleDES hashingObj = new MACTripleDES(Key); byte[] code = hashingObj.ComputeHash(encodedUnHashedString); // use the BitConverter class to take the // first 4 bytes and use them as an int for // the hash code hashCode = BitConverter.ToInt32(code,0); } return (hashCode); }
This method shows how other, nonstring
data types can be used with the built-in hashing classes to obtain a
hash code. This method converts a numeric value to a
string
and then to a byte
array. The array is then used to create the hash value using the
SHA256Managed
class. Finally, each value in the
byte
array is added together to obtain a hash
code. The code is:
public int CryptoHash(long intValue) { int hashCode = 0; byte[] encodedUnHashedString = Encoder.Unicode.GetBytes(intValue.ToString( )); SHA256Managed hashingObj = new SHA256Managed( ); byte[] code = hashingObj.ComputeHash(encodedUnHashedString); // use the BitConverter class to take the // first 4 bytes and use them as an int for // the hash code hashCode = BitConverter.ToInt32(code,0); return (hashCode); }
This method uses each character in the
input string, strValue
, to determine a hash value.
This algorithm produces a good distribution of hash codes even when
this method is fed similar strings. However, this method will break
down when long strings that end with the same characters are passed.
While this may not happen many times with your applications, it is
something to be aware of. If performance is critical, this is an
excellent method to use. Its code
is:
public int ShiftAndAddHash (string strValue) { int hashCode = 0; long workHashCode = 0; if (strValue != null) { for (int counter=0; counter<strValue.Length; counter++) { workHashCode = (workHashCode << (counter % 4)) + (int)strValue[counter]; } workHashCode = workHashCode % (127); } hashCode = (int)workHashCode; return (hashCode); }
This method is a rather widely accepted method of creating a good hash value that accepts several different data types and uses a different algorithm to compute the hash value for each. It calculates the hash code as follows:
It assigns an arbitrary odd primary number to the
HashCode
variable. This variable will eventually
contain the final hash code. Good primary numbers to use are
3
, 5
, 7
,
11
, 13
, 17
,
19
, 23
, 29
,
31
, 37
, 41
,
43
, 47
, 53
,
59
, 61
, or
67
. Obviously, others exist beyond this set, but
this should give you a good starting point.
For numeric types equal to or less than the size of an
int
and char
data types, it
multiplies the current HashCode
by the primary
number selected and then adds to this value the value of the numeric
type cast to an integer.
For numeric types greater than the size of an int
,
it multiplies the current HashCode
by the primary
number selected and then adds to this the folded version of this
numeric value. (For more information on folding, see The folding hash method earlier in this
recipe.)
For char
, floating point, or
decimal
data types, it multiplies the current
HashCode
by the primary number selected, casts the
numeric value to an integer, and then uses the folding method to
calculate its value.
For bool
data types, it multiplies the current
HashCode
by the primary number selected and then
adds a 1
for true
and
0
for false
(you can reverse
this behavior if you wish).
For object
data types, it multiplies the current
HashCode
by the primary number selected and then
adds the return value of GetHashCode
called on
this object. If an object is set to null
, use the
value 0
in your calculations.
For an array or collection, it determines the contained type(s) and
uses each element of the array or collection to calculate the hash
value, as follows (in the case of an integer array named
MyArray)
:
foreach (int element in myArray) { hashCode = (hashCode * 31) + element; }
This algorithm will produce a good distributed hash code for your
object and has the added benefit of the flexibility to employ any
type of data type. This is a high performing algorithm for simple,
moderately complex, and even many complex objects. However, for
extremely complex objects—ones that contain many large arrays,
large Hashtables
, or other objects that use a
slower hash code algorithm—this algorithm will start performing
badly. In this extreme case, you may want to consider switching to
another hash code algorithm to speed performance or simply paring
down the amount of fields used in the calculation. Be careful if you
choose this second method to increase performance; you could
inadvertently cause the algorithm to produce similar values for
differing objects. The code for the calculated hash method
is:
public int CalcHash(short someShort, int someInt, long someLong, float someFloat, object someObject) { int hashCode = 7; hashCode = hashCode * 31 + (int)someShort; hashCode = hashCode * 31 + someInt; hashCode = hashCode * 31 + (int)(someLong ^ (someLong >> 32)); long someFloatToLong = (long)someFloat; hashCode = hashCode * 31 + (int)(someFloatToLong ^ (someFloatToLong >> 32)); if (someObject != null) { hashCode = hashCode * 31 + someObject.GetHashCode( ); } return (hashCode); }
This technique converts its input into a
string, and then uses that string’s
GetHashCode
method to automatically generate a
hash code for an object. It accepts an integer array, but you could
substitute any type that can be converted into a string. You could
also use several different types of arguments as input to this
method. This method iterates through each integer in the array passed
as an argument to the method. The ToString
method
is called on each value to return a string. The
ToString
method of an int
data
type returns the value contained in that int
. Each
string value is appended to the string variable
HashString
. Finally, the
GetHashCode
method is called on the
HashString
variable to return a suitable hash
code.
This method is simple and efficient, but it does not work well with
objects that have not overridden the ToString
method to return something other than their data type. It may be best
to simply call the GetHashCode
method on each of
these objects individually. You should use your own judgment and the
rules found in this recipe to make your decision:
public int ConcatStringGetHashCode(int[] someIntArray) { int hashCode = 0 StringBuilder hashString = new StringBuilder( ); if (someIntArray != null) { foreach (int i in someIntArray) { hashString.Append(i.ToString( ) + "^"); } } hashCode = hashString.GetHashCode( ); return (hashCode); }
The following using
directives must be added to
any file containing this code:
using System; using System.Text; using System.Security.Cryptography;
The GetHashCode
method is called when you are
using an instance of this class as the key in a
Hashtable
object. Whenever your object is added to
a Hashtable
as a key, the
GetHashCode
method is called on your object to
obtain a hash code to be used by the Hashtable
. A
hash code is also obtained from your object when a search is
performed for your object in the Hashtable
.
The following class implements the SimpleHash
algorithm for the overloaded GetHashCode
method:
public class SimpleClass { private int x = 0; private int y = 0; public override int GetHashCode( ) { return(SimpleHash(x, y)); } public int SimpleHash(params int[] values) { int hashCode = 0; if (values != null) { foreach (int val in values) { hashCode ^= val; } } return (hashCode); } }
This class could then be used as a key in a
Hashtable
through the following code:
SimpleClass simpleClass = new SimpleClass( ); Hashtable hashTable = new Hashtable( ); hashTable.Add(simpleClass, 100);
There are several rules for writing a good
GetHashCode
method and a good hash code algorithm:
This method should return the same value for two different objects that have value equality. Value equality means that two objects contain the same data.
The hash algorithm should return a good distribution of values for
the best performance in a Hashtable
. A good
distribution of values means that the hash values returned by the
GetHashCode
method are usually different for
objects of the same type, unless those objects have value equality.
Note that objects containing very similar data should also return a
unique hash value. This distribution allows the
Hashtable
to work more efficiently and thus
perform better.
This method should not throw an exception.
Both the
Equals
method and GetHashCode
method should be overridden together.
The GetHashCode
method should compute the hash
code using the exact set of variables that the overridden
Equals
method uses when calculating equality.
The hash algorithm should be as fast as possible to speed up the
process of adding and searching for keys in a
Hashtable
.
When creating structures, it is always best to override the
GetHashCode
method if you think this hash code
will ever be used, since the ValueType.GetHashCode
method is slow. (ValueType.GetHashCode
makes use
of reflection to obtain a hash value from a type’s
internal fields.)
Unless the base class is object
, use the base
class’s GetHashCode
value when
calculating the hash code.
Use the GetHashCode
values of any contained
objects when calculating the hash code of the parent object.
Use the GetHashCode
values of all elements of an
array when calculating the array’s hash code.
The System.Int32
,
System.UInt32
, and
System.IntPtr
data types in the FCL use an
additional hash code algorithm not covered in the Solution section.
Basically, these data types return the value that they contain as a
hash code. Most likely, your objects will not be so simple as to
contain a single numeric value, but if they are, this method works
extremely well.
You may also want to combine specific algorithms to suit your
purposes. For instance, if your object contains one or more
string
types and one or more
long
data types, you could combine the
ContainedObjHash
method and the
FoldingHash
method to create a hash value for your
object. The return values from each method could either be added or
XOR
ed together.
Once an object is in use as a key in a Hashtable
,
it should never return a different value for the hash code.
Originally, it was documented that hash codes must be immutable, as
the authors of Hashtable
thought that this should
be dealt with by whomever writes GetHashCode
. It
doesn’t take much thought to realize that for
mutable types, if you require both that the hashcode never changes,
and that Equals
represents
the equality of the mutable objects, and that if
a.Equals(b)
, then a.GetHashCode( )
==
b.GetHashCode( )
, then the only possible value implementation of
GetHashCode
is one that returns the same integer
constant for all values.
The GetHashCode
method is called when you are
using this object as the key in a Hashtable
object. Whenever your object is added to a
Hashtable
as a key, the
GetHashCode
method is called on your object to
obtain a hash code. This hash code must not change while your object
is a key in the Hashtable
. If it does, the
Hashtable
will not be able to find your object.
The GetHashCode
method is called when you are
using this object as the key in a Hashtable
object. Whenever your object is added to a
Hashtable
as a key, the
GetHashCode
method is called on your object to
obtain a hash code. This hash code must not change while your object
is a key in the Hashtable
. If it does, the
Hashtable
will not be able to find your object.
From the perspective of memory consumption, this object will not be
able to have its memory freed until the Hashtable
is collected causing a noteworthy degree of memory retention.