Update June 4, 2009
The original post favored Dictionary as the fastest data structure both for add and contains. After some debate and some re-runs of my test code, I found the result to be wrong. HashSet is faster. Probably the results were affected by the workload on my computer. I ran one List test, then a Dictionary test and finally a HashSet test - I should have run them multiple times and at the same workload (i.e. no programs or processes running). Anyway, on to the article.
Still working with high-volume realtime datafeeds, I'm struggling to understand where the bottleneck in my code is. It's not with the database, it's not with the network - it's somewhere in the code. Now that doesn't help a lot.
So I decided to have a look at the different data structures that could be usable for my needs. Right now I'm using a List to keep track of my orders, and each time I get a new order, I check that list for the given ID.
So I'm starting to wonder, maybe there is a performance issue with the List data structure. So I made a small test with the following code:
static void TestList()
{
var watch = new Stopwatch();
var theList = new List<int>();
watch.Start();
//Fill it
for (int i = 0; i <>
{
theList.Add(i);
}
watch.Stop();
Console.WriteLine("Avg add time for List: {0}", (watch.Elapsed.TotalMilliseconds / noOfIterations));
watch.Reset();
watch.Start();
//Test containsKey
for (int j = 0; j <>
theList.Contains(j);
}
watch.Stop();
Console.WriteLine("Avg Contains lookup time for List: {0}", (watch.Elapsed.TotalMilliseconds / noOfIterations));
}
I created similar test code for Dictionary
First run, I used 1,000 for the noOfIterations variable and ran it three times.
The horizontal scale here is in milliseconds, so things are reasonably fast. A value of 0.2 gives you a possible 200 adds per second. As you probably can see without checking the numbers, dictionary is faster. List is slower for lookup, but HashSet suprises a little with the slow add function. So what happens when we go to 100,000 items?
OK, List is a lot slower for lookup still. Add seems to compare ok though. Let's see how it compares if we remove the lookup-time from the chart: This is wrong, see "The aftermath"
Now that's a result! Turns out List is your winner if fast adding is all you care about. If you want to look up your values later though, dictionary is your winner. Hope this will save some time for other developers in the same situation. Also feel free to comment if you find different results, or a problem with my test!
By popular request: The rest of the code!
static void TestDictionary()
{
var watch = new Stopwatch();
//Create Dictionary
var theDict = new Dictionary<int, int>();
watch.Start();
//Fill it
for (int i = 0; i <noofiterations;i++)
{
theDict.Add(i, i);
}
watch.Stop();
Console.WriteLine("Avg add time for Dictionary: {0}", (watch.Elapsed.TotalMilliseconds/noOfIterations));
watch.Reset();
//Test containsKey
watch.Start();
for (int j = 0; j <noofiterations;j++)
{
theDict.ContainsKey(j);
}
watch.Stop();
Console.WriteLine("Avg Contains lookup time for Dictionary: {0}",
(watch.Elapsed.TotalMilliseconds/noOfIterations));
}
static void TestHashSet()
{
var watch = new Stopwatch();
//Create List
var hashSet = new HashSet<int>();
//Fill it
watch.Start();
for (int i = 0; i <>
{
hashSet.Add(i);
}
watch.Stop();
Console.WriteLine("Avg add time for HashSet: {0}", (watch.Elapsed.TotalMilliseconds / noOfIterations));
watch.Reset();
//Test contains
watch.Start();
for (int j = 0; j <>
{
hashSet.Contains(j);
}
watch.Stop();
Console.WriteLine("Avg Contains lookup time for HashSet: {0}", (watch.Elapsed.TotalMilliseconds / noOfIterations));
}
The aftermath
Because of the controversy involved in HashSet being slower than Dictionary, despite having only one value and the Dictionary two, I re-ran the test on my computer, and on a colleagues. Instead of running the tests one by one, I ran three times HashTest and three times Dictionary, and picked the averages. The result you can see below, HashSet is faster than Dictionary both for Add and Contains methods.