Fast collection look-ups
by ricardoz on Apr.30, 2009, under Performance, Tips
I recently had to load a bunch of objects into memory and then perform thousands of look-ups over that collection. Using the good old java.util.ArrayList just didn’t cut it, the contains() function is extremely slow (as you would guess of course since this implementation stores elements as they are inserted and without any aditional indexing structure).
After a quick research I found out that the best performing solution to my problem (at least of those I had at hand) was the java.util.TreeSet class. It’s insertion performance is almost as good as ArrayList and the look-up speed is extremely better (again, this is of course what one would suspect given that TreeSet uses a sorted tree based storing structure).
I wrote a very simple testing program to confirm this that performs a number of insertions and look-ups on Collection implementations and tested it with ArrayList and TreeSet, here are the results running 50000 insertions and then 50000 look-ups on each class:
Took 125 ms to insert 50000 to java.util.ArrayList Took 45630 ms to perform 50000 look-ups using java.util.ArrayList Took 250 ms to insert 50000 to java.util.TreeSet Took 16 ms to perform 50000 look-ups using java.util.TreeSet
Conclusion: remember to choose the apropriate collection implementation! If you will have a fair amount of look-ups its very advisable to use TreeSet instead of the popular ArrayList.
This is the code I used for testing:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 | import java.util.ArrayList; import java.util.Collection; import java.util.TreeSet; import org.apache.commons.lang.RandomStringUtils; /** * @author Ricardo Zuasti */ public class Test { public static void main(String[] args) { final int ITERATIONS = 50000; // ArrayList ArrayList arrayList = new ArrayList(); insert(arrayList, ITERATIONS); lookup(arrayList, ITERATIONS); // TreeSet TreeSet treeSet = new TreeSet(); insert(treeSet, ITERATIONS); lookup(treeSet, ITERATIONS); } private static void insert(Collection col, int qty){ long startStamp = System.currentTimeMillis(); for (int i=0; i<qty; i++){ col.add(RandomStringUtils.randomAlphanumeric(10)); } long time = System.currentTimeMillis() - startStamp; System.out.println("Took " + time + " ms to insert " + qty + " to " + col.getClass().getName()); } private static void lookup(Collection col, int qty){ long startStamp = System.currentTimeMillis(); for (int i=0; i<qty; i++){ col.contains("searchme"); } long time = System.currentTimeMillis() - startStamp; System.out.println("Took " + time + " ms to perform " + qty + " look-ups using " + col.getClass().getName()); } } |
October 15th, 2009 on 7:11 am
You’ve missed to put FastArrayList into fast mode after inserting! Read the documentation for more information.
Also, the FastArrayList implementation is used in multithreaded environments.
October 15th, 2009 on 7:49 am
@nouseforaname: You are right about FastArrayList being targeted for multi-threaded environments, I’ve removed it from the test since it doesn’t contribute to the point I tried to make.
Regarding fast mode, it doesn’t make a difference if you use the class from a single thread (like we are in a straight loop).
thanks for the feedback!
February 4th, 2010 on 7:42 am
What if using a HashSet?
http://java.sun.com/developer/JDCTechTips/2002/tt1105.html
February 4th, 2010 on 7:45 am
It certainly is another option, I’ll re-run the tests with it and post the results
September 15th, 2010 on 3:20 am
You are right on choosing the right collection, but if you really don’t need the sorting capabilities of TreeSet, you can use HashSet.
And synchronicing them is easy, just wrap it using Collections.synchronizedSet(new HashSet(10)).
Another really important performance note is to choose the correct sorting method, for quicksort is not allways the good answer.
May 9th, 2011 on 11:40 pm
I just tested those with Java 6… Here’s my results to run those tests with a larger dictionary. My requirement is load-time is not important, I will not be inserting, just looking up for words in a dictionary. TreeSet amortizes the cost of looking-up when compared to inserting all the dictionary in memory.
// Inserting… Took 103 ms to insert 178691 to java.util.HashSet
// Looking-up… Took 204 ms to perform 178691 look-ups using java.util.HashSet
// Inserting… Took 507 ms to insert 178691 to java.util.TreeSet
// Looking-up… Took 87 ms to perform 178691 look-ups using java.util.TreeSet <<– look-up amortizes insertion
Thanks for the post
Marcello