Fast collection look-ups
by ricardoz on Apr.30, 2009, under Performance, Tips
I recently had to load a bunch of objects into memory and then perform thousands of look-ups over that collection. Using the good old java.util.ArrayList just didn’t cut it, the contains() function is extremely slow (as you would guess of course since this implementation stores elements as they are inserted and without any aditional indexing structure).
After a quick research I found out that the best performing solution to my problem (at least of those I had at hand) was the java.util.TreeSet class. It’s insertion performance is almost as good as ArrayList and the look-up speed is extremely better (again, this is of course what one would suspect given that TreeSet uses a sorted tree based storing structure).
I wrote a very simple testing program to confirm this that performs a number of insertions and look-ups on Collection implementations and tested it with ArrayList and TreeSet, here are the results running 50000 insertions and then 50000 look-ups on each class:
Took 125 ms to insert 50000 to java.util.ArrayList Took 45630 ms to perform 50000 look-ups using java.util.ArrayList Took 250 ms to insert 50000 to java.util.TreeSet Took 16 ms to perform 50000 look-ups using java.util.TreeSet
Conclusion: remember to choose the apropriate collection implementation! If you will have a fair amount of look-ups its very advisable to use TreeSet instead of the popular ArrayList.
This is the code I used for testing:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 | import java.util.ArrayList; import java.util.Collection; import java.util.TreeSet; import org.apache.commons.lang.RandomStringUtils; /** * @author Ricardo Zuasti */ public class Test { public static void main(String[] args) { final int ITERATIONS = 50000; // ArrayList ArrayList arrayList = new ArrayList(); insert(arrayList, ITERATIONS); lookup(arrayList, ITERATIONS); // TreeSet TreeSet treeSet = new TreeSet(); insert(treeSet, ITERATIONS); lookup(treeSet, ITERATIONS); } private static void insert(Collection col, int qty){ long startStamp = System.currentTimeMillis(); for (int i=0; i<qty; i++){ col.add(RandomStringUtils.randomAlphanumeric(10)); } long time = System.currentTimeMillis() - startStamp; System.out.println("Took " + time + " ms to insert " + qty + " to " + col.getClass().getName()); } private static void lookup(Collection col, int qty){ long startStamp = System.currentTimeMillis(); for (int i=0; i<qty; i++){ col.contains("searchme"); } long time = System.currentTimeMillis() - startStamp; System.out.println("Took " + time + " ms to perform " + qty + " look-ups using " + col.getClass().getName()); } } |
October 15th, 2009 on 7:11 am
You’ve missed to put FastArrayList into fast mode after inserting! Read the documentation for more information.
Also, the FastArrayList implementation is used in multithreaded environments.
October 15th, 2009 on 7:49 am
@nouseforaname: You are right about FastArrayList being targeted for multi-threaded environments, I’ve removed it from the test since it doesn’t contribute to the point I tried to make.
Regarding fast mode, it doesn’t make a difference if you use the class from a single thread (like we are in a straight loop).
thanks for the feedback!
February 4th, 2010 on 7:42 am
What if using a HashSet?
http://java.sun.com/developer/JDCTechTips/2002/tt1105.html
February 4th, 2010 on 7:45 am
It certainly is another option, I’ll re-run the tests with it and post the results