performance tuning selenium – firefox vs chrome vs headless

I’m the co-volunteer coordinator for NYC FIRST. Every year we are faced with a problem: we want to export the volunteer data including preferences for offseason events. The system provides an export feature but does not include a few fields we want. A few years ago, my friend Norm said “if only we could export those fields.” I’m a programmer; of course we can!

So I wrote him a program to do just this. It’s export-vol-data on GitHub. And fittingly, he “paid” me with free candy from the NYC FIRST office. Once a year we meet, Norm gives his credentials to the program and we wait. And wait. And wait. This year NYC FIRST had more events than ever before, so it took a really long time. I wanted to tune it.

Getting test data

The problems with tuning have been:

  1. I have no control over when people volunteer for the event. It’s hard to performance test when the data set keeps changing.
  2. The time period when I have access to the event is not the time period that I have the most free time.

Norm solved these problems by creating a test event for me. I started over the summer, but then got accepted to speak at JavaOne and was really busy getting ready for that. When I went back to it, someone had deleted my test event. Norm solved that problem by creating a new event called “TEST EVENT FOR SOFTWARE DEVELOPMENT – DO NOT ENROLL OR DELETE, please. – FLL”. One person did volunteer for that. Not a lot of data, but it helped.

Performance tuning

I tried the following performance improvements based on our experience exporting in April 2017.

  1. SUCCESS: Run the program on the largest events first. (It’s feasible to manually export the data for small events. Plus, those largely have people who also volunteered at a larger event.) This allows us to run for the events with the most business value first. It also allows us to abort the program at any time.
  2. SUCCESS: Skip events and roles with zero volunteers. For some reason, it takes a lot longer to load a page with no volunteers. So skipping this makes the program MUCH faster.
  3. SKIP: Add parallelization. I wound up not doing this because the program is so fast now.
  4. FAILED: Switch from Firefox driver to PhantomJS. I knew the site didn’t function with HtmlUnitDriver. I thought maybe it would work with PhantomJS – an in-memory driver with better JavaScript support. Alas, it didn’t.
  5. FAILED: Try to go directly to URLs with data. FIRST prevents this from working. You can’t simply simulate the REST calls externally.
  6. SUCCESS: Switch from Firefox driver to Chrome driver. This made a huge difference in both performance and stability. The program would crash periodically in Firefox. I was never able to figure out why. I have retry/resume logic, but having to manually click “continue” makes it slower.
  7. UNKNOWN: I added support for Headless Chrome in the program. It doesn’t seem noticeably faster though. And it is fun for Norm and me to watch the program “click” through the site. So I left it as an option, but not the default. (A sketch of the driver setup follows this list.)
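The project’s actual code is on GitHub; this is just a minimal sketch of what the driver switch and the headless option can look like with Selenium’s ChromeDriver (the class and method names here are my illustration, not necessarily the project’s):

import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;

public class DriverFactory {

    // Create a Chrome driver, optionally headless.
    // Headless is an option rather than the default so we can still
    // watch the program "click" through the site.
    public static WebDriver createDriver(boolean headless) {
        ChromeOptions options = new ChromeOptions();
        if (headless) {
            options.addArguments("--headless");
        }
        return new ChromeDriver(options);
    }
}

(This assumes the chromedriver binary is on the path or set via the webdriver.chrome.driver system property.)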

Results

Like any good programming exercise, some things worked and some didn’t. The program is an order of magnitude faster now than at the start, so I declare this a success!

java 8 stream performance – maurice naftalin – qcon

This is part of my live blogging from QCon 2015. See my QCon table of contents for other posts.

See http://www.lambdafaq.org

Background
He started with background on streams. (This is old news by now, but I still took some notes.) The goals were to bring a functional style to Java and “explicit but unobtrusive” hardware parallelism. The former is more important than performance.

The intention is to replace loops with aggregate operations. [I like that he picked an example that required three operations and not an oversimplified example]. More concise/readable. Easy to change to parallelize.
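I don’t have his exact example, but a loop that needs three operations and its aggregate-operations equivalent look something like this (my own illustration):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class StreamExample {
    public static void main(String[] args) {
        List<String> words = Arrays.asList("java", "streams", "qcon", "lambda");

        // loop version: filter, transform and accumulate by hand
        List<String> loopResult = new ArrayList<>();
        for (String w : words) {
            if (w.length() > 4) {
                loopResult.add(w.toUpperCase());
            }
        }

        // aggregate-operations version: the same three steps, declaratively;
        // changing stream() to parallelStream() is the "easy to parallelize" part
        List<String> streamResult = words.stream()
            .filter(w -> w.length() > 4)
            .map(String::toUpperCase)
            .collect(Collectors.toList());

        System.out.println(loopResult.equals(streamResult)); // true
    }
}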

Reduction == terminal operation == sink

Performance Notes
The free lunch is over. Chips don’t magically get faster over time; instead, we add cores. The goal of parallel streams is to run the intermediate operations in parallel and then bring the partial results together in a reduction.
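A minimal illustration of that shape (mine, not from the talk): the intermediate operation runs on chunks of the data across cores, and the terminal reduction combines the partial results.

import java.util.stream.IntStream;

public class ParallelReduce {
    public static void main(String[] args) {
        // mapToLong runs on chunks in parallel; sum() is the terminal
        // reduction that combines the partial sums
        long sumOfSquares = IntStream.rangeClosed(1, 1_000)
            .parallel()
            .mapToLong(i -> (long) i * i)
            .sum();
        System.out.println(sumOfSquares); // 333833500
    }
}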

What to measure?

  • We want to know how code changes affect system performance in prod. That’s not feasible though, because it would require a controlled experiment under prod conditions. Instead, we do a controlled experiment in lab conditions and hope we aren’t just answering a simplified question.
  • Hard to microbenchmark because of inaccuracy, garbage collection, optimization over time, etc. There are benchmarking libraries – Caliper or JMH. [or better, don’t microbenchmark if you don’t need to] (A minimal JMH sketch follows this list.)
  • Don’t optimize code if you don’t have a problem. What’s your performance requirement? [and is it the bottleneck]. Similarly, don’t optimize the OS if the problem lies somewhere else.
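For reference, a minimal JMH benchmark looks roughly like this (this assumes the jmh-core and annotation-processor dependencies; JMH handles the warmup and JIT effects that make hand-rolled timing loops unreliable):

import java.util.stream.IntStream;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)
public class StreamBenchmark {

    // returning the result prevents the JIT from eliminating the work as dead code
    @Benchmark
    public long sequentialSum() {
        return IntStream.rangeClosed(1, 1_000_000).asLongStream().sum();
    }

    @Benchmark
    public long parallelSum() {
        return IntStream.rangeClosed(1, 1_000_000).parallel().asLongStream().sum();
    }
}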

Case study
This was a live demo. First we saw that not using a BufferedReader makes a file slow to read. [not about streams]. Then we watched as JMeter didn’t work on the first try. [the danger of a live demo]. Then he showed how messing with the GC size and making it too small is bad for performance as well. [still not on streams]. He was trying to show the overall process of performance tuning. Which is valid info. Just not what I expected this session to be about.
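The BufferedReader point is standard Java rather than anything stream-specific; the difference is roughly this (file name is just an example):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class ReadDemo {
    public static void main(String[] args) throws IOException {
        // slow: each read() call can go to the underlying file
        try (FileReader reader = new FileReader("data.txt")) {
            int c;
            while ((c = reader.read()) != -1) {
                // process c
            }
        }

        // fast: BufferedReader pulls the file in large chunks
        try (BufferedReader reader = new BufferedReader(new FileReader("data.txt"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // process line
            }
        }
    }
}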

Then [after I didn’t see the stream logic being a problem in the first place], he showed how to solve subproblems and merge them. [oddly, not calling it map-reduce]

8 minutes before the end of the talk, we finally see the non-parallel code for the case study. It’s interesting code because it uses two terminal operations and two streams. At least reading in the file is done normally. Finally, we see that the combiner is O(n), which prevents speeding it up.
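I don’t have the case-study code, but the combiner problem is easy to illustrate with the three-argument form of collect(): if combining two partial results copies every element, that O(n) merge eats the parallel gains.

import java.util.ArrayList;
import java.util.List;
import java.util.stream.IntStream;

public class CombinerCost {
    public static void main(String[] args) {
        // supplier / accumulator / combiner form of collect();
        // the combiner (addAll) copies every element of the right-hand
        // list, so merging partial results is O(n)
        List<Integer> result = IntStream.range(0, 1_000_000)
            .parallel()
            .collect(ArrayList::new,
                     (list, value) -> list.add(value),
                     (left, right) -> left.addAll(right));
        System.out.println(result.size()); // 1000000
    }
}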

Some rules

  • The workload of the intermediate operations must be great enough to outweigh the overheads. Often quoted as size of data set * processing cost per element. (A rough sketch follows this list.)
  • sorted() is worse
  • Collectors cost extra. toMap() merging maps is slow. toList() and toSet() are dominated by the accumulator.
  • In the real world, the fork/join pool doesn’t operate in isolation
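As a rough sketch of the data-set-size * per-element-cost rule (my own, not his): with a small data set and trivial work, the parallel overhead dominates; a large data set with real per-element work gives the fork/join machinery something to outweigh.

import java.util.stream.IntStream;

public class WorkloadRule {
    public static void main(String[] args) {
        // tiny data set, trivial per-element cost: parallel overhead likely dominates
        long small = IntStream.range(0, 100)
            .parallel()
            .mapToLong(i -> i + 1)
            .sum();

        // large data set, non-trivial per-element cost: parallelism has
        // enough work to outweigh the overhead
        long large = IntStream.range(0, 10_000_000)
            .parallel()
            .mapToLong(i -> (long) i * i)
            .sum();

        System.out.println(small + " " + large);
    }
}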

My impressions: A large amount of this presentation wasn’t stream performance. Then the case study shows that reading without a BufferedReader is slow. [no kidding]. I feel like the example was contrived and we “learned” that poorly written code behaves poorly. I was hoping the talk would actually be about parallelization. When parallelStream() saves time and when it doesn’t, for example. What I learned was that for this particular scenario, parallelization wasn’t helpful. And then right at the end, the generic rules. Which felt rushed and thrown at us.

Eclipse Memory Analyzer

Last time CodeRanch had a memory leak, a teammate ran Eclipse Memory Analyzer and I ran JVisualVM. This time, I did both. I took the heap dump as described here. JVisualVM told me Hibernate was using a ton of memory.
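The heap dump instructions are behind the link above; for reference, one common way to take one, assuming you have the process id, is jmap:

jmap -dump:live,format=b,file=heapdump.hprof <pid>

The live option triggers a full GC first, so the dump only contains reachable objects.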

To run

To run Eclipse Memory Analyzer, I needed to launch Eclipse with more memory. The default wasn’t enough to open the heap dump. On Windows, I would edit the eclipse.ini file. On Mac, I instead used the command line. (I have read that it is supposed to be possible to edit the eclipse.ini too. It didn’t work the one time I tried it.)

./eclipse -vmargs -Xmx4g -XX:-UseGCOverheadLimit

(It was a production dump; just under 1 GB.)
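For reference, the eclipse.ini equivalent would presumably be the same arguments under the -vmargs section (this is the standard eclipse.ini format, though as noted, it didn’t work the one time I tried it):

-vmargs
-Xmx4g
-XX:-UseGCOverheadLimit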

What I learned

The Leak Suspects report was right on. It noted that the heap dump had three large Hibernate Session objects. I was only expecting one. We are using an EntityManagerFactory (whose EntityManagers use a Hibernate Session behind the scenes when Hibernate is the JPA provider). It turned out there was some code which intended to cache the entity factory, but in fact didn’t for some nightly jobs we have.
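The fix amounts to the usual pattern: create the factory once and reuse it. A minimal sketch (hypothetical persistence-unit name, not our actual code):

import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;

public class JpaUtil {

    // the factory is expensive to create and holds caches;
    // a single cached instance is what the nightly jobs were missing
    private static final EntityManagerFactory FACTORY =
        Persistence.createEntityManagerFactory("myUnit");

    public static EntityManager newEntityManager() {
        // EntityManagers (Hibernate Sessions underneath) are cheap
        // and short-lived; cache the factory, not these
        return FACTORY.createEntityManager();
    }
}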