java mini talks at qcon

This is part of my live blogging from QCon 2015. See my QCon table of contents for other posts.

This session is four 10 minute talks.

Deterministic testing in a non-deterministic world

  • determinism – everything has a cause and effect
  • pseudorandom – algorithm that generates approximately random #s

Should use LocalDateTime with a Clock instance to reproduce results in a program. With a fixed clock, all operations in the program see the same exact time. Similar to using a random seed for generating numbers.
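To make the fixed clock idea concrete, here is a minimal sketch of my own (not code from the talk). The class name, the method names and the date are arbitrary choices for illustration. Clock.fixed and LocalDateTime.now(Clock) are standard java.time calls.

```java
import java.time.Clock;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.util.Random;

// Hypothetical demo class; the names here are illustrative only.
class FixedClockDemo {

    // Code under test takes a Clock instead of calling LocalDateTime.now() directly.
    static String stamp(Clock clock) {
        return "created at " + LocalDateTime.now(clock);
    }

    // Same idea for randomness: a seeded Random yields a repeatable sequence.
    static int firstRoll(long seed) {
        return new Random(seed).nextInt(6) + 1;
    }

    public static void main(String[] args) {
        // Arbitrary instant chosen for the example.
        Clock fixed = Clock.fixed(Instant.parse("2015-06-10T14:30:00Z"), ZoneOffset.UTC);
        System.out.println(stamp(fixed)); // created at 2015-06-10T14:30
        System.out.println(stamp(fixed)); // same line again, no matter when it runs
        System.out.println(firstRoll(42L) == firstRoll(42L)); // true
    }
}
```

In production code you would pass Clock.systemDefaultZone() (or similar) and only swap in the fixed clock in tests.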

Hash spreads and probe functions: a choice of performance and consistency
Primitive collections are faster and use less memory than boxed implementations, which use 56 bytes for each Integer.

  • hash spread – function that destroys simple patterns in input data while preserving maximal info about the input. Goal is to avoid collisions without spending too much time hashing.
  • hash probe – function that determines the order in which slots in the array are tried. For example, a linear probe goes down one slot on a collision. A quadratic probe goes down further on each collision.
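A rough sketch of the two ideas, written by me and not taken from the talk. It assumes an open-addressing table whose capacity is a power of two. The spread function has the same shape as the one java.util.HashMap uses (XOR the high 16 bits into the low 16). The quadratic probe here is the triangular-number variant, one of several possible choices. All names are hypothetical.

```java
// Hypothetical demo class; assumes capacity is a power of two.
class ProbeDemo {

    // Spread: fold the high bits into the low bits so keys that differ only
    // in their upper bits do not all map to the same slot.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Linear probe: on the i-th collision, try the slot i positions further on.
    static int linearProbe(int hash, int attempt, int capacity) {
        return (spread(hash) + attempt) & (capacity - 1);
    }

    // Quadratic probe (triangular numbers): offsets of 0, 1, 3, 6, 10, ...
    static int quadraticProbe(int hash, int attempt, int capacity) {
        int offset = attempt * (attempt + 1) / 2;
        return (spread(hash) + offset) & (capacity - 1);
    }

    public static void main(String[] args) {
        for (int attempt = 0; attempt < 4; attempt++) {
            System.out.println("attempt " + attempt
                    + " linear=" + linearProbe(5, attempt, 16)
                    + " quadratic=" + quadraticProbe(5, attempt, 16));
        }
    }
}
```

For hash 5 and capacity 16, the linear probe visits slots 5, 6, 7, 8 while the quadratic probe visits 5, 6, 8, 11. The linear version is friendlier to the CPU cache; the quadratic version spreads out clusters of collisions.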

Typesafe config on steroids
Property files are hard to scale. Apache Commons adds typing, but still limits you to the property file format and limits composition. Spring helps with scaling property files.

Typesafe Config – library used by Play and Akka. Standalone project without dependencies, so you can use it in Java. Uses a JSON-like format called HOCON (Human-Optimized Config Object Notation).

Scopes – library built on top of typesafe config

Real time distributed event driven computing at Credit Suisse
Credit Suisse produced own language that they call “data algebra”. Looks like a DSL.

java 8 stream performance – maurice naftalin – qcon

This is part of my live blogging from QCon 2015. See my QCon table of contents for other posts.

See http://www.lambdafaq.org

Background
He started with background on streams. (This is old news by now, but still taking some notes). The goals were to bring a functional style to Java and “explicit but unobtrusive” hardware parallelism. The former is more important than performance.

The intention is to replace loops with aggregate operations. [I like that he picked an example that required three operations and not an oversimplified example]. More concise/readable. Easy to change to parallelize.
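I didn't capture the speaker's example, so here is a small sketch of my own showing a loop and the equivalent three-operation stream pipeline (filter, map, sum). The class name, method names and word list are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class comparing a loop to an aggregate operation.
class LoopVsStream {

    // Classic loop: total length of the words starting with "q".
    static int loopSum(List<String> words) {
        int total = 0;
        for (String w : words) {
            if (w.startsWith("q")) {
                total += w.length();
            }
        }
        return total;
    }

    // Same logic as a pipeline: filter, map, then reduce with sum().
    static int streamSum(List<String> words) {
        return words.stream()
                .filter(w -> w.startsWith("q"))
                .mapToInt(String::length)
                .sum();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("qcon", "java", "queue", "stream");
        System.out.println(loopSum(words));   // 9
        System.out.println(streamSum(words)); // 9
    }
}
```

Changing stream() to parallelStream() is the only edit needed to parallelize the second version. Whether that makes it faster is a separate question.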

Reduction == terminal operation == sink

Performance Notes
Free lunch is over. Chips don’t magically get faster over time. Instead, they add cores. The goal of parallel streams is to run the intermediate operations in parallel and then bring the results together in the reduction.
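As a minimal sketch of that shape (mine, not from the talk): the map step below can run on several cores, and sum() is the reduction that brings the partial results together. The class and method names are hypothetical.

```java
import java.util.stream.LongStream;

// Hypothetical demo class: same pipeline run sequentially or in parallel.
class ParallelReduction {

    static long sumOfSquares(long n, boolean parallel) {
        LongStream numbers = LongStream.rangeClosed(1, n);
        if (parallel) {
            numbers = numbers.parallel(); // intermediate ops may now run on multiple cores
        }
        return numbers
                .map(x -> x * x) // intermediate operation, applied per element
                .sum();          // reduction: partial sums are combined into one result
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(1000, false)); // 333833500
        System.out.println(sumOfSquares(1000, true));  // 333833500
    }
}
```

Both calls give the same answer. With work this cheap per element, the parallel version is unlikely to be faster, which ties in with the rules later in the talk.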

What to measure?

  • We want to know how code changes affect system performance in prod. Not feasible though because we would need to do a controlled experiment in prod conditions. Instead, we do a controlled experiment in lab conditions and hope we are not answering a simplified question.
  • Hard to microbenchmark because of inaccuracy, garbage collection, optimization over time, etc. There are benchmarking libraries – Caliper or JMH. [or better if don’t need to microbenchmark]
  • Don’t optimize code if you don’t have a problem. What’s your performance requirement? [and is it the bottleneck]. Similarly, don’t optimize the OS if the problem lies somewhere else.

Case study
This was a live demo. First we saw that not using BufferedReader makes a file slow to read. [not about streams]. Then we watched as JMeter didn’t work on the first try. [the danger of a live demo]. Then he showed how messing with the GC size and making it too small is bad for performance as well [still not on streams]. He is trying to show the process of performance tuning overall. Which is valid info. Just not what I expected this session to be about.

Then [after I didn’t see the stream logic being a problem in the first place], he showed how to solve subproblems and merge them. [oddly not calling it map reduce]

8 minutes before the end of the talk, we finally see the non-parallel code for the case study. It’s interesting code because it uses two terminal operations and two streams. At least reading in the file is done normally. Finally, we see that the combiner is O(n) which prevents speeding it up.
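I don't have the case study code, so this is only an illustration of what an O(n) combiner looks like, not the speaker's example. It uses the three-argument Stream.collect with ArrayList, where merging two partial results means copying every element of one list into the other. The class and method names are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.IntStream;

// Hypothetical demo class: a parallel collect whose merge step is linear.
class CombinerCost {

    static List<Integer> collectParallel(int n) {
        return IntStream.range(0, n)
                .parallel()
                .boxed()
                .<ArrayList<Integer>>collect(
                        ArrayList::new,     // supplier: one container per subtask
                        ArrayList::add,     // accumulator: amortized O(1) per element
                        ArrayList::addAll); // combiner: copies the right-hand list, O(n)
    }

    public static void main(String[] args) {
        List<Integer> result = collectParallel(1_000);
        System.out.println(result.size()); // 1000
    }
}
```

Each subtask fills its own list cheaply, but every merge on the way back up copies elements again. When the per-element work is small, that copying can eat the gain from going parallel.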

Some rules

  • The workload of the intermediate operations must be great enough to outweigh the overheads. Often quoted as size of data set * processing cost per element
  • sorted() is worse
  • Collectors cost extra. toMap(): merging maps is slow. toList() and toSet() are dominated by the accumulator.
  • In the real world, the fork/join pool doesn’t operate in isolation

My impressions: A large amount of this presentation wasn’t stream performance. Then the case study shows that reading without a BufferedReader is slow. [no kidding]. I feel like the example was contrived and we “learned” that poorly written code behaves poorly. I was hoping the talk would actually be about parallelization. When parallelStream() saves time and when it doesn’t for example. What I learned was for this particular scenario, parallelization wasn’t helpful. And then right at the end, the generic rules. Which felt rushed and thrown at us.

qcon – live blog table of contents

I’m attending QCon New York which is run by InfoQ.com. At the end, I’ll update this post to be a table of contents of my blog posts from the conference.

My live blog posts

Wednesday

Thursday

Friday

That’s 9742 words live blogged not counting this post (which gets it to 10K) and an average blog post size of 487. The “Too Big To Fail” session was an outlier at 827; must have liked it a lot.

My overall impressions
The conference in general seems set up well with 25 minutes between talks along with an open space by area at the end of the day (not presentations; discussions). For lunch they have tables designed for discussion – large normal conference tables, 4 people discussion tables and “loner” tables. I also like the intro about usability including the big names on the badge.

The intro also had each track lead give an overview of the talks in their track. This felt like overkill as this was online and most people think about what they want to attend before showing up.

Logistically, I really like that you gave feedback by putting a green, yellow or red paper as you walk out the door of the session. Low overhead; low time commitment and asked while you still remember the details.