how did we end up here – todd montgomery and trisha gee – qcon

This is part of my live blogging from QCon 2015. See my QCon table of contents for other posts.

When we're new, we assume everyone knows what they are doing and everything is logical/clean/organized. Yet reality is messy. Sometimes you get to the point where you can't touch anything. Sometimes you get to the point where you're proud of over-engineered/terrible solutions.

Software process stats

  • agile and iterative have similar project success rate
  • Ad-hoc and waterfall are similar
  • Agile/iterative is only a little better
  • Projects with fewer than 10 people do significantly better than those with twenty or more
  • Different reports measure successful projects differently: 30%–60%.
  • More expensive projects aren't better. “Throwing more $ at projects doesn't make it better” [wouldn't this be because of the size of the project, not throwing $ at it?]

Enterprise architect == so good at the job that they no longer do development. Systems should be designed by the surgeons, not by people who used to be surgeons.

More organizations are looking at contributions to open source to prove candidates can do development before they start. No place to hide; mediocrity becomes visible. Open source is not a business plan, but it can be a distribution model.

Agile

  • Minimum viable product – people don't want to buy an MVP, so why aim for it? [we don't – it's about building that first so you're guaranteed to have a base]
  • Product owner – why can't we talk to the real customer? Why do we need a proxy? We want direct feedback. Actually pairing with a customer to see how they use the app surfaces usability/process issues. Technology should be part of the business.
  • Poll: how many people doing agile release more often than every 3 months? A few
  • Water-scrum-fall – a tiny waterfall. Teams do scrum practices but skip the retrospective and are still doing waterfall. Should focus on learning, feedback cycles and outcomes
  • Projects often succeed because 1-2 people make it happen in spite of the process. Find out by asking “what would happen if you weren't involved?”

“Shared mutable state” should be scary. It should be reserved for systems programming, where the domain is smaller and you understand the hardware. Otherwise it shouldn't be taken for granted. Embrace append-only, single-writer and shared-nothing designs.
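A minimal sketch of the single-writer idea (my illustration, not from the talk): many threads may enqueue events, but only one thread ever mutates the state, so the state itself needs no locks.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class SingleWriterCounter {
    private final BlockingQueue<Integer> events = new LinkedBlockingQueue<>();
    private long total; // only ever touched by the writer thread, so no locking

    public void submit(int delta) {      // safe to call from any thread
        events.add(delta);
    }

    public void runWriterLoop() throws InterruptedException {
        while (true) {
            total += events.take();      // the single writer applies all mutations
        }
    }
}
```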

Amdahl's Law says you can only scale up to a certain # of CPUs before maxing out. The Universal Scalability Law says it gets worse well before that, due to the “coherence penalty.”
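For reference (my addition – the talk didn't show the formulas), the two laws are usually written as:

```latex
% Amdahl's Law: p = parallelizable fraction, N = number of processors
S(N) = \frac{1}{(1 - p) + \frac{p}{N}}

% Universal Scalability Law: \sigma = contention, \kappa = coherence penalty
C(N) = \frac{N}{1 + \sigma (N - 1) + \kappa N (N - 1)}
```

The κN(N−1) coherence term is quadratic in N, which is why USL throughput peaks and then declines rather than merely flattening out like Amdahl's curve.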

Simplicity is better

Text encoding (JSON, XML) doesn’t need to be human readable. It’s computers talking to each other
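To make the point concrete (my example; the field names and binary layout are invented), here is the same two-field record as JSON text versus a fixed binary layout:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    public static void main(String[] args) {
        String json = "{\"id\":42,\"price\":19.99}";           // 23 bytes of text to parse
        byte[] text = json.getBytes(StandardCharsets.UTF_8);

        ByteBuffer binary = ByteBuffer.allocate(12);           // 12 bytes, fixed layout
        binary.putInt(42);        // id
        binary.putDouble(19.99);  // price

        System.out.println("text: " + text.length + " bytes, binary: " + binary.position() + " bytes");
    }
}
```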

Have to deal with problems and lack of response for both synchronous and async communication

Errors need to be first-class messages. The name of Java exceptions implies they are unusual cases. We don't know what to do with them anyway.
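One way to model that (my sketch, not the speakers' code): the error is just another message type, a peer of the success case, rather than something thrown out-of-band.

```java
// Both outcomes flow through the same channel; handlers must deal with each.
public interface Message {}

class OrderAccepted implements Message {
    final long orderId;
    OrderAccepted(long orderId) { this.orderId = orderId; }
}

class OrderRejected implements Message {   // the error is a first-class message
    final long orderId;
    final String reason;
    OrderRejected(long orderId, String reason) { this.orderId = orderId; this.reason = reason; }
}
```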

Abstractions shouldn't mean generalization. They should be about creating a semantic level so you can be more precise.

Issues: Superiority complex with different technologies/techniques and anyone who says otherwise is wrong. Religious wars. X is the solution to everything.

Functional programming is not the answer to multi-core. The math (Universal Scalability Law) still hits. [it still helps up to a point on the graph though]

Think about transformation and flow of data; not code

Hardware

  • Mobile makes us think about hardware again – battery and bandwidth. The free lunch of throwing hardware at the problem is over
  • Hardware has been designed to make bets on locality and predictable access patterns. Pre-fetching and the like make an order of magnitude difference (see the sketch after this list).
  • Bandwidth is increasing. Latency is staying the same.
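A classic way to see the locality bet (my example, not from the talk): traverse the same 2D array in row-major versus column-major order. The row-major loop follows the memory layout, so the prefetcher helps; the column-major loop defeats it and is typically several times slower.

```java
public class Locality {
    public static void main(String[] args) {
        int n = 2048;
        int[][] a = new int[n][n];
        long sum = 0;

        long start = System.nanoTime();
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                sum += a[i][j];              // sequential access, cache-friendly
        System.out.println("row-major:    " + (System.nanoTime() - start) / 1_000_000 + " ms");

        start = System.nanoTime();
        for (int j = 0; j < n; j++)
            for (int i = 0; i < n; i++)
                sum += a[i][j];              // strided access, cache-hostile
        System.out.println("column-major: " + (System.nanoTime() - start) / 1_000_000 + " ms");
        System.out.println(sum);             // keep the JIT from eliminating the loops
    }
}
```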

Diversity

  • Testosterone Driven Development
  • The increase in women tracked the other STEM fields. Then in CS it started declining in 1980, when PCs were introduced and marketed to boys.
  • You can catch up within a year even if you start coding in college.
  • Grace Hopper invented COBOL and the compiler
  • Margaret Hamilton pioneered async programming at NASA

Don’t be afraid to fail

Impressions: great substitute keynote. I wonder how long they had to practice together. Trisha said she's seen it given before (with Martin Thompson).

java mini talks at qcon

This is part of my live blogging from QCon 2015. See my QCon table of contents for other posts.

This session is four 10 minute talks.

Deterministic testing in a non-deterministic world

  • determinism – everything has a cause and effect
  • pseudorandom – algorithm that generates approximately random #s

Should seed LocalDateTime with a Clock instance to reproduce results in a program. There is a fixed clock, so all operations in the program see the exact same time. Similar to using a random seed for generating numbers.
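A quick sketch of the fixed-clock technique (standard java.time API; the instant I chose is arbitrary):

```java
import java.time.Clock;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

public class FixedClockDemo {
    public static void main(String[] args) {
        // Every now() call against this clock sees the same instant,
        // so time-dependent logic becomes reproducible in tests.
        Clock fixed = Clock.fixed(Instant.parse("2015-06-10T09:00:00Z"), ZoneOffset.UTC);
        LocalDateTime now = LocalDateTime.now(fixed);   // always 2015-06-10T09:00
        System.out.println(now);
    }
}
```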

Hash spreads and probe functions: a choice of performance and consistency
Primitive collections are faster and use less memory than boxed implementations, which use 56 bytes for each Integer.

  • hash spread – function that destroys simple patterns in input data while preserving maximal info about the input. The goal is to avoid collisions without spending too much time hashing.
  • hash probe – function that determines the order of slots to scan in the array. For example, a linear probe moves down one slot on a collision; a quadratic probe jumps further on each collision. A sketch of both follows.
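In Java, the spread step below is the one java.util.HashMap actually uses (xor-ing the high bits down into the low bits); the linear probe over a power-of-two table is my own minimal version:

```java
static int spread(Object key) {
    int h = key.hashCode();
    return h ^ (h >>> 16);            // mix high bits into the low bits used for indexing
}

static int findSlot(Object[] table, Object key) {
    int mask = table.length - 1;      // table length must be a power of two
    int i = spread(key) & mask;
    while (table[i] != null && !table[i].equals(key)) {
        i = (i + 1) & mask;           // linear probe: on collision, try the next slot
    }
    return i;
}
```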

Typesafe config on steroids
Property files are hard to scale. Apache Commons adds typing, but is still limited to the property file format and limits composition. Spring helps with scaling property files.

Typesafe Config – library used by Play and Akka. Standalone project without dependencies, so it can be used from plain Java. Uses a JSON-like format called HOCON (human optimized config object notation).
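A minimal taste of the library (the API calls are Typesafe Config's documented surface; the config keys are my own example):

```java
import com.typesafe.config.Config;
import com.typesafe.config.ConfigFactory;

public class ConfigDemo {
    public static void main(String[] args) {
        // HOCON allows nesting and unquoted values; normally you'd load a file
        // with ConfigFactory.load() instead of parsing an inline string.
        Config config = ConfigFactory.parseString(
                "app { name = demo\n  port = 8080 }");
        System.out.println(config.getString("app.name"));  // demo
        System.out.println(config.getInt("app.port"));     // 8080 (typed access)
    }
}
```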

Scopes – a library built on top of Typesafe Config

Real time distributed event driven computing at Credit Suisse
Credit Suisse produced its own language that they call “data algebra.” It looks like a DSL.

java 8 stream performance – maurice naftalin – qcon

This is part of my live blogging from QCon 2015. See my QCon table of contents for other posts.

See http://www.lambdafaq.org

Background
He started with background on streams. (This is old news by now, but still taking some notes). The goals were to bring a functional style to Java and “explicit but unobtrusive” hardware parallelism. The former is more important than performance.

The intention is to replace loops with aggregate operations. [I like that he picked an example that required three operations and not an oversimplified example]. More concise/readable. Easy to change to parallelize.
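Something in the spirit of his example (my code, not his): three chained operations replacing a loop, with parallelism one word away.

```java
import java.util.Arrays;
import java.util.List;

public class PipelineDemo {
    public static void main(String[] args) {
        List<String> words = Arrays.asList("qcon", "java", "streams", "performance");

        int total = words.stream()              // swap in parallelStream() to parallelize
                .filter(w -> w.length() > 4)    // intermediate operation 1
                .mapToInt(String::length)       // intermediate operation 2
                .sum();                         // terminal operation / reduction

        System.out.println(total);              // 18
    }
}
```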

Reduction == terminal operation == sink

Performance Notes
The free lunch is over. Chips don't magically get faster over time; instead, they add cores. The goal of parallel streams is to run the intermediate operations in parallel and then bring the results together in a reduction.

What to measure?

  • We want to know how code changes affect system performance in prod. That's not feasible, though, because it would require a controlled experiment under prod conditions. Instead, we do a controlled experiment in lab conditions and hope we aren't answering an oversimplified question.
  • Microbenchmarking is hard because of inaccuracy, garbage collection, optimization over time, etc. There are benchmarking libraries – Caliper or JMH (a minimal JMH sketch follows this list). [or better, don't microbenchmark if you don't need to]
  • Don't optimize code if you don't have a problem. What's your performance requirement? [and is it the bottleneck?] Similarly, don't optimize the OS when the problem lies somewhere else.
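For reference, the minimal shape of a JMH benchmark (the annotations are JMH's real API; the benchmarked body is a placeholder of my choosing):

```java
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Thread)                    // benchmark state owned by each worker thread
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class StreamBench {
    int[] data = java.util.stream.IntStream.range(0, 10_000).toArray();

    @Benchmark                          // JMH handles warmup, forking and timing
    public long sumWithStream() {
        return java.util.Arrays.stream(data).asLongStream().sum();
    }
}
```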

Case study
This was a live demo. First we saw that not using a BufferedReader makes a file slow to read. [not about streams]. Then we watched as JMeter didn't work on the first try. [the danger of a live demo]. Then he showed that making the GC heap too small is bad for performance as well. [still not about streams]. He was trying to show the overall process of performance tuning, which is valid info – just not what I expected this session to be about.

Then [after I didn't see the stream logic being a problem in the first place], he showed how to solve subproblems and merge them. [oddly not calling it map reduce]

8 minutes before the end of the talk, we finally see the non-parallel code for the case study. It's interesting code because it uses two terminal operations and two streams. At least reading in the file is done normally. Finally, we see that the combiner is O(n), which prevents speeding it up.
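To illustrate why an O(n) combiner hurts (my illustration, not his case study code): in the collector below, each merge of two partial results copies one of them, so the merge work grows with the data size and eats into the parallel speedup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.IntStream;

public class CombinerDemo {
    public static void main(String[] args) {
        List<Integer> result = IntStream.range(0, 1_000_000).parallel().boxed()
                .collect(Collector.of(
                        ArrayList::new,            // supplier: one list per chunk
                        List::add,                 // accumulator: O(1) per element
                        (left, right) -> {         // combiner: O(n) copy on every merge
                            left.addAll(right);
                            return left;
                        }));
        System.out.println(result.size());
    }
}
```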

Some rules

  • The workload of the intermediate operations must be great enough to outweigh the overheads. Often quoted as size of data set * processing cost per element
  • sorted() is worse
  • Collectors cost extra. toMap() merging maps is slow; toList() and toSet() are dominated by the accumulator.
  • In the real world, the fork/join pool doesn’t operate in isolation

My impressions: A large amount of this presentation wasn't about stream performance. Then the case study shows that reading without a BufferedReader is slow. [no kidding]. I feel like the example was contrived and we “learned” that poorly written code behaves poorly. I was hoping the talk would actually be about parallelization – when parallelStream() saves time and when it doesn't, for example. What I learned was that for this particular scenario, parallelization wasn't helpful. And then right at the end came the generic rules, which felt rushed and thrown at us.