JavaOne – Streams in JDK 8

“Streams in Java 8 – The good, the bad & the ugly”

Speaker: Simon Ritter & Stuart Marks
[Simon has the Twtter handle @speakjava; very cool]

For more blog posts from JavaOne, see the table of contents


Need to think differently. We are used to imperative programming with loops and variables.

Dealing wih exceptions
ugly code – three lines of code and hard to tell what it does

Problems:

  • looks like Perl
  • returns null (vs Optional or empty string)
  • split is called twice so wasted work
  • skipped URLDecoder.decode() because didn’t want to deal with a checked exception – but lost functionality. Problem caused by a missing API in Java so have to use decode.

Better approach:

  • use a method with a try/catch block; call that method from the stream
  • use Map.entry to simulate a tuple
  • Use single char (vs regex) in split. If only pass one character to split, far faster
  • split() is overloaded to take a numeric limit to how many are returned

Imperative streams
inside the for each is a print, and if statement and a LongAdder variable (good for frequent writes and infrequent reads)

then refactored to use mapToInt, a println and an if statement and a local variable. more complicated and still not functional

then switched to peek and no variabe but still an if statement (well a ternary)

finally switched to use a filter and count instead of sum

still not 100% functional because println is a side effect. ok for debugging

[good showing evolution to get functional]

Problems

  • Easy to misuse forEach() because feels familiar. But easily leads to side effects
  • Imperative thinking “for each of these I want to..”
  • Pause to consider if should use for each

Mixing internal and external iteration
for loop running 12 times and then getting data for each month with filter checking Month.of(x) – doesn’t work because x isn’t effectively final

“solve” effectively final by setting to different interim variable

IntStream.range(0,12).forEach – uses internal iteration but forEach. Marginaly better as don’t need interim variable

Instead return a nested map of Month to Map with nested grouping by so only need one iteration – the data stream

Problems

  • Going through data.stream() 12 times
  • forEach cheat
  • array not right data structure; it’s really a map of month to value

Hands on lab question
reduce (“”, (a,b) -> a+b) – works but inefficient because String concatenation

reduce(a,b) -> sb.append(b) – fails because ignores the first letter.

next attempt uses an if statement in reduce

then tried a custom collector. works but more complicated than necessary
Collector.of(StringBuilder::new, StringBuilder::append, StringBuilder::append, StringBuilder::toString

or just use Collectors.joining()

Problems

  • If not using a parameter, it is probably wrong
  • Side effects
  • if stateent version not associative so would fail when run in parallel

Misc

  • can’t use same stream multiple times
  • method references are slightly more efficient than lambdas because lambda gets added into a method in bytecode. Saves a level of indirection by using method reference. But only slightly
  • Calling .sorted() multiple times vs chaining comparing.thenComparing – the later is better [also works because preserves sort :)]
  • parallel streams do more work. might or might not complete faster. uses fork-join pool. number of threads defaults to number of CPUs. In Java9, this is # CPUs for container. On Jaa 8, it was for physical machine
  • Nested parallel streams is bad idea because using same threads so performance is worse. Can create ForkJoinPool if must. Buyer beware; this is an implementation specific behavior and tied to the profile of the machine you write it for.

My take: Fun start to he morning. I like that they covered common things in an entertaining way and not common things. Something to learn for everyone!

refactoring legacy code for java 8

I asked a teammate at CodeRanch if I could use a method he wrote to show how it could be written more clearly/succinctly in Java 8. He said yes. This is the original method. It is 95 lines and has a cyclomatic complexity of 27.

public static Map<Category, List<Forum>> getBetaTopForums(BestTopicsDAO bestTopicDAO, int userId, List<Topic> bestThisYear, List<Topic> secondaryBestList){
        Map<Integer, Integer> ratings = bestTopicDAO.selectForumRatingsByUserId(userId);
        
        List<Integer> forumIds = new ArrayList<>();
        List<Integer> ratedForumIds = CollectionCommon.getKeysSortedByValues(ratings, true);
        final int defaultUserId = SystemGlobals.getIntValue(ConfigKeys.BEST_TOPICS_DEFAULT_USER_ID);
        
        // Let's first add the forums that are rated above 5/10
        for(Integer forumId: ratedForumIds){
            Integer r = ratings.get(forumId);
            if(r != null && r > 5){
                forumIds.add(forumId);
            }
        }
        
        // Now let's add the forums found in best topics.
        if(bestThisYear != null){
            for(Topic topic: bestThisYear){
                int forumId = topic.getForumId();
                if(!forumIds.contains(forumId)){
                    forumIds.add(forumId);
                }
            }
        }
        if(secondaryBestList != null){
            for(Topic topic: secondaryBestList){
                int forumId = topic.getForumId();
                if(!forumIds.contains(forumId)){
                    forumIds.add(forumId);
                }
            }
        }
        
        // If this user is not the default-rating-user, let's add the top forums
        // rated by the default-rating-user.
        if(userId != defaultUserId) {
            Collection<Integer> fids = getTopRatedForumIds(bestTopicDAO);
            for(Integer fid: fids) {
                if(fid != null && !forumIds.contains(fid)) {
                    forumIds.add(fid);
                }
            }
        }
        
        
        // Let's also add some forums found in hot topics.
        List<Topic> tmpTopics = TopicRepository.getHottestTopics();
        for(Topic topic: tmpTopics){
            int forumId = topic.getForumId();
            if(!forumIds.contains(forumId)){
                forumIds.add(forumId);
            }
        }
        
        // Finally, let's add the forums that were rated below 6/10
        for(Integer forumId: ratedForumIds){
            Integer r = ratings.get(forumId);
            if(r != null && r < 6 && !forumIds.contains(forumId)){
                forumIds.add(forumId);
            }
        }
        
        Map<Category, List<Forum>> forums = new LinkedHashMap<>();
        
        PermissionControl pc = SecurityRepository.get(userId);
        
        final int maxTopForums = SystemGlobals.getIntValue(ConfigKeys.BETA_VIEW_MAX_TOP_FORUMS_IN_HOME);
        int forumsToShow = 0;
        
        // We need to remove the forums that are not visible, or rated 0/10
        for(Integer forumId: forumIds){
            if(pc.canAccess(SecurityConstants.PERM_FORUM, String.valueOf(forumId))){
                Forum f = ForumRepository.getForum(forumId);
                Category c = f != null ? ForumRepository.getCategory(f.getCategoryId()) : null;
                if(f != null && c != null && f.isShowInForumList()){
                    
                    if(++forumsToShow > maxTopForums){
                        break;
                    }
                    
                    List<Forum> fs;
                    if(forums.containsKey(c)){
                        fs = forums.get(c);
                    }else{
                        fs = new ArrayList<>();
                        forums.put(c, fs);
                    }
                    fs.add(f);
                }
            }
        }
        return forums;
        
    }

First step – write unit tests

First, I wrote unit tests for the existing code to ensure I didn’t break anything. While doing that, I did some minor refactorings. (I didn’t do extract method to make the method shorter since I knew I’d be updating that later). What I did change:

  1. Removed BestTopicsDAO as a parameter. It comes from a factory and we have a mock framework setup for that factory already. So there was no reason it had to be a parameter
  2. Renamed bestThisYear to primaryBestList. (It doesn’t always represent the best for the year)

I couldn’t get 100% test coverage because a few of the null checks were unreachable due to early logic (in the helper methods.) I left them in to see the effect they’d have on conversion.

Refactoring to Java 8 – Getting started

There’s a lot of for loops here. And I suspect they are very similar. Let’s start with the first one:

for(Integer forumId: ratedForumIds){
  Integer r = ratings.get(forumId);
  if(r != null && r > 5){
    forumIds.add(forumId);
  }
}

I converted this to a helper method and a stream:

forumIds.addAll(ratedForumIds.stream().filter(id -> {
  Integer r = ratings.get(id);
  return r != null && r > 5;
}).collect(Collectors.toList()));

I’ll grant you that this code isn’t shorter than the original code. It is more similar in form to the code I expect to write next though.  Also, it’s not easier to read given I didn’t extract the two line lambda. I’m going to do that after I get rid of the similar for loops.

The next refactoring

The next one is interesting. It’s just a for loop and if statement which is easy to rewrite.

for(Topic topic: primaryBestList){
  int forumId = topic.getForumId();
  if(!forumIds.contains(forumId)){
    forumIds.add(forumId);
  }
}

which is equivalent to

forumIds.addAll(primaryBestList.stream()
  .map(Topic::getForumId)
  .filter(f -> ! forumIds.contains(f))
  .collect(Collectors.toList()));

However, it is interesting because it check if the forum is already in forumIds. This shouldn’t be necessary as it is something a data structure could do. We need a data structure that is both a list (preserves order) and a set (checks for uniqueness.) Enter LinkedHashSet. It’s a set and preserves the insertion order. The JavaDoc even specifies that if you add the same element again, the order doesn’t change. Perfect. Switching forumIds to a LinkedHashSet allows for getting rid of the filter intermediate operation. And the unit tests still work.  I used the same techniques for the next five for loops.

Time to use methods

I converted two methods that had to do with ratings and three that were straight conversions.  I could have one common method they all use with identities for the filter/map that don’t apply. Two methods would be clearer. Giving me:

private static void addForumIds(Collection<Integer> forumIds, List<Topic> candidates) {
  if (candidates != null) {
    List<Integer> toAdd = candidates.stream().map(Topic::getForumId).collect(Collectors.toList());
    forumIds.addAll(toAdd);
  }
}

private static void addForumIds(Collection<Integer> forumIds, Collection<Integer> candidates, Predicate<Integer> predicate) {
  if (candidates != null) {
    List<Integer> toAdd = candidates.stream().filter(predicate).collect(Collectors.toList());
    forumIds.addAll(toAdd);
  }
}

The last one

The last chunk of code needed more refactoring to take care of all the forum filtering logic first. It also needed a mutable object for the counter rather than a primitive so it could be updated inside a lambda.

Conclusion

I could refactor this more. There’s duplication in the two lambdas. And I could extract more code into separate methods. Interestingly, the total code base is about the same as before. (It’s actually three lines longer, but I have more than three lines of comments.) But the method with the main logic is shorter and far less complicated:

public static Map<Category, List<Forum>> getBetaTopForums(int userId, List<Topic> primaryBestList, List<Topic> secondaryBestList){
        BestTopicsDAO bestTopicDAO = DataAccessDriver.newBestTopicDAO();
        Map<Integer, Integer> ratings = bestTopicDAO.selectForumRatingsByUserId(userId);
        
        Collection<Integer> forumIds = new LinkedHashSet<>();
        List<Integer> ratedForumIds = CollectionCommon.getKeysSortedByValues(ratings, true);
        final int defaultUserId = SystemGlobals.getIntValue(ConfigKeys.BEST_TOPICS_DEFAULT_USER_ID);
        
        // Let's first add the forums that are rated above 5/10
         addForumIds(forumIds, ratedForumIds, id -> {
            Integer r = ratings.get(id);
            return r != null && r > 5;
        });
        
        // Now let's add the forums found in best topics.
        addForumIds(forumIds, primaryBestList);
        addForumIds(forumIds, secondaryBestList);
          
        // If this user is not the default-rating-user, let's add the top forums
        // rated by the default-rating-user.
        if(userId != defaultUserId) {
            Collection<Integer> fids = getTopRatedForumIds(bestTopicDAO);
            addForumIds(forumIds, fids, id -> id != null);
        }
        
        // Let's also add some forums found in hot topics.
        List<Topic> tmpTopics = TopicRepository.getHottestTopics();
        addForumIds(forumIds, tmpTopics);

        // Finally, let's add the forums that were rated below 6/10
       addForumIds(forumIds, ratedForumIds, id -> {
            Integer r = ratings.get(id);
            return r != null && r < 6;
        });
        
        Map<Category, List<Forum>> forums = new LinkedHashMap<>();
        
        PermissionControl pc = SecurityRepository.get(userId);
        
        final int maxTopForums = SystemGlobals.getIntValue(ConfigKeys.BETA_VIEW_MAX_TOP_FORUMS_IN_HOME);
        AtomicInteger countForumsToShow = new AtomicInteger(0);
        
        // We need to remove the forums that are not visible, or rated 0/10
        forumIds.stream()
             // skip if can't access forum
            .filter(id -> pc.canAccess(SecurityConstants.PERM_FORUM, String.valueOf(id)))
            // get forum
            .map(ForumRepository::getForum)
            // skip if forum not found or can't access
            .filter(f -> f != null && f.isShowInForumList())
            // get category
            .forEach(f -> addForumToCategory(forums, f, maxTopForums, countForumsToShow));
        
        return forums;
        
    }
    
    private static void addForumToCategory(Map<Category, List<Forum>> forums, Forum f, int maxTopForums,
            AtomicInteger countForumsToShow) {
        Category c = ForumRepository.getCategory(f.getCategoryId());
        if (c != null && countForumsToShow.incrementAndGet() <= maxTopForums) {
            List<Forum> fs;
            if (forums.containsKey(c)) {
                fs = forums.get(c);
            } else {
                fs = new ArrayList<>();
                forums.put(c, fs);
            }
            fs.add(f);
        }
    }

    private static void addForumIds(Collection<Integer> forumIds, List<Topic> candidates) {
        if (candidates != null) {
            List<Integer> toAdd = candidates.stream().map(Topic::getForumId).collect(Collectors.toList());
            forumIds.addAll(toAdd);
        }
    }
    
    private static void addForumIds(Collection<Integer> forumIds, Collection<Integer> candidates, Predicate<Integer> predicate) {
        if (candidates != null) {
            List<Integer> toAdd = candidates.stream().filter(predicate).collect(Collectors.toList());
            forumIds.addAll(toAdd);
        }    
    }

using Nashorn’s jjs for experimenting

In Java 8, Oracle introduced the new Nashorn engine and is encouraging the use of jjs. It’s a RPEL (read print evaluate loop) for JavaScript. Since JavaScript can call Java, this allows experimenting with Java APIs like you do in other languages like Python and Ruby. It works better in those languages though.

Cay Horstmann’s book, Java SE 8 for the Really Impatient, had good exercises for lambdas so I decided to the end of chapter exercises for Nashorn.

Exercise 1 – Play with jjs to experiment with an API you want to explore; did you find it easier than writing test programs in Java?

Nope. The so called advantage of using jjs is that you don’t need to write the plumbing code. But an IDE can generate a class and main method in a couple clicks.

  1. Strike 1 – With jjs, I need to type in the package of name of classes I want to use rather than just letting Eclipse deal with imports for me.
  2. Strike 2 – jjs doesn’t have tab complete (autocomplete) which means I have to type in the full class/method name of the class I am experimenting with. If I was an expert, I wouldn’t be in jjs in the first place.
  3. Strike 3 – jjs doesn’t support the up arrow. Which means when I make typos (see strike 1 and 2), I get to type the whole thing again.

Exercise 2 – Using JJS and the stream library iteratively work out a solution to the problem to print a filtered list of unique long words in sorted order and see how it compares to your usual workflow.

Here was my workflow with jjs:

  1. Create the test file in a text editor
  2. Run ./jjs
  3. Create a path object
    Attempt # Code Error Comments
    1 jjs> var paths = Paths.get(‘path/words.txt’) <shell>:1 ReferenceError: “Paths” is not defined Forgot to type the package name. Don’t know it by heart anyway. Need to look it up in the JavaDoc.
    2 jjs> var paths = java.nio.file.Paths.get(‘path/words.txt’) Success
  4. Read the file
    Attempt # Code Error Comments
    1 jjs> List<String> list = java.nio.files. Files.readAllLines(path) ECMAScript Exception: ReferenceError: <shell>:1:11 Invalid left hand side for assignment It’s JavaScript, not Java so no type declaration. Should just use var.
    2 jjs> var list = java.nio.files. Files.readAllLines(path)  <shell>:1 ReferenceError: “path” is not defined Typo in variable name. I wish I named it path, not paths. But I want to do the exercise in order and not go back.
    3 jjs> var list = java.nio.files. Files.readAllLines(paths) java.lang.RuntimeException: java.lang. ClassNotFoundException: java.nio.files. Files.readAllLines Typo in package name. It should be file and not files
    4 jjs> var list = java.nio.file. Files.readAllLines(paths) java.lang.RuntimeException: java.nio.file. NoSuchFileException: path/words.txt Missing forward slash in path name. On the bright side, I can rename the variable without feeling like I’m messing up the experiment.
    5 jjs> var path = java.nio.file. Paths.get(‘/path/words.txt’) Worked as expected (but at this point, I’m typing into this blog post and copy/pasting into the jjs console
    6 jjs> var list = java.nio.file. Files.readAllLines(paths)  java.lang.RuntimeException: java.nio.file. NoSuchFileException: path/words.txt Sigh. Didn’t use the corrected variable name
    7 var list = java.nio.file. Files.readAllLines(path) Worked as expected
  5. Filter the long words
    Worked.

    Attempt # Code Error Comments
    1 jjs> list.stream().grep({ l -> l.length() > 12}) ECMAScript Exception: SyntaxError: <shell>:1:23 Expected : but found – This is completely wrong. Aside from the stream() call, it is using Groovy syntax. (I code Groovy at a command line so my fingers started there)
    2 jjs> list.stream().filter( l -> l.length() > 12) ECMAScript Exception: SyntaxError: <shell>:1:30 Expected an operand but found Hmm. Puzzled. it looks right this time. I copy/pasted into my IDE and it compiled. I then started searching online and found out that you have to use a function literal. (This was in the book. I forgot about it)
    3 jjs> list.stream().filter(function(e) { return e.length() > 12 }) Prints reference to stream (as does Java.) Now can add the print
    4 jjs> list.stream().filter(function(e) { return e.length() > 12 }).forEach(System.out::println) ECMAScript Exception: SyntaxError: <shell>:1:79 Expected , but found : Not surprised that :: doesn’t work since -> doesn’t. Now for the long way.
    5 jjs> list.stream().filter(function(e) { return e.length() > 12 }).forEach(function(e) { print(e) }) Works as expected
  6. Add sorting, uniqueness and refactor
    Attempt # Code Error Comments
    1 jjs> list.stream().filter(function(e) { return e.length() > 12 }).sorted().forEach(function(e) { print(e) }) Sorting worked
    2 jjs> list.stream().filter(function(e) { return e.length() > 12 }).sorted().distinct().forEach(function(e) { print(e) }) Uniqueness filter worked
    3 jjs> list.stream().filter(function(e) e.length() > 12 ).sorted().distinct().forEach(function(e)  print(e) ) Remove unneeded code

 My thoughts

That was so far away from being easier that it was ridiculous.  I’m comfortable with the NIO and lambda APIs. This shouldn’t have been hard. The Java version is:

import java.io.*;
import java.nio.file.*;
import java.util.*;

public class Exercise2 {
   public static void main(String[] args) throws IOException {
      Path path = Paths.get("/path/words.txt");
      List<String> list = java.nio.file.Files.readAllLines(path);
      list.stream()
                .filter(l -> l.length() > 12)
                .sorted()
                 .distinct()
		.forEach(System.out::println);
     }
}