kocko

As you may already know, the Stream API is one of the most significant features introduced in Java 8. Aside from lambdas, which reduce the amount of boilerplate in our code base, the Stream API relieves us of the responsibility for how collections are iterated. Previously we had to obtain an Iterator or use loops (for example for, enhanced for and so on), but now, working with Java 8, we can just use a Stream and its operations to manipulate the content of the collection. This way, we pass the responsibility for iterating and manipulating to the Stream, by just passing instructions (in the form of lambdas) on how the elements have to be processed.

Retrieving collection items back from a Stream is usually done by calling one of the eager (terminal) overloaded methods Stream#collect(Collector collector) and Stream#collect(Supplier supplier, BiConsumer accumulator, BiConsumer combiner). For some commonly used operations, the Stream API provides several collector implementations, which we obtain by invoking static methods of the Collectors class, such as Collectors#toSet(), Collectors#toList() or Collectors#toMap().

Sometimes, however, the pre-defined Collector implementations are not suitable, or at least do not transform the elements of the stream into the exact collection type we want. This is very likely to happen. For example, Collectors.toList() may produce an ArrayList, while we need a LinkedList. In such situations, knowing and understanding how to write a custom Collector implementation is crucial. Let's consider the following use case.
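As a brief aside before the use case: for the LinkedList example specifically, the two collect overloads mentioned above are already enough. The sketch below is only an illustration; the class name LinkedListCollectDemo and its two helper methods are made up for this example and are not part of the article's code.

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.stream.Collectors;

public class LinkedListCollectDemo {

    // Variant 1: the built-in Collectors.toCollection factory with a LinkedList supplier.
    static LinkedList<String> viaToCollection(List<String> names) {
        return names.stream().collect(Collectors.toCollection(LinkedList::new));
    }

    // Variant 2: the three-argument collect(supplier, accumulator, combiner) overload.
    static LinkedList<String> viaThreeArgCollect(List<String> names) {
        return names.stream().collect(LinkedList::new, LinkedList::add, LinkedList::addAll);
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Steve", "John", "Thierry");
        System.out.println(viaToCollection(names));
        System.out.println(viaThreeArgCollect(names));
    }
}
```

A full custom Collector becomes worthwhile when each input element has to be reshaped on its way into the result, which is the situation in the use case that follows.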
We have this data structure (specifically, a TreeMap):

 Map<String, List<String>> peopleByCity = new TreeMap<>();

with the following content:

 {
   "London" : [ "Steve", "John" ],
   "Paris" : [ "Thierry" ],
   "Sofia" : [ "Peter", "Konstantin", "Ivan" ]
 }

We'd like to implement a Collector that transforms the entry set of the given TreeMap into a List of Map.Entry elements, one per [City ; Name] pair. For the example above, the transformation has to result in a List with the following content:

 London : Steve
 London : John
 Paris : Thierry
 Sofia : Peter
 Sofia : Konstantin
 Sofia : Ivan

Implementing a custom Collector is easy! We just need to implement the java.util.stream.Collector interface. By definition, it's generic and has three type parameters:

  • T - the type of input elements to the reduction operation
  • A - the mutable accumulation type of the reduction operation (often hidden as an implementation detail)
  • R - the result type of the reduction operation
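To make the three type parameters concrete, here is a minimal sketch in which T, A and R are all different types. It uses the standard Collector.of factory rather than a hand-written class; the names TypeParametersDemo and joiningWithComma are hypothetical and chosen only for this illustration.

```java
import java.util.StringJoiner;
import java.util.stream.Collector;
import java.util.stream.Stream;

public class TypeParametersDemo {

    // T = String (input elements), A = StringJoiner (mutable accumulator), R = String (result).
    static Collector<String, StringJoiner, String> joiningWithComma() {
        return Collector.of(
                () -> new StringJoiner(", "),   // supplier: creates a fresh A
                StringJoiner::add,              // accumulator: folds one T into A
                StringJoiner::merge,            // combiner: merges two A containers
                StringJoiner::toString);        // finisher: converts A into R
    }

    public static void main(String[] args) {
        // prints "London, Paris, Sofia"
        System.out.println(Stream.of("London", "Paris", "Sofia").collect(joiningWithComma()));
    }
}
```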

The Collector interface declares five abstract methods, and I will explain the ideas behind all of them.

Typically, a Collector is a kind of reducer that needs a temporary (internal) mutable structure holding the intermediate state of the transformed items. This structure is often referred to as the accumulator. Here, an ArrayList is perfectly suitable as an accumulator, because for each pair of the form [City ; Name] we will add a new entry to it.

Proceeding to the actual implementation, the class definition would be:

 public class KockoCollector<T, V> implements Collector<Entry<T, List<V>>, List<Entry<T, V>>, List<Entry<T, V>>> {
     // Implemented methods
 }

The supplier() method returns a function that supplies the accumulator (the mutable result container) for the Collector. Since we picked ArrayList as the type of our accumulator, the method implementation is simply:

 @Override
 public Supplier<List<Entry<T, V>>> supplier() {
     return ArrayList::new;
 }

The accumulator() method returns a function that folds the stream element currently being processed into the accumulator. In our case, we go through the person names of each city and, for each name, add a new AbstractMap.SimpleEntry to the accumulator. (Note that this is perfectly valid, because AbstractMap.SimpleEntry implements the Map.Entry interface.)

 @Override
 public BiConsumer<List<Entry<T, V>>, Entry<T, List<V>>> accumulator() {
     return (accum, entry) -> {
         // One new [City ; Name] entry per person name of the current city
         entry.getValue()
              .forEach(x -> accum.add(new AbstractMap.SimpleEntry<T, V>(entry.getKey(), x)));
     };
 }

The third method we have to implement is combiner(). This method matters mostly when working with parallel streams. Its purpose is to combine the internal accumulators of the stream chunks that are being processed in parallel. If the Collector is never supposed to work with parallel streams, the combiner is typically not invoked, but the method still has to return a function (for example, one that throws an UnsupportedOperationException). Otherwise, it's mandatory to describe how the partial results will be merged together. We'd like our custom Collector to work on parallel streams, so we provide an implementation for combiner(). A single call to this method returns a BinaryOperator whose implementation merges the content of two accumulators, x and y, by simply adding the content of one to the content of the other.

 @Override
 public BinaryOperator<List<Entry<T, V>>> combiner() {
     return (x, y) -> {
         x.addAll(y);
         return x;
     };
 }
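To see what the combiner is for, the following sketch imitates by hand what a parallel stream roughly does: it accumulates two halves of the input into separate containers and then merges them. This is a simplified illustration, not the actual stream implementation; CombinerDemo, toArrayList and collectInTwoHalves are hypothetical names, and the collector is built with Collector.of using the same merge lambda as above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collector;

public class CombinerDemo {

    // A small list collector whose combiner mirrors the one from the article.
    static <E> Collector<E, List<E>, List<E>> toArrayList() {
        return Collector.of(
                ArrayList::new,
                List::add,
                (x, y) -> { x.addAll(y); return x; });
    }

    // Hypothetical helper: accumulates each half separately, then merges with the combiner.
    static <T, A, R> R collectInTwoHalves(List<T> items, Collector<T, A, R> collector) {
        int middle = items.size() / 2;

        A left = collector.supplier().get();
        for (T item : items.subList(0, middle)) {
            collector.accumulator().accept(left, item);
        }

        A right = collector.supplier().get();
        for (T item : items.subList(middle, items.size())) {
            collector.accumulator().accept(right, item);
        }

        // This is the step that would be impossible without a combiner.
        A merged = collector.combiner().apply(left, right);
        return collector.finisher().apply(merged);
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Steve", "John", "Thierry", "Peter");
        Collector<String, List<String>, List<String>> collector = toArrayList();
        // prints [Steve, John, Thierry, Peter]
        System.out.println(collectInTwoHalves(names, collector));
    }
}
```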

We're almost done implementing our custom collector. We picked the type of our accumulator, we explained how our collector will merge the content of parallel accumulators, and so on. The only thing left is to pick our final return type. This is what the finisher() method is used for. It returns a Function that takes the collector's internal accumulator and converts it to the type our collector is supposed to produce when it finishes working with the stream elements. Sometimes, however, returning the accumulator itself from the finisher() is perfectly valid; this happens when we don't actually need to convert the accumulator to some other type. Our case is one of these, and therefore the finisher() implementation is pretty simple:

 @Override
 public Function<List<Entry<T, V>>, List<Entry<T, V>>> finisher() {
     return accumulator -> accumulator;
 }
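For contrast with the identity finisher above, here is a small sketch of a collector whose finisher really does transform the accumulator: it accumulates into an ArrayList and hands back an unmodifiable view of it. The names FinisherDemo and toUnmodifiableView are hypothetical, and the collector is assembled with the standard Collector.of factory purely for brevity.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Stream;

public class FinisherDemo {

    // Accumulates into a mutable ArrayList; the finisher wraps it in a read-only view.
    static <E> Collector<E, List<E>, List<E>> toUnmodifiableView() {
        return Collector.of(
                ArrayList::new,
                List::add,
                (x, y) -> { x.addAll(y); return x; },
                Collections::unmodifiableList);
    }

    public static void main(String[] args) {
        List<String> names = Stream.of("Peter", "Konstantin", "Ivan")
                .collect(FinisherDemo.<String>toUnmodifiableView());
        System.out.println(names);
        try {
            names.add("Steve");
        } catch (UnsupportedOperationException e) {
            System.out.println("The result is read-only");
        }
    }
}
```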

The Collector interface declares one more abstract method. That's characteristics(), which returns a Set<Collector.Characteristics> containing meta-information about the collector. The Collector.Characteristics enum has only three values: CONCURRENT, IDENTITY_FINISH and UNORDERED. The returned Set may contain any combination of these, including none at all. If our Collector were thread-safe (it isn't), we'd have added the CONCURRENT constant to the Set. Because our finisher() returns the accumulator unchanged, IDENTITY_FINISH could be declared as well. Here we mark the collector only as UNORDERED, which states that it doesn't guarantee to preserve the encounter order of the stream.

 @Override
 public Set<java.util.stream.Collector.Characteristics> characteristics() {
     return EnumSet.of(Characteristics.UNORDERED);
 }
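Putting the five methods together, a self-contained sketch of the whole collector applied to the peopleByCity example might look like the following. FlatteningCollector is a hypothetical stand-in for the KockoCollector described above, and FlattenDemo and flatten are names invented for this example. One deliberate assumption of the sketch: it declares IDENTITY_FINISH rather than UNORDERED, so that the encounter order of the TreeMap is kept in the result.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Set;
import java.util.TreeMap;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;

public class FlattenDemo {

    // Hypothetical stand-in for the article's KockoCollector.
    static class FlatteningCollector<T, V>
            implements Collector<Entry<T, List<V>>, List<Entry<T, V>>, List<Entry<T, V>>> {

        @Override
        public Supplier<List<Entry<T, V>>> supplier() {
            return ArrayList::new;
        }

        @Override
        public BiConsumer<List<Entry<T, V>>, Entry<T, List<V>>> accumulator() {
            return (accum, entry) -> entry.getValue()
                    .forEach(x -> accum.add(new AbstractMap.SimpleEntry<T, V>(entry.getKey(), x)));
        }

        @Override
        public BinaryOperator<List<Entry<T, V>>> combiner() {
            return (x, y) -> { x.addAll(y); return x; };
        }

        @Override
        public Function<List<Entry<T, V>>, List<Entry<T, V>>> finisher() {
            return Function.identity();
        }

        @Override
        public Set<Characteristics> characteristics() {
            // Assumption of this sketch: IDENTITY_FINISH instead of the article's UNORDERED.
            return EnumSet.of(Characteristics.IDENTITY_FINISH);
        }
    }

    // Collects the map entries and renders each pair as "City : Name".
    static List<String> flatten(Map<String, List<String>> peopleByCity) {
        List<Entry<String, String>> pairs = peopleByCity.entrySet().stream()
                .collect(new FlatteningCollector<String, String>());
        List<String> lines = new ArrayList<>();
        for (Entry<String, String> pair : pairs) {
            lines.add(pair.getKey() + " : " + pair.getValue());
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<String, List<String>> peopleByCity = new TreeMap<>();
        peopleByCity.put("London", Arrays.asList("Steve", "John"));
        peopleByCity.put("Paris", Arrays.asList("Thierry"));
        peopleByCity.put("Sofia", Arrays.asList("Peter", "Konstantin", "Ivan"));
        flatten(peopleByCity).forEach(System.out::println);
    }
}
```

Run sequentially on the sample data, this prints the six "City : Name" lines listed at the beginning of the article.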

That's pretty much it. As you've seen, implementing custom collectors is not too difficult and it's actually quite fun. Think about what kind of collectors you can implement for the project you're currently working on. Can you share your ideas? Please leave a comment below if you find this article useful or if you have any questions.

Cheers,
Konstantin

[img]https://www.java.net/sites/default/files/imagefield_thumbs/kocko/java-logo-lambda.png[/img] Lambdas are without any doubt one of the most intriguing and attractive features in Java 8, but sometimes, instead of helping us write better, boilerplate-free code, they can get us into trouble. Still, they are the better alternative to anonymous classes for lots of reasons. Anonymous classes were a nice way to achieve closures in Java, but it was natural to write a lot of boilerplate code to achieve something atomic. Let's take a look at the following code snippet: myButton.addActionListener(new ActionListener() { @Override public void actionPerformed(ActionEvent e) { System.out.println(