by José Paumard
New, elegant patterns that provide ways of handling corner cases in data processing pipelines
Many things have been introduced in Java SE 8 that bring deep changes to the way we write applications and APIs. This is the case with lambda expressions, of course, and also for the Stream and Collectors API. Lambda expressions induced changes to the way interfaces work in Java, bringing default and static methods. With these new tools, the Collections Framework has seen major additions in nearly all its interfaces.
Another element has been introduced: the final class Optional. This element changes the way we can write data processing pipelines built on streams for better and more-fluent code. The goal of this article is to give details about the concept of an "optional," and to show the new and elegant patterns optionals bring to our toolbox. Optionals can work very efficiently with streams, as we are going to see.
What Is an Optional?
The concept of an optional is not new and has already been implemented in several other languages. This concept proves very helpful when modeling the fact that a method call can return an unknown value or a value that does not exist.
Most of the time, methods return a default value. This is the design choice made most often. For instance, a call to map.get(key) returns null when the key is not present in the map. This is how the maps in the Collections Framework work, even if this behavior is questionable. In fact, this method cannot be used to tell whether a given key is present in a map or not. Indeed, if this key has been associated with a null value (this is certainly not something you should do!), the call to map.get(key) will also return null.
The right way to determine whether a key is present in a map is to call the following code:
map.containsKey(key);
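The ambiguity described above can be reproduced with a small sketch. The class name MapLookupDemo and the keys used here are made up for illustration; only HashMap, get(), and containsKey() come from the JDK.

```java
import java.util.HashMap;
import java.util.Map;

class MapLookupDemo {

    // Builds a map in which one key is explicitly associated with null.
    static Map<String, String> sampleMap() {
        Map<String, String> map = new HashMap<>();
        map.put("present", "value");
        map.put("nullValued", null); // legal for HashMap, but best avoided
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = sampleMap();
        // get() cannot distinguish a missing key from a key mapped to null
        System.out.println(map.get("nullValued"));         // null
        System.out.println(map.get("missing"));            // null
        // containsKey() can
        System.out.println(map.containsKey("nullValued")); // true
        System.out.println(map.containsKey("missing"));    // false
    }
}
```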
The map.get() method could have returned an Optional, even if it might have been cumbersome to check the returned object for its content at each call. That being said, it would be a mistake to think that the handling of corner cases is the only role of Optional. In fact, we have been handling corner cases without optionals in Java for 20 years.
The way this class has been designed brings new and very elegant patterns that, along with the patterns brought by the Stream API, provide new ways of handling corner cases in data processing pipelines.
How Can We Build Optionals?
From a technical point of view, Optional is a final class with a private constructor. In order to build an optional, we need to use one of the two factory methods provided.
The first one is Optional.of(). As an argument, it takes an object that should not be null. If we pass a null object to this method, we will get a NullPointerException. This method can be used to wrap any non-null object in an optional.
The second factory method is Optional.ofNullable(). This method also takes an object as an argument, the difference being that this object can be null. But do not expect to get a null wrapped in an optional: if you pass a null object to this method, you will get an empty optional.
So there is no way to create an optional wrapping a null object. An optional is a wrapper that holds a non‐null object or that is empty.
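As a minimal sketch of the two factory methods, the following class exercises both of them. The class and helper names (OptionalFactoriesDemo, wrap, ofRejects) are hypothetical and exist only for this illustration.

```java
import java.util.Optional;

class OptionalFactoriesDemo {

    // Wraps a possibly-null reference; null becomes an empty optional.
    static Optional<String> wrap(String s) {
        return Optional.ofNullable(s);
    }

    // Returns true if Optional.of() rejects the reference with a NullPointerException.
    static boolean ofRejects(String s) {
        try {
            Optional.of(s);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(wrap("hello").isPresent()); // true
        System.out.println(wrap(null).isPresent());    // false
        System.out.println(ofRejects("hello"));        // false
        System.out.println(ofRejects(null));           // true
    }
}
```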
Why Do We Need Optionals?
This concept of "a result that does not exist" is used in many cases. Let's take a very simple example based on the Stream API. Let's compute a max using a stream:
IntStream stream = IntStream.range(-100, 100);
OptionalInt opt = stream.max();
This max() method returns a result wrapped in an OptionalInt. Let's explore what would have been wrong if it had been an int.
A problem arises if the stream on which we compute this max turns out to be empty. Remember that a real‐life stream can be the result of a complex computation, involving mappings, filterings, selections, and the like. It can also be the result of the splitting of a main stream into substreams distributed among the several cores of a CPU for parallel computation. So, an empty stream is something that can definitely happen, even when we do not expect it.
Let's write the code that is going to be executed in the case of splitting. Let's suppose that the max() method returns an int instead of an optional.
// Hypothetical API: suppose that IntStream.max() returned an int
// This part of the code is run on CPU-1
IntStream stream1 = IntStream.range(-100, 0);
int max1 = stream1.max(); // max1 = -1
// And this part on CPU-2
IntStream stream2 = IntStream.range(0, 100);
int max2 = stream2.max(); // max2 = 99
int max = Integer.max(max1, max2);
Everything runs fine here, because our streams are not empty.
Let's now have a look at a faulty case, where one of the streams turns out to be empty.
// Hypothetical API: suppose that IntStream.max() returned an int
IntStream stream1 = IntStream.range(-100, 0);
int max1 = stream1.max(); // max1 = -1
IntStream stream2 = IntStream.empty();
int max2 = stream2.max(); // Suppose max2 = 0
int max = Integer.max(max1, max2); // result is 0
We can see that choosing int as the return type of the max() method leads to a faulty result for empty streams. The reason is the following: the returned value of an operation conducted on an empty stream should be the identity element of that operation. Since the max() operation does not have an identity element, this returned value does not exist. Choosing to return 0 leads to faulty results.
You might ask: what about returning Integer.MIN_VALUE? This is indeed a possibility, but we need to be absolutely sure that this int will not be converted into a long. This trick has been used in the IntSummaryStatistics class from the JDK. But it certainly cannot be used in general cases.
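The behavior of IntSummaryStatistics on an empty stream can be observed with a short sketch. The class name SummaryStatsDemo and its helper are made up for this illustration; the documented JDK behavior is that getMax() returns Integer.MIN_VALUE and getMin() returns Integer.MAX_VALUE when no value has been recorded.

```java
import java.util.IntSummaryStatistics;
import java.util.stream.IntStream;

class SummaryStatsDemo {

    // Summarizes an empty stream, so no value is ever recorded.
    static IntSummaryStatistics emptyStats() {
        return IntStream.empty().summaryStatistics();
    }

    public static void main(String[] args) {
        IntSummaryStatistics stats = emptyStats();
        System.out.println(stats.getCount());                    // 0
        System.out.println(stats.getMax() == Integer.MIN_VALUE); // true
        System.out.println(stats.getMin() == Integer.MAX_VALUE); // true
    }
}
```

A caller has to check getCount() before trusting getMax(), which is exactly the kind of convention an optional makes explicit.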
Optional to the Rescue
The max of an empty set is not defined, and choosing a general default value is dangerous. It could lead to corrupt results in our data processing pipelines. The max is not the only operation that has no identity element; this is also the case for the min or the average.
The optional type has been introduced to handle those cases properly. It models the fact that a value might not exist. A value that might not exist is different from a value that is null, or equals 0, or equals whatever other default value you can think of.
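To make this concrete, here is one possible way to merge two partial maxima when each is an OptionalInt, revisiting the split-stream scenario. This is only a sketch: the class OptionalMaxDemo and its combine() helper are hypothetical and are not how the JDK merges partial results internally.

```java
import java.util.OptionalInt;
import java.util.stream.IntStream;

class OptionalMaxDemo {

    // Combines two partial maxima; an empty partial result is simply ignored.
    static OptionalInt combine(OptionalInt a, OptionalInt b) {
        if (!a.isPresent()) return b;
        if (!b.isPresent()) return a;
        return OptionalInt.of(Integer.max(a.getAsInt(), b.getAsInt()));
    }

    public static void main(String[] args) {
        OptionalInt max1 = IntStream.range(-100, 0).max(); // holds -1
        OptionalInt max2 = IntStream.empty().max();        // empty
        System.out.println(combine(max1, max2));           // OptionalInt[-1]
    }
}
```

Because the empty substream contributes nothing, the combined result is -1 rather than the faulty 0.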
Optionals: First Patterns
There are, in fact, two types of patterns exposed by the Optional class. The first one sees an optional object as a wrapping object, just like Integer, Long, and the like. The difference is that there might be no value in this wrapper.
There are a couple of methods to handle an optional as a wrapper: isPresent() and get().
Optional<Person> opt = ...;
if (opt.isPresent()) {
    Person value = opt.get(); // there is a value
} else {
    // decide what to do
}
This first pattern is straightforward and easy to understand. If we see that there is no value in the wrapper, we can choose a default value or follow a path other than the main one. If the get() method is called when there is no value in the optional, a NoSuchElementException is thrown.
There are variants of this pattern. First, we can use the orElse() method with a default value.
Optional<Person> opt = ...;
Person result = opt.orElse(Person.DEFAULT_PERSON);
In this case, if the optional is empty, the default person will be returned.
This pattern is nice as long as the object Person.DEFAULT_PERSON has already been built, or it is not too expensive to build. If it has not been built, or if performance is critical in our application, then we probably do not want to use this pattern.
We can use this second variant:
Optional<Person> opt = ...;
Person result = opt.orElseGet(() -> Person.DEFAULT_PERSON);
In this last case, instead of passing a fully built object, we pass a supplier that will be called if this object needs to be built. A supplier is a functional interface that models a function that does not take any argument and returns a value.
If we want to throw an exception, we can use this third and last variant:
Optional<Person> opt = ...;
Person result = opt.orElseThrow(() -> new MyCustomException());
In this last case, the exception is built only if needed, and thrown by the orElseThrow() method.
This first family of patterns is quite classical: we check whether there is something in our optional. If we discover that this optional is empty, then we decide either to return a default value or to throw an exception. As a bonus, thanks to the introduction of lambdas, we can provide a constructor for this default value or this exception, in the form of a supplier.
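The difference between passing a value and passing a supplier can be made visible with a counter. In this sketch, OrElseDemo, BUILDS, and buildDefault() are hypothetical names, and a String stands in for the Person class, which the article does not define.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;

class OrElseDemo {

    // Counts how many times the (supposedly expensive) default value is built.
    static final AtomicInteger BUILDS = new AtomicInteger();

    static String buildDefault() {
        BUILDS.incrementAndGet();
        return "default";
    }

    public static void main(String[] args) {
        Optional<String> full = Optional.of("value");
        Optional<String> empty = Optional.empty();

        full.orElse(buildDefault());              // the default is built although it is not needed
        full.orElseGet(OrElseDemo::buildDefault); // the supplier is not called
        System.out.println(BUILDS.get());         // 1

        System.out.println(empty.orElseGet(OrElseDemo::buildDefault)); // default

        try {
            empty.orElseThrow(() -> new IllegalStateException("no value"));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());   // no value
        }
    }
}
```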
Optionals: Second Patterns
But we can do much better than that!
By checking the methods exposed by the Optional class, we can see that there is a family of methods that mirrors what we have in the Stream interface: map(), filter(), and flatMap(). There is also ifPresent(), which looks like the forEach() method. Because there cannot be more than one element in an optional, it would not make much sense to call this method forEach().
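Before moving to the article's example, here is a small sketch of how these four methods chain together. The class OptionalChainDemo and its helpers parse() and doubledIfPositive() are invented for this illustration only.

```java
import java.util.Optional;

class OptionalChainDemo {

    // Parses a string to an integer, returning an empty optional on bad input.
    static Optional<Integer> parse(String s) {
        try {
            return Optional.of(Integer.parseInt(s.trim()));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    // Keeps only positive values and doubles them; every step is skipped on an empty optional.
    static Optional<Integer> doubledIfPositive(String s) {
        return Optional.ofNullable(s)
                .flatMap(OptionalChainDemo::parse) // flatMap: the function itself returns an optional
                .filter(i -> i > 0)                // filter: empties the optional if the test fails
                .map(i -> i * 2);                  // map: transforms the wrapped value
    }

    public static void main(String[] args) {
        doubledIfPositive(" 21 ").ifPresent(System.out::println); // prints 42
        doubledIfPositive("-3").ifPresent(System.out::println);   // prints nothing
        doubledIfPositive("abc").ifPresent(System.out::println);  // prints nothing
    }
}
```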
Let's see how we can leverage these methods. And for that, let's use an example from the Java SE 8 for the Really Impatient book by Cay Horstmann (Addison-Wesley, 2014).
We all know that it is not possible to compute the inverse of 0 or the square root of a negative number. What was done to handle those undefined math expressions was to introduce a special number called NaN (not a number).
What about using optionals instead of this trick? Let's build an OptionalMath class that will return optionals for its operations. If the given operation can be computed on the passed parameter, the result will be put in the optional. If not, the returned optional will be empty. This OptionalMath class is as follows:
import java.util.Optional;

public class OptionalMath {

    public static Optional<Double> sqrt(Double d) {
        return d >= 0 ? Optional.of(Math.sqrt(d))
                      : Optional.empty();
    }

    public static Optional<Double> inv(Double d) {
        return d != 0 ? Optional.of(1/d)
                      : Optional.empty();
    }
}
The idea here is really simple. We always return the same object type, with no exception being thrown.
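A few calls show what these methods return. In this sketch, the wrapper class OptionalMathUsage is hypothetical, and OptionalMath is copied into it as a nested class only to keep the example self-contained.

```java
import java.util.Optional;

class OptionalMathUsage {

    // Copy of the article's OptionalMath, nested here so the example is self-contained.
    static class OptionalMath {
        static Optional<Double> sqrt(Double d) {
            return d >= 0 ? Optional.of(Math.sqrt(d)) : Optional.empty();
        }
        static Optional<Double> inv(Double d) {
            return d != 0 ? Optional.of(1 / d) : Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(OptionalMath.sqrt(4.0));  // Optional[2.0]
        System.out.println(OptionalMath.sqrt(-1.0)); // Optional.empty
        System.out.println(OptionalMath.inv(0.0));   // Optional.empty
        // flatMap chains the two operations; an empty result short-circuits the rest
        System.out.println(OptionalMath.sqrt(4.0).flatMap(OptionalMath::inv)); // Optional[0.5]
        System.out.println(OptionalMath.sqrt(0.0).flatMap(OptionalMath::inv)); // Optional.empty
    }
}
```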
Suppose we have a stream of doubles to process.
Calculating the Inverse of a Square Root, First Version
A first (and not so good) pattern we may write is the following:
double[] doubles = ...; // the doubles we need to process
List<Double> result = new ArrayList<>(); // the result of our processing
Arrays.stream(doubles).boxed()
    .forEach(
        d -> OptionalMath.sqrt(d)
            .flatMap(OptionalMath::inv)
            .ifPresent(result::add)
    );
First, we can notice that, thanks to the Optional.flatMap() method, we can chain the calls to the square root and inverse operations.
Second, the ifPresent() method allows us to chain the adding of the result to the result list, which happens only if there is a result. The final pattern is very clean and very slick, with no cluttered if-then-else and no exception handling. The values that cannot be processed are just naturally and silently removed from the stream.
The only problem here is that our lambda expression is mutating an external list: result. This is not so bad and will work as we expect. Still, it is a performance hit: accessing the enclosing context should be avoided for a lambda.
There is another hidden performance hit. Because we are mutating an external ArrayList, which is not thread-safe, going parallel for this computation is not possible. We are missing a nice optimization opportunity here.
Calculating the Inverse of a Square Root, Second Version
Would it be possible to write real stream processing code with this OptionalMath class, using a pattern that does not mutate an external list? In fact, the answer is yes, but we need to reconsider the way we process our data.
The natural way of processing this stream is to write it in this way:
Stream<Optional<Double>> intermediateResult =
    Arrays.stream(doubles).boxed()
        .map(d -> OptionalMath.inv(d).flatMap(OptionalMath::sqrt));
The problem is that a stream of optionals is not a great idea; we would prefer to collect a list of the results directly. We need a way to map this stream of optionals to a stream of values. An empty optional from that stream should, silently, contribute no value to the resulting stream.
So we need a function that takes a double and returns a stream instead of an optional. That stream would be empty if this optional is empty, and it would hold the value wrapped in this optional if there is one.
This can be done with the following function:
Function<Double, Stream<Double>> invSqrt =
    d -> OptionalMath.inv(d).flatMap(OptionalMath::sqrt)
        .map(result -> Stream.of(result))
        .orElseGet(() -> Stream.empty());
This function can be written in a much more elegant way using method references:
Function<Double, Stream<Double>> invSqrt =
    d -> OptionalMath.inv(d).flatMap(OptionalMath::sqrt)
        .map(Stream::of)
        .orElseGet(Stream::empty);
How does this function work? Well, first it does the computation we need: it takes the inverse, then the square root, and wraps the result in an optional.
Then, if there is a result (the map() method is from the Optional class), it maps the result in a stream. This call to map() returns an Optional<Stream<Double>>. If the previous step returned an empty optional, this map does not do anything and returns an empty optional.
The last step opens this optional. If there is a value in it, wrapped in a stream, the orElseGet() call just returns that stream, wrapping the result value. If there is no value, it returns the provided empty stream. So an empty optional is converted to an empty stream in this step.
This function does exactly what we need: it takes a double and returns a stream with the inverse of the square root of this double. If this operation cannot be done because this double is negative or zero, the function returns an empty stream.
So we can now rewrite our data processing pipeline in a truly "streamish" way!
double[] doubles = ...; // the doubles we need to process
List<Double> result = Arrays.stream(doubles).boxed()
    .flatMap(invSqrt)
    .collect(Collectors.toList());
There is no more mutation of an external list; this stream can be computed in parallel with nice performance gains!
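Put together, the whole pipeline could look like the following sketch. The class name InvSqrtPipeline, the process() helper, and the sample values are assumptions made for this illustration; OptionalMath is copied in as a nested class so the example stands alone, and the explicit call to parallel() is added here only to show that the pipeline tolerates it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class InvSqrtPipeline {

    // Copy of the article's OptionalMath, nested here so the example is self-contained.
    static class OptionalMath {
        static Optional<Double> sqrt(Double d) {
            return d >= 0 ? Optional.of(Math.sqrt(d)) : Optional.empty();
        }
        static Optional<Double> inv(Double d) {
            return d != 0 ? Optional.of(1 / d) : Optional.empty();
        }
    }

    // Maps a double to a one-element stream holding sqrt(1/d), or to an empty stream if undefined.
    static final Function<Double, Stream<Double>> INV_SQRT =
        d -> OptionalMath.inv(d).flatMap(OptionalMath::sqrt)
                .map(Stream::of)
                .orElseGet(Stream::empty);

    // No external state is mutated, so the stream can safely run in parallel.
    static List<Double> process(double[] doubles) {
        return Arrays.stream(doubles).boxed()
                .parallel()
                .flatMap(INV_SQRT)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        double[] doubles = {4.0, 0.0, -1.0, 0.25};
        System.out.println(process(doubles)); // [0.5, 2.0]
    }
}
```

The values 0.0 and -1.0 produce empty streams and disappear from the result, while the encounter order of the remaining values is preserved by collect().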
Conclusion
The second way of using optionals is much more interesting than the first one. An optional can be seen as a stream, either empty or a singleton. This leads to very natural and fluent patterns that can be computed in parallel. All the faulty values are naturally removed from the stream, with no if-then-else, no exception to handle, and no NaN in the case of the processing of doubles. This pattern relies on a special function that might look tedious to write: one to convert an optional to a stream. We will have a new method in Java SE 9, Optional.stream(), that will do this conversion directly. Our invSqrt function will, thus, be written this way in Java SE 9:
Function<Double, Stream<Double>> invSqrt =
    d -> OptionalMath.inv(d).flatMap(OptionalMath::sqrt).stream();
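For completeness, here is a sketch of the full pipeline using Optional.stream(). It assumes a Java 9 or later runtime; the class name InvSqrtJava9 and the process() helper are hypothetical, and OptionalMath is again copied in as a nested class so the example is self-contained.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

class InvSqrtJava9 {

    // Copy of the article's OptionalMath, nested here so the example is self-contained.
    static class OptionalMath {
        static Optional<Double> sqrt(Double d) {
            return d >= 0 ? Optional.of(Math.sqrt(d)) : Optional.empty();
        }
        static Optional<Double> inv(Double d) {
            return d != 0 ? Optional.of(1 / d) : Optional.empty();
        }
    }

    // Optional.stream() (Java 9+) turns an optional into a stream of zero or one element.
    static List<Double> process(double[] doubles) {
        return Arrays.stream(doubles).boxed()
                .flatMap(d -> OptionalMath.inv(d).flatMap(OptionalMath::sqrt).stream())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(process(new double[] {4.0, 0.0, -1.0, 0.25})); // [0.5, 2.0]
    }
}
```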
About the Author
José is an assistant professor at the Institut Galilée (Université Paris 13), and has a PhD in applied mathematics from the ENS de Cachan. He has been teaching about Java technologies at the university since 1998. José has also worked as an independent consultant for twenty years and is a well-known expert Java/Java EE/software craftsman and trainer. He gives talks at conferences, including JavaOne and Devoxx. He also writes technical articles for various media including Java Magazine and Oracle Technology Network. Passionate about education, he publishes massive open online courses (MOOC) for several companies, for example, for Oracle Virtual Technology Summit, PluralSight, Microsoft Virtual Academy, and Voxxed. He also is a member of the Java community in Paris, has been one of the lead members of the Paris JUG for six years, and is cofounder of Devoxx France. Follow him @JosePaumard.