Lambda Expressions
This is a webpage devoted to lambda expressions in Java (a new language feature) and streams (new abstractions in the JDK's collection framework). Both were added to Java with release 8 on March 18, 2014.
If you want to provide feedback or have any questions regarding lambda expressions in Java, feel free to send me EMAIL or use the LAMBDA QUESTION form.
The Lambda Tutorial is available in PDF format (330KB) and in EPUB format (410KB). The Lambda Reference (note: it is an incomplete draft version) is available in PDF format (1200KB).
Background and a Bit of Trivia
Many popular programming languages nowadays have a language feature known as "lambdas", aka "closures". Such languages include not only classic functional languages such as Lisp or Scheme, but also younger ones such as JavaScript, Python, Ruby, Groovy, Scala, and C#; even C++ has lambdas.
[1]
Some of these lambda-supporting languages run on the Java Virtual Machine,
and, naturally, Java as the most prominent programming language on the
JVM did not want to be left behind. A discussion regarding the addition
of lambdas/closures began in 2006, shortly after Java 5 had been released.
Three perfectly reasonable proposals for closures emerged, but unfortunately
they did not converge. Instead there was a heated debate regarding the
pros and cons of the various diverging ideas
[2]
.
In 2009 the effort ground to a halt and it ultimately looked like Java would never be extended by a lambda or closure feature. Consequently, Java's imminent decline as a vibrant programming language was predicted and the slogan "Java is the new Cobol" was born.
Then, in 2010, Oracle declared that Java's death is not an option and that, quite the contrary, Java is supposed to stay around for many years to come as a popular language in widespread use
[3]
.
For that to happen, Java is required to adapt to the needs of modern hardware
and in particular modern and future multi-core platforms. On a platform
with multiple CPU cores many operations can and should be executed in parallel.
While concurrent programs keep a couple of dozen CPU cores busy with the fairly coarse-grained parallelization that is common practice today, it is obvious that it would take a more fine-grained approach to keep hundreds of CPU cores busy. In order to support fine-grained parallelization,
it was decided that the JDK collections framework must undergo a major
overhaul. Collections need bulk operations that apply functionality to
all elements in a sequence and do so with several threads in parallel.
Implementation of parallel bulk operations for collections requires a better
separation of concerns, namely separating "what" is applied to all sequence
elements from "how" it is applied to the sequence elements. And this
is what lambdas are for: they provide a convenient and concise notation
for functionality, which can be passed as an argument to a bulk operation
of a collection, which in turn applies this functionality in parallel to
all its elements.
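The bulk operations that eventually shipped with Java 8 illustrate the point. In the sketch below (the class and method names are made up for illustration), the lambda supplies the "what" and the parallel stream supplies the "how":

```java
import java.util.Arrays;
import java.util.List;

public class BulkOpSketch {
    // counts the words longer than 3 characters; the lambda is the "what",
    // the (possibly multi-threaded) bulk operation is the "how"
    public static long countLongWords(List<String> words) {
        return words.parallelStream()              // bulk operation, may use many threads
                    .filter(w -> w.length() > 3)   // lambda passed as an argument
                    .count();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("ant", "bison", "cat", "dingo");
        System.out.println(countLongWords(words)); // 2 (bison, dingo)
    }
}
```

The caller never says in which order or on which threads the filter is applied; that decision is left entirely to the bulk operation.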
What are Lambda Expressions?
The term lambda (short for: lambda expression) stems from lambda calculus, which is a theoretical framework for describing functions and their evaluation. Rather than delving into the theoretical background, let us keep it simple: a lambda expression denotes a piece of functionality. Here is an example of a lambda expression in Java:
(int x) -> { return x+1; }
Merely looking at it tells us that a lambda expression is something like a method without a name. It has everything that a method has: an argument list, which is the (int x) part of the sample lambda, and a body, which is the { return x+1; } part after the arrow symbol. Compared to a method, a lambda definition lacks a return type, a throws clause, and a name. Return type and exceptions are inferred by the compiler from the lambda body; in our example the return type is int and the throws clause is empty. The only thing that is really missing is the name. In this sense, a lambda is a kind of anonymous method.
Lambda Expressions vs. Anonymous Inner Classes
Lambdas are implemented "ad hoc", that is, where and when they are needed. In this regard they are similar to a Java language feature that we have been using all along, namely anonymous inner classes. Anonymous inner classes, too, are "ad hoc" implementations - not just of a single method, but of an entire class with one or several methods. Both lambdas and anonymous inner classes are typically defined for the purpose of passing them as an argument to a method, which takes the anonymous functionality and applies it to something.
Here is a method from the JDK that takes a piece of functionality and applies it; it is the listFiles(FileFilter) method from class java.io.File.
public File[] listFiles(FileFilter filter) {
    String ss[] = list();
    if (ss == null) return null;
    ArrayList<File> files = new ArrayList<>();
    for (String s : ss) {
        File f = new File(s, this);
        if ((filter == null) || filter.accept(f))
            files.add(f);
    }
    return files.toArray(new File[files.size()]);
}
The listFiles(FileFilter) method takes a piece of functionality as an argument. The required functionality must be an implementation of the FileFilter interface:

public interface FileFilter {
    boolean accept(File pathname);
}

A file filter is a piece of functionality that takes a File object and returns a boolean value. The listFiles method applies the filter that it receives as an argument to all File objects in the directory and returns an array containing all those File objects for which the filter returned true. For invocation of the listFiles method we must supply the file filter functionality, i.e. an implementation of the FileFilter interface. Traditionally, anonymous inner classes have been used to provide ad hoc implementations of interfaces such as FileFilter. Using an anonymous inner class, the invocation of the listFiles method looks like this:

File myDir = new File("\\user\\admin\\deploy");
if (myDir.isDirectory()) {
    File[] files = myDir.listFiles(
        new FileFilter() {
            public boolean accept(File f) { return f.isFile(); }
        }
    );
}

Using a lambda expression it looks like this:

File myDir = new File("\\user\\admin\\deploy");
if (myDir.isDirectory()) {
    File[] files = myDir.listFiles(
        (File f) -> { return f.isFile(); }
    );
}
The example illustrates a typical use of lambdas. We need to define functionality on the fly for the purpose of passing it to a method. Traditionally, we used anonymous inner classes in such situations, which means: a class is defined, an instance of the class is created and passed to the method. Since Java 8, we can use lambda expressions instead of anonymous inner classes. In both cases we pass functionality to a method like we usually pass objects to methods. This concept is known as "code-as-data", i.e. we use code like we usually use data. Using anonymous inner classes, we do in fact pass an object to the method. Using lambda expressions, object creation is no longer required [4]; we just pass the lambda to the method. That is, with lambda expressions we really use "code-as-data". In addition to getting rid of instance creation, lambda expressions are more convenient. Compare the alternatives:

Using an anonymous inner class it looks like this:

File[] files = myDir.listFiles(
    new FileFilter() {
        public boolean accept(File f) { return f.isFile(); }
    }
);

Using a lambda expression it looks like this:

File[] files = myDir.listFiles(
    (File f) -> { return f.isFile(); }
);
We will see later [5] that, in addition to lambda expressions, there is a feature called method references. They can be used similarly to lambdas. With a method reference the example looks even simpler. Using a method reference it looks like this:
File[] files = myDir.listFiles(
File::isFile
);
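The lambda and the method reference are interchangeable here. A small sketch (the class name is invented for the demo) checks that both filters reach the same verdict for a given File:

```java
import java.io.File;
import java.io.FileFilter;

public class FilterEquivalence {
    static final FileFilter viaLambda = (File f) -> { return f.isFile(); };
    static final FileFilter viaMethodRef = File::isFile;   // same behaviour, shorter

    // true if both filters agree on the given file system element
    public static boolean sameVerdict(File f) {
        return viaLambda.accept(f) == viaMethodRef.accept(f);
    }

    public static void main(String[] args) {
        // "." is a directory, so both filters reject it and agree
        System.out.println(sameVerdict(new File(".")));  // true
    }
}
```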
Clearly, use of a lambda expression or a method reference reduces the file filter to its essentials, and much of the syntactic overhead that the definition of an anonymous inner class requires is no longer needed.
Methods vs. Functions
Lambda expressions are a key concept in so-called functional programming languages. Java, in contrast, is an object-oriented programming language. Adding an element from a functional language to Java allows for programming techniques that were previously uncommon due to lack of support. In order to understand what lambdas are and how they can be used, we want to look into some of the principles of functional programming. An important issue is the difference between a method and a function.
Note that in the following we use the terms "method" and "function" to denote concepts and to point out the difference between the two concepts. In practice, the terms "method", "function", and "procedure" are often used interchangeably to describe similar principles. Also, in practice, the difference between the concepts may not be very pronounced. Since we want to explore how lambdas differ from other concepts in Java that we are familiar with, we will focus on the differences between the concepts of a method and a function.
In order to grasp the difference between
methods and functions, we take a brief dip into the history and principles
of programming languages.
Both methods and functions represent a piece of reusable functionality. They both take arguments, have some kind of body, which is executable code and represents the actual functionality, and they may produce a result and/or side effects. In Java, a method always belongs to a class; Java has no concept of free functions outside of a class. A method has read and write access to fields of the class to which it belongs and may modify these fields. A method can also modify the arguments it receives and produce other side effects, but most of the time methods modify fields or produce a result. A function, in contrast, operates solely on the arguments it receives. It typically does not modify any data; often it does not even have access to any mutable data.
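The conceptual difference can be sketched in Java itself; the Counter class below is made up for illustration:

```java
import java.util.function.IntUnaryOperator;

public class MethodVsFunction {
    // a method in the classic sense: it reads and mutates the object's state
    static class Counter {
        private int count = 0;
        void increment() { count++; }   // side effect on a field
        int value() { return count; }
    }

    // a pure function: the result depends only on the argument
    public static final IntUnaryOperator successor = x -> x + 1;

    public static void main(String[] args) {
        Counter c = new Counter();
        c.increment();
        c.increment();
        System.out.println(c.value());                 // 2 - order of calls matters
        System.out.println(successor.applyAsInt(41));  // 42 - no state involved
    }
}
```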
Methods and functions are used in different ways. The idea of methods (or procedures) stems from procedural (aka imperative) languages such as Fortran, Pascal, and C. In those languages procedures are invoked in a specific order and may operate on data which they read and modify. The term "procedural" stems from the use of "procedures"; the notion of an "imperative" language stems from the fact that the programmer dictates how things are done, i.e., in which order which procedures are applied to which data. In an imperative language the order of applying procedures matters. This is obvious when we consider that procedures may modify data. After all, it makes a substantial difference in which order modifications happen and whether data is read before or after a modification.
Object-oriented languages such as C++, Smalltalk, Eiffel, and Java extend the procedural approach. They bundle data and procedures into objects with state (the fields) and behaviour (the methods). The principle of using methods is still the same as in procedural languages: methods are invoked and operate on data (the fields of the class to which the method belongs), which they read and may alter. For this reason, object-oriented languages are imperative languages, too.
The idea of functions stems from functional languages such as Lisp, Haskell, Clojure, and Scala. These languages have pure functions that behave as described above: they do not mutate data; instead they produce results from the arguments they receive. In addition, functions are used differently. A function, too, is invoked, but the key idea is that functions are passed around as arguments to an operation or as the result of an operation. This is similar to passing around data or objects in procedural and object-oriented languages - hence the previously mentioned notion of "code-as-data".
Functional languages are declarative, as opposed to imperative. They describe what a program should accomplish, rather than describing how to go about accomplishing it. In a declarative language the programmer does not dictate which steps must be performed in which order. Instead, there is a clear separation between what is done (this is what the programmer declares) and how it is done (this is determined elsewhere, by the language or the implementation). In pure functional languages such as Haskell the functions do not have any side effects. In particular, they do not modify data or objects. If no side effects occur, order does not matter. Consequently, it is easy to achieve a clear separation between "what" is done and "how" it is done.
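The contrast can be sketched with a simple transformation; the stream API is a separate topic, and the class and method names below are illustrative:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ImperativeVsDeclarative {
    // imperative: the programmer spells out the loop, the order, the accumulator
    public static List<Integer> squaresImperative(List<Integer> nums) {
        List<Integer> result = new ArrayList<>();
        for (Integer n : nums) {
            result.add(n * n);
        }
        return result;
    }

    // declarative: we state *what* (square every element); the stream decides *how*
    public static List<Integer> squaresDeclarative(List<Integer> nums) {
        return nums.stream().map(n -> n * n).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> nums = Arrays.asList(1, 2, 3);
        System.out.println(squaresImperative(nums));   // [1, 4, 9]
        System.out.println(squaresDeclarative(nums));  // [1, 4, 9]
    }
}
```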
Hence there are the following differences between the concepts of methods and pure functions. The bottom line is:
• We can express methods in Java. These are the methods of Java classes that we are familiar with. They are executed, they may mutate data, the order of their invocation matters, and they describe how things are done.
• We can express pure functions in Java. Traditionally, functions were expressed as anonymous inner classes and now can be expressed as lambda expressions. Lambdas and anonymous classes are passed around like data, typically as arguments to a method. If they do not modify data they are pure functions. In this case their invocation order does not matter and they describe what is done rather than how it is done.
Since Java is a hybrid language (object-oriented with functional elements), the concepts are blurred. For instance, a method in Java need not be a mutator. If it does not modify anything, but purely inspects data, then it has properties of a function. Conversely, a function in Java (i.e., an anonymous inner class or lambda expression) need not be pure. It may modify data or produce other side effects, in which case it has properties of a method. Despite this fuzziness in practice, it may be helpful to keep the concepts of "method" and "function" in mind.

Let us revisit the previous example of a file filter to check what the concepts look like in practice:

File myDir = new File("\\user\\admin\\deploy");
if (myDir.isDirectory()) {
    File[] files = myDir.listFiles(
        (File f) -> { return f.isFile(); }
    );
}
The listFiles method of class java.io.File is a method. It operates on the static and non-static data of the File object it is invoked on. A File object contains fields that give access to properties of the file system element it represents, e.g. its pathname, whether it is a file or a directory, which other elements it contains, etc. These fields are accessed by the listFiles method.

The listFiles method takes a file filter as an argument. The file filter is a function, here expressed as a lambda expression. It is passed to the listFiles method, which then applies it to all files and directories in myDir. The example illustrates the code-as-data principle: the function (i.e. the file filter lambda) is passed to an operation (i.e. the listFiles method). It also illustrates the separation between "what" and "how": the function describes "what" is supposed to be applied to all files and directories, and the receiving method controls "how" the filter is applied, e.g. in which order. In addition, the example illustrates the principle of a pure function: the file filter does not produce side effects; it just takes a File and returns a boolean.

In contrast, here is an example of an impure function:

File myDir = new File("\\user\\admin\\deploy");
final LongAdder dirCount = new LongAdder();
final LongAdder filCount = new LongAdder();
if (myDir.isDirectory()) {
    myDir.listFiles(
        (File f) -> {
            if (f.isDirectory()) { dirCount.increment(); return false; }
            if (f.isFile())      { filCount.increment(); return true; }
            return false;
        }
    );
}
The file filter is again expressed as a lambda, but this time it produces side effects, namely incrementing two counters. After this excursion into the realm of programming language principles, we know that lambda expressions in Java denote (pure or impure) functions. As Java is an object-oriented and not a functional programming language, the question is: how are lambdas aka functions integrated into the Java programming language?
Representation of Lambda Expressions
As we have seen in previous examples, we use lambda expressions to define a function, which we can pass as an argument to an operation. On the receiving side, i.e., in the operation that takes the lambda, it is used like an object.

For illustration we will again use the previous example of the listFiles method from class java.io.File. Below we see where the lambda expression is defined and passed on as an argument to the listFiles method.
File myDir = new File("\\user\\admin\\deploy");
if (myDir.isDirectory()) {
File[] files = myDir.listFiles(
        (File f) -> { return f.isFile(); }
);
}
Here is the receiving side, i.e., the implementation of the listFiles method from class java.io.File:

public File[] listFiles(FileFilter filter) {
    String ss[] = list();
    if (ss == null) return null;
    ArrayList<File> files = new ArrayList<>();
    for (String s : ss) {
        File f = new File(s, this);
        if ((filter == null) || filter.accept(f))
            files.add(f);
    }
    return files.toArray(new File[files.size()]);
}
The listFiles method takes a FileFilter object and invokes its accept method. The example demonstrates that a lambda expression has properties in common with functions and objects, depending on the point of view.
• Conceptually, a lambda expression is a function. It is an unnamed piece of reusable functionality. It has a signature and a body, but no name. It may or may not be pure, i.e., may or may not have side effects. We pass it around as an argument to a method or as a return value from a method.
• When a lambda expression is passed as an argument to a method, the receiving method treats it like an object. In the example the lambda expression (or more precisely, a reference to the lambda expression) is received as the argument of the listFiles method. Inside the listFiles method the lambda expression is a reference to an object of a subtype of the FileFilter interface. This lambda object is a regular object; e.g. it has an address and a type.
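The dual nature can be made visible in a tiny sketch (the class name is invented): we define the lambda as a function and then use it like an object of type FileFilter:

```java
import java.io.File;
import java.io.FileFilter;

public class LambdaObjectView {
    // defined as a function: an unnamed piece of functionality
    public static FileFilter makeFilter() {
        return (File f) -> { return f.isFile(); };
    }

    public static void main(String[] args) {
        FileFilter filter = makeFilter();
        // used as an object: it has a type and a method we can invoke
        System.out.println(filter instanceof FileFilter);  // true
        System.out.println(filter.accept(new File(".")));  // false: "." is a directory
    }
}
```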
Basically, you think in terms of a function when you define a lambda expression and in terms of a method when you use it. The lambda object that links definition and use together is taken care of by the compiler and the runtime system. We as users need not know much about it.

Practically all decisions regarding the representation and creation of the lambda object (short for: the object that represents a lambda expression at runtime) are made by the compiler and the runtime system. This is good, because it enables optimizations under the hood that we as users neither want to deal with nor care about. For instance, from the context in which the lambda expression is defined, the compiler captures all information that is needed to create a synthetic type for a lambda object, but the compiler does not yet create that type; it just prepares the type generation. The actual creation of the lambda object's synthetic type and the creation of the lambda object itself are done dynamically at runtime by the virtual machine. For this dynamic creation of the synthetic lambda type and the lambda object, the invokedynamic byte code is used, which was added to Java in release 7. Using these dynamic features it is possible to delay the creation until first use, which is an optimization: if you just define, but never use, the lambda expression, then neither its type nor the lambda object will ever be created. [6]
Functional Interfaces
Part of the magic of tying the definition of a lambda expression to the use of a lambda expression is inference of the type by which the lambda expression is used. This type is called the target type. The synthetic type that the runtime system dynamically creates, as mentioned above, is a subtype of the target type.

In the previous file filter example we have seen that the target type of our lambda expression is FileFilter. In the example we define a lambda expression and pass it to the listFiles method, in which it is then used like an object of a subtype of FileFilter. In a way, this is surprising. After all, we did not specify that our lambda expression implements the FileFilter interface. In fact, we did not say anything regarding the lambda expression's type. Similarly, the receiving listFiles method never indicated that it happily accepts lambda expressions. Instead it requires an object of a subtype of FileFilter.
How does it work?
The underlying magic that the compiler performs to make it happen is type inference. The compiler takes a look at the context in which the lambda expression is defined and figures out which type is required. Then it takes a look at the lambda expression itself and figures out whether the lambda is compatible with the required type (plus or minus a couple of adjustments if needed).

If Java were a functional language, then the most natural type for a lambda expression would be some kind of function type, which is a special category of type that is reserved for the description of functions. A function type merely describes the signature and could for example look similar to (int,int)->boolean if it is meant to denote the type of a function that takes two int arguments and returns a boolean value.
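Java models such a signature with a functional interface instead. A hypothetical (int,int)->boolean could be sketched like this (the interface name IntIntPredicate is made up for illustration):

```java
public class FunctionTypeSketch {
    // stands in for the hypothetical function type (int,int)->boolean
    @FunctionalInterface
    interface IntIntPredicate {
        boolean test(int a, int b);
    }

    // a lambda with the signature (int,int)->boolean, typed via the interface
    public static final IntIntPredicate lessThan = (a, b) -> a < b;

    public static void main(String[] args) {
        System.out.println(lessThan.test(1, 2));  // true
        System.out.println(lessThan.test(5, 2));  // false
    }
}
```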
Java is not a functional language and traditionally had no such thing as a function type. So, the language designers had the choice to add function types as a new category of type. Since they did not want to introduce major changes to Java's type system, they tried to find a way to integrate lambda expressions into the language without resorting to function types. They found a way to do it by using special interface types instead: these interfaces are the so-called functional interfaces, previously known as SAM types (where SAM stands for Single Abstract Method).
A functional interface essentially is an interface with one method. [7] Countless such interfaces exist in the JDK already, some of them since its first release. The interface Runnable is a classic example of a functional interface. It demands the implementation of exactly one method:

public interface Runnable {
    void run();
}
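Accordingly, a lambda expression can be supplied wherever a Runnable (with its single run() method) is expected; a minimal sketch (the class name is invented):

```java
public class RunnableLambda {
    static final StringBuilder log = new StringBuilder();

    // the lambda is converted to the functional interface type Runnable
    public static Runnable greeter() {
        return () -> log.append("run!");
    }

    public static void main(String[] args) {
        Runnable r = greeter();
        r.run();                  // invoking run() triggers the lambda's body
        System.out.println(log);  // run!
    }
}
```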
As functional interfaces and lambda expressions both deal with a single method, the language designers decided that the compiler would convert each lambda expression to a matching functional interface type. This conversion usually happens automatically, because lambda expressions may only appear in a context where such a conversion is feasible. Let us reconsider the previous example:

File[] files = myDir.listFiles(
    (File f) -> { return f.isFile(); }
);
The lambda expression appears as the argument to the invocation of the listFiles method of class java.io.File. The compiler knows the signature of method listFiles and finds that the method declares java.io.FileFilter as its parameter type. Hence the required type in this particular context is FileFilter. Here is what the FileFilter interface looks like:

public interface FileFilter {
    boolean accept(File pathname);
}

FileFilter requires exactly one method and for this reason is a functional interface type. Our lambda expression has a matching signature: it takes a File argument, returns a boolean value, and throws no checked exception. Hence the compiler converts the lambda expression to the functional interface type FileFilter. Inside the listFiles method the lambda is then indistinguishable from an object of type FileFilter. Here is the implementation of method File.listFiles:

public File[] listFiles(FileFilter filter) {
    String ss[] = list();
    if (ss == null) return null;
    ArrayList<File> files = new ArrayList<>();
    for (String s : ss) {
        File f = new File(s, this);
        if ((filter == null) || filter.accept(f))
            files.add(f);
    }
    return files.toArray(new File[files.size()]);
}
Inside this method the lambda is bound to the filter argument. The lambda's functionality is triggered when the functional interface's method is invoked, i.e. when filter.accept() is called.

The conversion to a functional interface can have funny effects, demonstrated in the following example. Say, we have two functional interfaces:

public interface FileFilter {
    boolean accept(File pathname);
}

and

public interface Predicate<T> {
    boolean test(T t);
}

Our lambda is convertible to both functional interface types [8]:

FileFilter filter = (File f) -> { return f.isFile(); };
Predicate<File> predicate = (File f) -> { return f.isFile(); };

filter = predicate; // << error: incompatible types
If, however, we try to assign the two resulting variables to each other, the compiler flags it as an error, although the two variables represent the same lambda expression. The reason is that the two variables to which the lambda is bound are of two different, incompatible types. Also, there might occasionally be situations in which the compiler cannot figure out a matching functional interface type, like in this example:
Object ref = (File f) -> { return f.isFile(); };
The assignment context does not provide enough information for a functional type conversion and the compiler reports an error. Adding a type cast easily solves the problem:
Object ref = (FileFilter) (File f) -> { return f.isFile(); };
By and large, functional interface conversion is a way to keep matters simple and avoid the addition of function types to Java's type system. [9]
Comparing Lambdas to Anonymous Inner Classes
Lambda expressions appear in situations where we traditionally used anonymous inner classes, namely in places where an operation asks for a piece of functionality (i.e. for a function). While lambda expressions and anonymous inner classes are somewhat interchangeable, they differ in many ways from each other.
Syntax
Quite obviously the syntax
is different. The notation for a lambda expression is more concise.
Here are examples illustrating how concise a lambda notation can be compared
to an anonymous inner class:
Anonymous inner class:

File[] files = myDir.listFiles(
    new FileFilter() {
        public boolean accept(File f) { return f.isFile(); }
    }
);

Lambda expression in various forms, including a method reference:

File[] files = myDir.listFiles( (File f) -> { return f.isFile(); } );
File[] files = myDir.listFiles( f -> f.isFile() );
File[] files = myDir.listFiles( File::isFile );

Runtime Overhead
Anonymous inner classes come with a certain amount of overhead compared to lambda expressions. Use of an anonymous inner class involves creation of a new class type, creation of a new object of this new type, and eventually invocation of a non-static method of the new object.
At runtime, anonymous inner classes require:
o class loading,
o memory allocation and object initialization, and
o invocation of a non-static method.
A lambda expression needs functional interface conversion and eventually invocation. The type inference required for the functional interface conversion is a pure compile-time activity and incurs no cost at runtime. As mentioned earlier, the creation of the lambda object and its synthetic type is performed via the invokedynamic byte code. This allows deferring all decisions regarding type representation, instance creation, and the actual invocation strategy to runtime. This way the JVM can choose the optimal translation strategy. For instance, the JVM can reduce the effort to the invocation of a constant static method in a constant pool, which eliminates the need for a new class and/or a new object entirely. The bottom line is that lambda expressions have the potential for optimizations and reduced runtime overhead compared to anonymous inner classes. [10]
Variable Binding
Occasionally, a function needs access to variables from the enclosing context, i.e., variables that are not defined in the function itself, but in the context in which the lambda appears. Anonymous inner classes have support for this kind of variable binding (also called variable capture): an inner class has access to all final variables of its enclosing context. Here is an example:
void method() {
    final int cnt = 16;
    Runnable r = new Runnable() {
        public void run() { System.out.println("count: " + cnt); }
    };
    Thread t = new Thread(r);
    t.start();
    cnt++; // error: cnt is final
}
The anonymous inner class that implements the Runnable interface accesses the cnt variable of the enclosing method. It may do so because the cnt variable is declared as final. Of course, the final cnt variable cannot be modified.

Lambda expressions, too, support variable binding. Here is the same example using a lambda expression:

void method() {
    int cnt = 16;
    Runnable r = () -> { System.out.println("count: " + cnt); };
    Thread t = new Thread(r);
    t.start();
    cnt++; // error: cnt is implicitly final
}
The difference is that the cnt variable need not be declared as final in the enclosing context. The compiler automatically treats it as final as soon as it is used inside a lambda expression. In other words, variables from the enclosing context that are used inside a lambda expression are implicitly final (or effectively final). Naturally, you can add an explicit final declaration, but you do not have to; the compiler will automatically add it for you. Note that since Java 8 the explicit final declaration is no longer needed for anonymous inner classes either. Since Java 8, both lambda expressions and anonymous inner classes are treated the same regarding variable binding: both can access all effectively final variables of their respective enclosing context.
Scoping
An anonymous inner class is a class, which means that it introduces a scope for names defined inside the inner class. In contrast, a lambda expression is lexically scoped, which means that the lambda expression is not a scope of its own, but is part of the enclosing scope. Here is an example that illustrates the difference:
Anonymous inner class:

void method() {
    int cnt = 16;
    Runnable r = new Runnable() {
        public void run() {
            int cnt = 0; // fine
            System.out.println("cnt is: " + cnt);
        }
    };
    ...
}

Lambda expression:

void method() {
    int cnt = 16;
    Runnable r = () -> {
        int cnt = 0; // error: cnt has already been defined
        System.out.println("cnt is: " + cnt);
    };
    ...
}
Both the anonymous class and the lambda define a variable named cnt while there already is a variable with this name defined in the enclosing method. Since the anonymous class establishes a scope of its own, it may define a second cnt variable, which is tied to the scope of the inner class. The lambda, in contrast, is part of the enclosing scope and the compiler considers the definition of a cnt variable in the lambda body a colliding variable definition. In the scope of the enclosing context there already is a definition of a cnt variable and the lambda must not define an additional cnt variable of its own.

Similarly, the different scoping rules have an impact on the meaning of keywords such as this and super when used inside an anonymous class or a lambda expression. In an anonymous class, this denotes the reference to the object of the inner class type and super refers to the anonymous class's super class. In a lambda expression, this and super mean whatever they mean in the enclosing context; usually this will refer to the object of the enclosing type and super will refer to the enclosing class's super class. Essentially this means that it might not always be trivial to replace an anonymous class by a lambda expression. IDEs might help and offer functionality that refactors anonymous classes into lambda expressions.
Why do we need lambdas?
The key reason for adding lambdas to the
Java programming language is the need to evolve the JDK and in particular
the JDK's collection framework.
[11]
The
traditional API design of the Java collections in package
java.util
renders certain optimizations impossible. At the same time, these optimizations
are badly needed. Considering the current and future hardware architecture
that we design our software for, we need support for increased parallelism
and improved data locality. Let us see why.
Multi-core and multi-CPU platforms are
the norm. While we deal with a couple of cores today, we will have to
cope with hundreds of cores some time in the future. Naturally, we need
to increase the effort for replacing serial, single-threaded execution
by parallel, multi-threaded execution. The JDK intends to support this
effort by offering
parallel bulk operations for collections
.
An example of a parallel bulk operation is the application of a transformation
to each element in a collection. Traditionally such a transformation
is done in a loop, which is executed by a single thread, accessing one
sequence element after the other. The same can be accomplished in parallel
with multiple threads: break down the task of transforming
all
elements
in a sequence into many subtasks each of which operates on a
subset
of the elements. Naturally, this can be done without extending the
collection framework, e.g. by use of existing JDK abstractions such as
the fork-join-framework for instance, but it is quite a bit of an effort
and requires a thorough understanding of concurrency issues. Parallel
execution of bulk operations on collections is not trivial. Hence,
a key goal of evolving the JDK collections is support for parallelism in
a way that is easy and convenient to use. In order to make this happen
the collections' API must be extended.
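To give a taste of what such an extension buys us, the following sketch (the class name and sample data are made up for illustration) contrasts the traditional single-threaded loop with the same transformation expressed as a parallel bulk operation:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: transforming every element of a collection,
// once sequentially in a loop, once as a parallel bulk operation.
public class BulkTransform {
    public static List<Integer> lengthsSequential(List<String> words) {
        // traditional single-threaded loop
        List<Integer> result = new ArrayList<>();
        for (String w : words) result.add(w.length());
        return result;
    }

    public static List<Integer> lengthsParallel(List<String> words) {
        // the same transformation as a parallel bulk operation; the splitting
        // into subtasks is handled entirely by the stream implementation
        return words.parallelStream().map(String::length).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("lambda", "stream", "java");
        System.out.println(lengthsSequential(words));
        System.out.println(lengthsParallel(words));
    }
}
```

Note that the parallel version preserves the encounter order of the source list even though the elements may be processed by multiple threads.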
A key point of the extension is to separate
"what" operations are applied to sequence elements in a collection from
"how" the operations are applied to these elements. Without a separation
of concerns the collections have no chance to shield us from the complications
of parallel execution of bulk operations. Here is where we come full
circle and the previously discussed principles of functional programming
including lambdas come into play.
New abstractions have been added to the
collection framework in package
java.util
,
most prominently the
stream
abstractions in package
java.util.stream
.
Collections can be converted to streams, and streams, different from collections, access the sequence elements via internal iteration (as opposed to external iteration).
Internal vs. External Iteration
Iteration is the process of traversing
a sequence. It can be performed in two ways: as an
internal
or
external
iteration.
[12]
Internal iteration means that the sequence itself controls all details
of the iteration process. External iteration means that the sequence
only supports iteration, usually via a so-called
iterator
, but many
aspects of the iteration are controlled by the collection's user. Traditionally,
Java collections offer external iteration. Since Java 8, internal iteration
is supported via streams. Let us see, what the difference between internal
and external iteration is.
Traditionally in Java, elements in a collection
are accessed via an iterator. For illustration we explore an example
that uses a bank account abstraction named
Account
:
class Account {
   private long balance;
   ... constructors and more methods ...
   public long balance() { return balance; }
}
We iterate over a list of bank accounts using a for-each loop:

private static void checkBalance(List<Account> accList) {
   for (Account a : accList)
      if (a.balance() < a.threshold)
         a.alert();
}

The for-each loop internally uses an iterator and basically looks like this:

Iterator<Account> iter = accList.iterator();
while (iter.hasNext()) {
   Account a = iter.next();
   if (a.balance() < a.threshold)
      a.alert();
}

This loop has potential for improvement in several areas:
• The logic of traversing the sequence is intermingled with the functionality that is applied to each element in the sequence. Separating the concerns would allow for optimizations regarding how the elements are accessed.
• The execution is single-threaded. Multi-threaded execution is possible by means of the fork-join thread pool, but it would be substantially more complex.
• There is a certain amount of redundancy. The iterator's hasNext and next methods have overlap; they both figure out whether there is a next element. The overhead could be eliminated.
The traditional access to elements in a sequence via an iterator is known as external iteration, in contrast to internal iteration. The idea of internal iteration is: the sequence itself determines how elements in a sequence are accessed, and the sequence's user is only responsible for specifying what has to be applied to each element in the sequence. While external iteration intermingles the logic of sequence traversal with the functionality applied to each of the sequence elements, internal iteration separates the concerns.
Here is what internal iteration might look like. Since Java 8, a collection has a forEach method that looks like this:
public void forEach(Consumer<? super E> consumer);
It uses the Consumer interface, which looks like this:
public interface Consumer<T> {
   public void accept(T t);
}

Given this forEach method the sequence's user only has to supply the functionality that is to be applied to each element in the sequence.

private static void checkBalance(List<Account> accList) {
   accList.forEach((Account a) -> {
      if (a.balance() < a.threshold)
         a.alert();
   });
}
Now the concerns are separated: the collection is responsible for figuring out how to access all elements and how to apply the specified functionality to them. The user is responsible for supplying a reasonable functionality to be applied to the sequence elements. Given this separation, the implementation of the collection and its forEach method have the liberty to optimize away the overhead of the iterator's hasNext and next methods, to split the task into subtasks and execute them in parallel, and to apply several further optimizations such as lazy evaluation, pipeline optimizations for consecutive operations, or out-of-order execution. The traditional intermingled external iteration prevents all of these optimizations.
Given the separation of concerns it is now easy to have the loop executed in parallel by multiple threads. Since Java 8, collections can be turned into so-called streams, of which there is a sequential and a parallel variety. The parallel version of the example above would look like this:

private static void checkBalance(List<Account> accList) {
   accList.parallelStream().forEach((Account a) -> {
      if (a.balance() < a.threshold)
         a.alert();
   });
}
In the following section we take a closer look at streams, their relationship to collections, and their support for bulk operations via internal iteration.
Streams and Bulk Operations
Since Java 8 the JDK has a new abstraction
named
Stream<E>
which is a view
to a collection and has bulk operations that perform internal iteration.
These bulk operations include
forEach
,
filter
,
map
,
reduce
,
and many more. Thanks to the internal iteration, streams support sequential
as well as parallel execution in a very convenient way. Here is an example:
List<Account> accountCol = … ;
accountCol
   .stream()
   .filter(a -> a.balance() > 1_000_000_00)          // intermediate
   .map(Account::balance)                            // intermediate
   .forEach(b -> {System.out.format("%d\t",b);});    // terminal
Using method stream(), we create a stream on top of a collection. Different from a collection, a stream does not contain any elements. Instead it is a view to a collection that gives access to all elements in the collection by means of bulk operations with internal iteration. A stream has operations such as filter(), map(), and forEach(), as illustrated in the code snippet above.
The filter() method takes a predicate, which is a function that takes an element from the sequence and returns a boolean value. In our example the predicate is the lambda expression a -> a.balance() > 1_000_000_00. This predicate is applied to all elements in the sequence and only the "good" elements, i.e. those for which the predicate returns true, appear in the resulting stream.
The map() method takes a mapper, which is a function that takes an element from the sequence and returns something else. In our example the mapper maps the accounts to their respective balance using the balance() method of class Account. The result is a stream of long values.
The forEach() method takes an arbitrary function, which receives an element from the sequence as an argument and is not supposed to return anything. In our example it prints the element.
Some of these methods (filter() and map() in the example) return a stream; other ones (forEach() in our example) do not. The methods that return streams are intermediate (aka lazy) operations, which means that they are not immediately executed, but delayed until a terminal (aka eager) operation is applied. In the example above, each bank account in the list is accessed only once, namely when the terminal forEach operation is executed. For each bank account the sequence of operations, i.e., filtering, mapping, and printing, is applied.
So far, the sequence of filter-map-forEach is executed sequentially, but it is easy to enable parallel execution. We simply need to create a parallel stream.
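The laziness of intermediate operations can be observed directly. In the following sketch (the names and sample data are made up for illustration), a counter incremented inside the filter predicate shows that nothing runs until a terminal operation is applied:

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

// Hypothetical sketch: intermediate operations are deferred until a
// terminal operation triggers the traversal.
public class LazyDemo {
    public static void main(String[] args) {
        List<String> words = Arrays.asList("State", "of", "the", "Lambda");
        AtomicInteger filterCalls = new AtomicInteger();

        Stream<String> s = words.stream()
                                .filter(w -> { filterCalls.incrementAndGet();
                                               return w.length() > 3; });

        // the pipeline is built, but the predicate has not been invoked yet
        System.out.println("before terminal op: " + filterCalls.get()); // 0

        long count = s.count();  // terminal operation triggers the traversal

        System.out.println("after terminal op: " + filterCalls.get());  // 4
        System.out.println("count: " + count);                          // 2
    }
}
```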
Compare the sequential to the parallel version:
Sequential execution:

List<Account> accountCol = … ;
accountCol
   .stream()
   .filter(a -> a.balance() > 1_000_000_00)          // intermediate
   .map(Account::balance)                            // intermediate
   .forEach(b -> {System.out.format("%d\t",b);});    // terminal

Parallel execution:

List<Account> accountCol = … ;
accountCol
   .parallelStream()
   .filter(a -> a.balance() > 1_000_000_00)          // intermediate
   .map(Account::balance)                            // intermediate
   .forEach(b -> {System.out.format("%d\t",b);});    // terminal
We create a parallel stream via the collection's parallelStream() method. This is all we need to say; the rest is taken care of by the implementation of the stream abstraction.
The extended collections framework with its streams and bulk operations as outlined above is just one area of Java and its JDK where lambda expressions are helpful and convenient. Independently of the JDK, lambdas can be useful for our own purposes. Before we explore a couple of "functional patterns" that necessitate lambdas, we take a closer look at the lambda language feature, its syntax, and its properties.
Programming with Lambdas
Lambdas are a concept that stems from functional
programming languages and for this reason lambdas support a functional
programming style as opposed to the imperative, object-oriented style Java
programmers are used to. In the following we will first demonstrate the
functional programming style using a simple case study related to the JDK
collections and streams. Subsequently, we will explore an example of a
functional programming pattern known as the
Execute-Around-Method Pattern
.
Fluent Programming
In the following we intend to explore the
declarative programming style that lambda expressions and the JDK 8 streams
support. We will be using the following abstractions:
public interface Person {
public enum Gender { MALE, FEMALE };
public int getAge();
public String getName();
public Gender getGender();
}
public interface Employee extends Person {
   public long getSalary();
}
public interface Manager extends Employee {
}
public interface Department {
   public enum Kind {SALES, DEVELOPMENT, ACCOUNTING, HUMAN_RESOURCES}
   public Department.Kind getKind();
   public String getName();
   public Manager getManager();
   public Set<Employee> getEmployees();
}
public interface Corporation {
   public Set<Department> getDepartments();
}
For illustration we want to retrieve all managers of all departments in a corporation that have an employee older than 65.
Imperative Approach
Here is the traditional imperative approach:

/*
 * find all managers of all departments with an employee older than 65
 */
Manager[] find(Corporation c) {
   List<Manager> result = new ArrayList<>();
   for (Department d : c.getDepartments()) {
      for (Employee e : d.getEmployees()) {
         if (e.getAge() > 65) {
            result.add(d.getManager());
            break;   // add each department's manager at most once
         }
      }
   }
   return result.toArray(new Manager[0]);
}
The implementation is straightforward. We retrieve all departments from the corporation, then retrieve all employees per department; if we find an employee older than 65 we retrieve the department's manager and add it to a result collection, which we eventually convert into the array that is returned.
Declarative Approach
Using streams and lambdas it would look like this:

/*
 * find all managers of all departments with an employee older than 65
 */
Manager[] find(Corporation c) {
   return c.getDepartments().stream()                 // 1
           .filter(d -> d.getEmployees().stream()     // 2
                         .map(Employee::getAge)       // 3
                         .anyMatch(a -> a > 65))      // 4
           .map(Department::getManager)               // 5
           .toArray(Manager[]::new);                  // 6
}
In line //1, we retrieve all departments of the corporation via the getDepartments method. The method yields a collection of Department references, which we turn into a stream (of Department references).
In line //2, we use the filter operation on the stream of departments. The filter operation needs a predicate. A predicate is a function that takes an element from the stream (a Department in our example) and returns a boolean value. The filter operation applies the predicate to each element in the stream and suppresses all elements for which the predicate returns false. In other words, the stream returned from the filter operation only contains those elements for which the predicate returned true. In our example we need a filter predicate that returns true if the department has an employee older than 65. We provide the predicate as a lambda expression. The lambda expression is fairly ambitious; it takes a Department d, retrieves all employees of that department via the getEmployees method, turns the resulting collection of employees into a stream, maps each employee to its age, and eventually applies the anyMatch operation.
In line //3, we use the map operation on the stream of employees. The map operation needs a function that takes an element from the stream (an Employee in this case) and returns another object of a potentially different type (the age in our example). The mapper function is provided as the method reference Employee::getAge, because the getAge method does exactly what we need: it maps an employee to its age.
In line //4, we use the anyMatch operation on the stream of age values returned from the preceding map operation. The anyMatch operation takes elements from the stream, applies a predicate to each element, and stops as soon as an element is found for which the predicate returns true. We supply the required predicate as another lambda that takes the age and returns true if the age is greater than 65.
The result of the filter operation and the lengthy lambda expression in lines //2 to //4 is the stream of all departments with an employee older than 65.
In line //5, we map each department to its respective manager. As the mapper function we use the method reference Department::getManager. The result is the stream of all managers of all departments with an employee older than 65. So far, none of the lambdas or referenced methods have been executed because filter and map are intermediate operations.
In line //6, we convert the manager stream into an array via the stream's toArray method. The toArray method needs a generator, which takes a size value and returns an array of the requested size. The generator in our example is a constructor reference, namely the reference to the constructor of manager arrays: Manager[]::new.
The two approaches look very different. The imperative style intermingles the logic of iterating (the various nested loops) and the application of functionality (retrieval of departments and employees, evaluation of age values, collecting results). The declarative style, in contrast, separates concerns. It describes which functionality is supposed to be applied and leaves the details of how the functions are executed to the various operations such as filter, map, anyMatch, and toArray.
Fluent Programming
The small code sample written in declarative
style illustrates what is known as
fluent programming
: the chaining
of operations. It is a programming technique where operations return
a value that allows the invocation of another operation. With fluent
programming it is perfectly natural to end up with one huge statement that
is the concatenation of as many operations as you like.
The JDK streams are designed to support
fluent programming: all intermediate operations return a result stream
to which we can apply the next operation in the chain until we apply a
terminal operation as the tail of the chain. The chaining of operations
is a key feature of streams as it enables all sorts of optimizations.
For instance, the stream implementation can arrange the execution of functions
as a pipeline. Instead of looping over all elements in the sequence repeatedly
(once for
filter
, then
again for
map
, and eventually
for
toArray
) the chain
of filter-mapper-collector can be applied to each element in just one pass
over the sequence.
Here is a brief illustration of the pipelining
performed by the implementation of streams:
String[] txt = {"State","of","the","Lambda","Libraries","Edition"};
Arrays.stream(txt)
      .filter(s -> s.length() > 3)
      .map(s -> s.length())
      .forEach(l -> System.out.println(l));
In the code snippet we take a string array, turn it into a stream, pick all strings longer than 3 characters, and map them to their length.
The source code suggests that first the filter
is applied to all elements in the stream and then the mapper is applied
to all elements in a second pass over the stream. In reality, the chain
of filter and map operation is postponed until the terminal
forEach
operation is executed. The terminal operation triggers only a single
pass over the sequence during which both the filter and the mapper are
applied to each element.
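This single-pass, per-element execution can be made visible by printing from inside the filter and the mapper (a sketch; the print statements and names are added only for illustration). The trace shows filter and map alternating per element rather than two separate passes:

```java
import java.util.Arrays;

// Hypothetical sketch: tracing shows filter and map alternating per element
// (one pass), not filter over all elements followed by map over all elements.
public class PipelineTrace {
    public static void main(String[] args) {
        String[] txt = {"State", "of", "the", "Lambda"};
        Arrays.stream(txt)
              .filter(s -> { System.out.println("filter: " + s);
                             return s.length() > 3; })
              .map(s -> { System.out.println("map: " + s);
                          return s.length(); })
              .forEach(l -> System.out.println("forEach: " + l));
        // Trace: filter: State / map: State / forEach: 5 / filter: of /
        //        filter: the / filter: Lambda / map: Lambda / forEach: 6
    }
}
```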
Diagram: Pipelined Execution of filter-map Chain
Creating such a pipeline per element does not only reduce multiple passes over the sequence to a single pass, but also allows further optimizations. For instance, the length method need not be invoked twice per string; the repeated call can be eliminated by the JIT compiler.
Which style looks more pleasant and understandable is in the eye of the beholder. Certainly the declarative code looks alien to Java programmers unfamiliar with declarative or fluent programming. After getting accustomed to the declarative style it will probably appear equally pleasant and understandable.
Execute-Around-Method Pattern
So far, we have been exploring lambdas
solely in the context of the JDK collections and streams. However, lambda
expressions are useful beyond the JDK stream abstractions and internal
iteration. They are convenient and helpful in all places where operations
take functionality as an argument. One such occasion is the so-called
Execute-Around-Method
pattern.
It is a programming technique for eliminating code duplication.
Occasionally we encounter situations where some boilerplate code must be executed both before and after a method (or more generally, before and after another piece of code that varies). Often we simply duplicate the boilerplate code via copy-and-paste and insert the variable functionality manually. Following the DRY (don't repeat yourself) principle, you might want to eliminate this code duplication. For this purpose it is necessary to separate the boilerplate code from the variable code. The boilerplate code can be expressed as a method and the variable piece of code can be passed to this method as the method's argument. This is an idiom where functionality (i.e. the variable piece of code) is passed to a method (i.e. the boilerplate code). The functionality can be conveniently and concisely expressed by means of lambda expressions.
Let us consider an example for the Execute-Around-Method
pattern: use of explicit locks. An explicit
ReentrantLock
(from package
java.util.concurrent.locks
)
must be acquired and released before and after a critical region of statements.
The resulting boilerplate code looks like this:
class SomeClass {
   private ... some data ...
   private Lock lock = new ReentrantLock();
   ...
   public void someMethod() {
      lock.lock();
      try {
         ... critical region ...
      } finally {
         lock.unlock();
      }
   }
}
In all places where we need to acquire and release the lock the same boilerplate code of "lock-try-finally-unlock" appears. Following the Execute-Around-Method pattern we would factor out the boilerplate code into a helper method:

class Utilities {
   public static void withLock(Lock lock, CriticalRegion cr) {
      lock.lock();
      try {
         cr.apply();
      } finally {
         lock.unlock();
      }
   }
}

The helper method withLock takes the variable code as a method argument of type CriticalRegion:

@FunctionalInterface
public interface CriticalRegion {
   void apply();
}
Note that
CriticalRegion
is a functional interface and hence a lambda expression can be used to
provide an implementation of the
CriticalRegion
interface. Here is a piece of code that uses the helper method
withLock:
private class MyIntStack {
   private Lock lock = new ReentrantLock();
   private int[] array = new int[16];
   private int sp = -1;

   public void push(int e) {
      withLock(lock, () -> {
         if (++sp >= array.length) resize();
         array[sp] = e;
      });
   }
   ...
}
The boilerplate code is reduced to an invocation of the withLock helper method and the critical region is provided as a lambda expression.
While the suggested withLock method indeed aids elimination of code duplication, it is by no means sufficient. Let us consider another method of our sample class MyIntStack: the pop method, which returns an element from the stack and throws a NoSuchElementException if the stack is empty. While the push method neither returned nor threw anything, the pop method has to return a value and throws an exception. Neither is allowed by the CriticalRegion's apply method: it has a void return type and no throws clause.
The exception raised in the critical region of the pop method does not cause any serious trouble; it is unchecked and need not be declared in a throws clause. The lambda that we need for implementing the pop method can throw it anyway. The return type, in contrast, causes trouble. In order to allow for lambda expressions with a return type different from void we need an additional CriticalRegion interface with an apply method that returns a result. This way we end up with two interfaces:

@FunctionalInterface
public interface VoidCriticalRegion {
   void apply();
}

@FunctionalInterface
public interface IntCriticalRegion {
   int apply();
}

Inevitably, we also need additional helper methods.

class Utilities {
   public static void withLock(Lock lock, VoidCriticalRegion cr) {
      lock.lock();
      try {
         cr.apply();
      } finally {
         lock.unlock();
      }
   }
   public static int withLock(Lock lock, IntCriticalRegion cr) {
      lock.lock();
      try {
         return cr.apply();
      } finally {
         lock.unlock();
      }
   }
}
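For reference return types, the proliferation of one interface per return type can be avoided with a generic helper. The following sketch (the class name is made up for illustration) uses the JDK's Supplier<T> instead of a hand-written interface; only primitive returns such as int would still need a dedicated interface to avoid boxing:

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical sketch: a generic withLock that covers all reference return
// types with a single helper method.
public class GenericWithLock {
    public static <T> T withLock(Lock lock, Supplier<T> criticalRegion) {
        lock.lock();
        try {
            return criticalRegion.get();   // the variable code, run under the lock
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        Lock lock = new ReentrantLock();
        String result = withLock(lock, () -> "computed under lock");
        System.out.println(result);
    }
}
```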
Given the additional helper method and functional interface, the pop method can be implemented like this:

private class MyIntStack {
   ...
   public int pop() {
      return withLock(lock, () -> {
         if (sp < 0)
            throw new NoSuchElementException();
         else
            return array[sp--];
      });
   }
}
In analogy to the return type you might wonder whether we need additional functional interfaces for critical regions with different argument lists. It turns out that arguments to the critical region are usually not an issue. The lambda expression, which implements the critical region interface, can access (implicitly) final variables of the enclosing context. Consider the push method again; it takes an argument.

private class MyIntStack {
   private Lock lock = new ReentrantLock();
   private int[] array = new int[16];
   private int sp = -1;

   public void push(final int e) {
      withLock(lock, () -> {
         if (++sp >= array.length) resize();
         array[sp] = e;
      });
   }
   ...
}
The critical region lambda accesses the push method's argument, which is either explicitly declared final or implicitly treated as such. As long as the critical region only reads the argument and does not modify it, there is no need for additional helper methods or functional interfaces.
Wrap-Up
Programming with lambda expressions has two aspects:
• Using lambdas to define ad-hoc functionality that is passed to existing operations, such as calling the streams' forEach, filter, map, and reduce operations.
• Designing functional APIs that take lambda expressions as arguments. At the heart of designing such APIs is the execute-around pattern. Even the streams' forEach, filter, map, and reduce operations are examples of execute-around: they are loops that execute around the lambda that we pass in.
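The "loop that executes around the lambda" view can be sketched by writing a minimal forEach by hand (a toy version for illustration, not the JDK implementation):

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: a hand-written forEach is execute-around, too —
// the traversal machinery executes around the lambda the caller passes in.
public class MiniForEach {
    public static <T> void forEach(List<T> list, Consumer<? super T> action) {
        for (T element : list) {     // boilerplate: the traversal
            action.accept(element);  // variable part: the supplied lambda
        }
    }

    public static void main(String[] args) {
        forEach(Arrays.asList(1, 2, 3), n -> System.out.println(n * 10));
    }
}
```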
© Copyright 1995-2013 by Angelika Langer. All Rights Reserved. URL: < http://www.AngelikaLanger.com/Lambdas/LambdaTutorial/lambdatutorial_4.html> last update: 17 Jul 2013 |