Java Generics - Introduction
Language Features of Java Generics Introduction and Overview
JavaPro Online, March 2004
Language Features

J2SE 1.5 will become available by mid-2004 and will include support for generic types and methods (see / JDK15 /). This new language feature, known as Java Generics (JG), is a major addition to the core language. In this article we will give an overview of the new feature.

What is the purpose of generics?

The need for generic types stems from the implementation and use of collections, like the ones in the Java collection framework (see / JDK15 /). Typically, the implementation of a collection of objects is independent of the type of the objects that the collection maintains. For this reason, it does not make sense to reimplement the same data structure over and over again, just because it will hold different types of elements. Instead, the goal is to have a single implementation of the collection and use it to hold elements of different types. In other words, rather than implementing a class IntList and StringList for holding integral values and strings respectively, we want to have one generic implementation List that can be used in either case.

In Java, this kind of generic programming is achieved today (in non-generic Java) by means of Object references: a generic list is implemented as a collection of Object references. Since Object is the superclass of all classes, the list of Object references can hold references to any type of object. All collection classes in the Java platform libraries (see / JDK15 /) use this programming technique for achieving genericity. As a side effect of this idiom we cannot have collections of values of primitive type, like a list of integral values of type int, because the primitive types are not subclasses of Object. This is not a major restriction because every primitive type has a corresponding reference type.
We would convert ints to Integers before we store them in a collection, a conversion that is known as boxing and that will be supported as an automatic conversion (autoboxing) in JDK 1.5 (see / BOX /).

A Java collection is very flexible; it can be used for holding references to all types of objects. The collection need not even be homogeneous, that is, hold objects of the same type; it can equally well be heterogeneous, that is, contain a mix of objects of different types. Using generic Java collections is straightforward. Elements are added to the collection by passing an element reference to the collection. Each time we extract an object from a collection we receive an Object reference. Before we can effectively use the retrieved element, we must restore the element’s type information. For this purpose we cast the returned Object reference down to the element’s alleged type. Here is an example:
LinkedList list = new LinkedList();
We must cast the Object reference returned from method get() down to type Integer. The cast is safe because it is checked at runtime. If we tried a cast to a type different from the extracted element’s actual type, a ClassCastException would be raised, like in the example below:

String s = (String) list.get(0); // fine at compile-time, but fails at runtime with a ClassCastException

The lack of information about a collection’s element type and the resulting need for countless casts in all places where elements are extracted from a collection is the primary motivation for adding parameterized types to the Java programming language. The idea is to adorn the collection types with information about the type of elements that they contain. Instead of treating every collection as a collection of Object references, we would distinguish between collections of references to integers and collections of references to strings. A collection type would be a parameterized (or generic) type that has a type parameter, which would specify the element type. With a generic list, the previous example would look like this:
LinkedList<Integer> list = new LinkedList<Integer>();
Note that the get() method of a generic list returns a reference to an object of a specific type, in our example of type Integer, so the cast from Object to Integer is no longer needed. Also, use of the extracted element as though it were of a different type would now be caught at compile time rather than at runtime. The example below would simply not compile:

String s = list.get(0); // compile-time error

This way, Java Generics increase the expressive power of the language and increase its type safety by enabling early static checks instead of late dynamic checks.
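The contrast between the two styles can be sketched in a few lines. This is only an illustration under our own assumptions: the class name CastDemo and its method names are hypothetical and do not come from the article's listings.

```java
import java.util.LinkedList;

// Hypothetical demo class; the names are ours, not the article's.
class CastDemo {
    @SuppressWarnings({"rawtypes", "unchecked"})
    static Integer firstFromRawList() {
        LinkedList list = new LinkedList();       // non-generic list of Object references
        list.add(Integer.valueOf(0));
        return (Integer) list.get(0);             // cast is required and checked at runtime
    }

    @SuppressWarnings({"rawtypes", "unchecked"})
    static boolean wrongCastFails() {
        LinkedList list = new LinkedList();
        list.add(Integer.valueOf(0));
        try {
            String s = (String) list.get(0);      // compiles, but the element is an Integer
            return false;
        } catch (ClassCastException e) {
            return true;                          // the runtime check rejects the cast
        }
    }

    static Integer firstFromGenericList() {
        LinkedList<Integer> list = new LinkedList<Integer>();
        list.add(Integer.valueOf(0));
        // String s = list.get(0);                // would be rejected at compile time
        return list.get(0);                       // no cast needed
    }
}
```

Both variants store the same Integer; only the generic variant lets the compiler, rather than the virtual machine, detect the wrong element type.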
Java Generics do not only provide us with parameterized collection types
like the one we used in the example above; they also allow us to implement
generic types ourselves. In order to see how we can use the new
language feature in our own Java programs, let us explore Java Generics
in more depth. In the following we will briefly look at the syntax of the
definition of generic types and at further language features related to Java
generics.
Generic Types

Listing 1 gives an example of the definition of several generic types. The sample code shows a sketch of a parameterized collection LinkedList<A>, its superinterface Collection<A> and its iterator type Iterator<A>. The types in this example are inspired by the collection classes from the Java platform libraries in package java.util.

Parameterized types have type parameters. In our example they have exactly one parameter, namely A. In general, a parameterized type can have arbitrarily many parameters. In our example, the parameter A stands for the type of the elements contained in the collection. A parameter such as A is also called a type variable. Type variables can be imagined as placeholders that will later be replaced by a concrete type. For instance, when an instantiation of the generic type, such as LinkedList<String>, is used, A will be replaced by String.
Later in this article we will see that there are restrictions regarding
the use of type variables and we will realize that a type variable cannot
be used like a type, i.e. the analogy with a “placeholder for a type”
is not fully correct, just an approximation of what a type variable is.
But for the time being, let’s regard the type variable as a placeholder
for a type – the type of the elements contained in the collection, in our
example.
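A definition along these lines might look as follows. This is a minimal sketch and not the article's original listing; the names SimpleCollection, SimpleIterator and SimpleLinkedList are hypothetical and were chosen to avoid clashing with the types in java.util.

```java
import java.util.NoSuchElementException;

// Hypothetical types, loosely modeled on java.util; A is the type variable.
interface SimpleIterator<A> {
    boolean hasNext();
    A next();
}

interface SimpleCollection<A> {
    void add(A x);
    SimpleIterator<A> iterator();
}

class SimpleLinkedList<A> implements SimpleCollection<A> {
    // Inner class: it may use the enclosing type variable A.
    private class Node {
        A elt;
        Node next;
        Node(A elt) { this.elt = elt; }
    }

    private Node head, tail;

    public void add(A x) {
        Node n = new Node(x);
        if (head == null) { head = n; } else { tail.next = n; }
        tail = n;
    }

    public SimpleIterator<A> iterator() {
        return new SimpleIterator<A>() {
            private Node ptr = head;
            public boolean hasNext() { return ptr != null; }
            public A next() {
                if (ptr == null) throw new NoSuchElementException();
                A elt = ptr.elt;
                ptr = ptr.next;
                return elt;
            }
        };
    }
}
```

When the type is used as SimpleLinkedList<String>, every A in the sketch reads as String: add() accepts only strings and next() returns a String without a cast.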
Bounds

For the implementation of a generic list like the one in our example above we never need to invoke any method of the element type. A list just uses references to the elements, but never really accesses the elements. For this reason it need not know anything about the element type. Not all parameterized types have such rudimentary requirements on their element types.
Imagine we would want to implement a hash-based collection, like a hash
table. A hash-based collection needs to calculate the entries’ hash
codes. However, the element type is unknown in the implementation
of a parameterized hash table. Only the type variable representing
the element type is available. Listing 2 shows an excerpt of the implementation
of a parameterized hash table. It is a parameterized class that has
two type parameters for the key type and the associated value type.
As we can see, the implementation of the hash table does not only move around references to the entries, but also needs to invoke methods of the key type, namely hashCode() and equals() . Both methods are defined in class Object . Hence the hash table implementation requires that the type variables Key and Data be replaced by concrete types that are subtypes of Object . Later in this article we will see that this is always guaranteed, because primitive types are prohibited as type arguments to generics. A concrete type that replaces a type variable must be a reference type and for this reason we can safely assume that the key type has the required methods.
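The reliance on hashCode() and equals() can be sketched as follows. This is an illustrative sketch rather than the article's listing; the class name SimpleHashtable, the fixed bucket count and the use of an ArrayList for the buckets are our own assumptions, and null keys are not supported.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hash table with two type parameters, Key and Data.
class SimpleHashtable<Key, Data> {
    private static final int BUCKETS = 16;   // assumption: fixed size, no resizing

    private static class Entry<K, D> {
        final K key;
        D value;
        final Entry<K, D> next;
        Entry(K key, D value, Entry<K, D> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final List<Entry<Key, Data>> buckets = new ArrayList<Entry<Key, Data>>();

    SimpleHashtable() {
        for (int i = 0; i < BUCKETS; i++) buckets.add(null);
    }

    private int indexFor(Key key) {
        // hashCode() is defined in Object, so it is available for any Key
        return (key.hashCode() & 0x7fffffff) % BUCKETS;
    }

    Data get(Key key) {
        for (Entry<Key, Data> e = buckets.get(indexFor(key)); e != null; e = e.next) {
            if (key.equals(e.key)) return e.value;   // equals() is defined in Object, too
        }
        return null;
    }

    void put(Key key, Data value) {
        int i = indexFor(key);
        for (Entry<Key, Data> e = buckets.get(i); e != null; e = e.next) {
            if (key.equals(e.key)) { e.value = value; return; }
        }
        buckets.set(i, new Entry<Key, Data>(key, value, buckets.get(i)));
    }
}
```

No bound is needed here, because the only methods invoked on the key are the ones every reference type inherits from Object.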
What if we needed to invoke methods that are not defined in class Object?
Consider the implementation of a tree-based collection. Tree-based
collections, like a TreeMap, require a sorting order for the contained
elements. Element types can provide the sorting order by means of
the compareTo() method, which is defined in the Comparable
interface. The implementation of a tree-based collection might therefore
want to invoke the element type’s compareTo() method. Listing
3 below is a first attempt at an implementation of a parameterized TreeMap
collection.
The parameterized class TreeMap has two type parameters, Key and Data;
no requirements are imposed on either of these type
variables. With this implementation we could create a TreeMap<X,Y>
even if the key type X did not implement the Comparable<X>
interface and had no compareTo() method. The invocation
of compareTo(), or more precisely, the cast of the key object
to the type Comparable<Key>, would then fail at runtime with a
ClassCastException for the incomparable key type X.
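The runtime failure can be reproduced with a small sketch. It is not the article's listing: the class NaiveTreeMap is hypothetical and uses a plain unbalanced binary search tree, which is enough to show where the cast to Comparable<Key> happens.

```java
// Hypothetical unbounded tree map; no requirements are imposed on Key.
class NaiveTreeMap<Key, Data> {
    private class Node {
        final Key key;
        Data value;
        Node left, right;
        Node(Key key, Data value) { this.key = key; this.value = value; }
    }

    private Node root;

    @SuppressWarnings("unchecked")
    private int compare(Key a, Key b) {
        // Unchecked cast: throws ClassCastException at runtime
        // if the actual key type does not implement Comparable.
        return ((Comparable<Key>) a).compareTo(b);
    }

    void put(Key key, Data value) {
        if (root == null) { root = new Node(key, value); return; }
        Node n = root;
        while (true) {
            int c = compare(key, n.key);
            if (c == 0) { n.value = value; return; }
            Node child = (c < 0) ? n.left : n.right;
            if (child == null) {
                if (c < 0) n.left = new Node(key, value);
                else n.right = new Node(key, value);
                return;
            }
            n = child;
        }
    }

    Data get(Key key) {
        Node n = root;
        while (n != null) {
            int c = compare(key, n.key);
            if (c == 0) return n.value;
            n = (c < 0) ? n.left : n.right;
        }
        return null;
    }
}
```

A NaiveTreeMap<String,Integer> works as expected, whereas a NaiveTreeMap<Object,String> compiles without complaint and fails only when the second key is inserted and the first comparison is attempted.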
In order to allow for an early compile-time check, Java Generics has a language feature named bounds : type variables of a parameterized type can have one or several bounds. Bounds are interfaces or superclasses that a type variable is required to implement or extend. If a parameterized type is instantiated with a concrete type argument that does not implement the required interface(s) or the required superclass, then the compiler will catch that violation of the requirements and will issue an error message.
In our example, we could require that the key type of our TreeMap
must implement the interface Comparable<Key> by specifying
a bound for the type variable Key. The modified implementation
of TreeMap is shown in Listing 5 below.
Now the attempt to use a key type that does not implement the Comparable
interface will be rejected by the compiler, like in the example in Listing
6 below.
The primary purpose of bounds is to enable early compile-time checks.
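A bounded variant might be sketched like this. Again, this is an illustration under our own assumptions and not the article's listing; BoundedTreeMap and NoOrder are hypothetical names, and the tree is a simple unbalanced binary search tree.

```java
// Hypothetical tree map whose key type must implement Comparable<Key>.
class BoundedTreeMap<Key extends Comparable<Key>, Data> {
    private class Node {
        final Key key;
        Data value;
        Node left, right;
        Node(Key key, Data value) { this.key = key; this.value = value; }
    }

    private Node root;

    void put(Key key, Data value) {
        if (root == null) { root = new Node(key, value); return; }
        Node n = root;
        while (true) {
            int c = key.compareTo(n.key);   // no cast: the bound guarantees compareTo()
            if (c == 0) { n.value = value; return; }
            Node child = (c < 0) ? n.left : n.right;
            if (child == null) {
                if (c < 0) n.left = new Node(key, value);
                else n.right = new Node(key, value);
                return;
            }
            n = child;
        }
    }

    Data get(Key key) {
        Node n = root;
        while (n != null) {
            int c = key.compareTo(n.key);
            if (c == 0) return n.value;
            n = (c < 0) ? n.left : n.right;
        }
        return null;
    }
}

// A key type without a sorting order (hypothetical).
class NoOrder { }

// BoundedTreeMap<NoOrder, String> m;   // rejected at compile time: NoOrder is not within its bound
```

The unchecked cast from the unbounded variant disappears, and with it the possibility of a ClassCastException inside the map.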
A type variable can have several bounds. The syntax is:

TypeVariable extends Bound1 & Bound2 & ... & Boundn

Here is an example:
final class Pair<A extends Comparable<A> & Cloneable<A>,
As the example above suggests, type variables can appear in their bounds. For instance, the type variable A is used as type argument to the parameterized interface Comparable , whose instantiation Comparable<A> is a bound of A . There is a restriction regarding bounds that are instantiations of a parameterized interface: the different bounds must not be instantiations of the same parameterized interface. The following would be illegal:
class SomeType<T extends Comparable<T> & Comparable<String> & Comparable<StringBuffer>>
This restriction stems from the way Java Generics are implemented and will be explained later in this article. Classes can be bounds, too. The concrete type argument is then required to be a subclass of the bounding class or it can be the same class as the bounding class. Even final classes are permitted as bounds. Bounding classes, like interfaces, give access to non-static methods that the concrete type argument inherits from its superclass. Bounding classes do not give access to constructors and static methods. The bounding superclass must appear as the first bound in a list of bounds. Hence the syntax for specification of bounds is:
TypeVariable extends Superclass & Interface1 & Interface2 & ... & Interfacen
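As a small illustration of a class bound combined with an interface bound, consider the sketch below. The class Range is hypothetical; it merely assumes that the type argument extends java.lang.Number and implements Comparable, which holds for types such as Integer and Double.

```java
// Hypothetical example: the class bound Number comes first, the interface bound second.
final class Range<T extends Number & Comparable<T>> {
    private final T low, high;

    Range(T a, T b) {
        // compareTo() is accessible through the bound Comparable<T>
        if (a.compareTo(b) <= 0) { low = a; high = b; }
        else { low = b; high = a; }
    }

    double width() {
        // doubleValue() is a non-static method inherited from the bounding class Number
        return high.doubleValue() - low.doubleValue();
    }

    boolean contains(T x) {
        return low.compareTo(x) <= 0 && x.compareTo(high) <= 0;
    }
}
```

Writing the bounds in the opposite order, Comparable<T> & Number, would be rejected by the compiler, because the class bound must be listed first.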
Generic Methods

Not only types can be parameterized. In addition to generic classes and interfaces, we can define generic methods. Static and non-static methods as well as constructors can be parameterized in pretty much the same way as we parameterized types in the previous sections. The syntax is a little different; see below. Everything said about type variables of parameterized types applies to type variables of parameterized methods in exactly the same way.
Listing 7 shows the example of a parameterized static method max():
Parameterized methods are invoked like regular non-generic methods.
The type parameters are inferred from the invocation context. In our example,
the compiler would automatically invoke <Byte>max().
The type inference algorithm is significantly more complex than this simple
example suggests, and exhaustive coverage of type inference is beyond the
scope of this article.
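A generic max() method of this kind might be sketched as follows. The enclosing class Algorithms and the exact signature are our own assumptions; the article's listing may differ in detail.

```java
import java.util.Collection;
import java.util.Iterator;

// Hypothetical holder class for the generic method.
class Algorithms {
    // The type parameter section <A extends Comparable<A>> precedes the return type.
    static <A extends Comparable<A>> A max(Collection<A> xs) {
        Iterator<A> it = xs.iterator();
        A best = it.next();                 // throws NoSuchElementException if xs is empty
        while (it.hasNext()) {
            A x = it.next();
            if (best.compareTo(x) < 0) best = x;
        }
        return best;
    }
}
```

For a LinkedList<Byte> named bytes, the call Algorithms.max(bytes) lets the compiler infer A as Byte; the type argument can also be spelled out explicitly as Algorithms.<Byte>max(bytes).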
Wildcard Instantiations of Parameterized Types

For the sake of completeness we want to briefly touch on wildcards. (For a more detailed discussion of wildcards see / PRO2 /.) So far we have been instantiating parameterized types using a concrete type that replaces the type parameter in the instantiation. In addition, so-called wildcards can be used to instantiate a parameterized type. A wildcard instantiation looks like this:

List<? extends Number> ref = new LinkedList<Integer>();

In this statement List<? extends Number> is a wildcard instantiation, while LinkedList<Integer> is a regular instantiation. There are 3 types of wildcards: “? extends Type”, “? super Type” and “?”. Each wildcard denotes a family of types. “? extends Number” for instance is the family of subtypes of type Number, “? super Integer” is the family of supertypes of type Integer, and “?” is the set of all types. Correspondingly, the wildcard instantiation of a parameterized type stands for a set of instantiations; e.g. List<? extends Number> refers to the set of instantiations of List for types that are subtypes of Number.

Wildcard instantiations can be used for declaration of reference variables, but they cannot be used for creation of objects. Reference variables of a wildcard instantiation type can refer to an object of a compatible type, though. Compatible in this sense are concrete instantiations from the family of instantiations denoted by the wildcard instantiation. In a way, this is similar to interfaces: we cannot create objects of an interface type, but a variable of an interface type can refer to an object of a compatible type, “compatible” meaning a type that implements the interface. Similarly, we cannot create objects of a wildcard instantiation type, but a variable of the wildcard instantiation type can refer to an object of a compatible type, “compatible” meaning a type from the corresponding family of instantiations.
Access to an object through a reference variable of a wildcard instantiation type is restricted. Through a wildcard instantiation with “extends“ we must not invoke methods that take arguments of the type that the wildcard stands for. Here is an example:
List<? extends Number> list = new LinkedList<Integer>();
The add() method of type List takes an argument of the element type, which is the type parameter of the parameterized List type. Through a wildcard instantiation such as List<? extends Number> it is not permitted to invoke the add() method. Similar restrictions apply to wildcards with “super”: methods whose return type is the type that the wildcard stands for yield only an Object reference, because the element type is known only as some supertype. And for reference variables with a “?” wildcard both restrictions apply.
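The access restrictions can be sketched as follows. The class WildcardDemo and its method names are hypothetical; the commented-out line marks the call that the compiler rejects.

```java
import java.util.List;

// Hypothetical demo of access through wildcard instantiations.
class WildcardDemo {
    static double sum(List<? extends Number> list) {
        double total = 0;
        for (Number n : list) {
            total += n.doubleValue();       // reading elements as Number is fine
        }
        // list.add(Integer.valueOf(25));   // compile-time error: the element type is unknown
        return total;
    }

    static Object addAndPeek(List<? super Integer> list) {
        list.add(Integer.valueOf(1));       // passing an Integer is fine
        return list.get(0);                 // the element is available only as an Object
    }
}
```

A List<Integer> or a List<Double> can be passed to sum(), while addAndPeek() accepts a List<Integer>, a List<Number> or a List<Object>.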
This brief overview of wildcard instantiations is far from comprehensive;
exhaustive coverage of wildcards is beyond the scope of this article.
In practice, wildcard instantiations will most frequently show up as argument
or return types in method declarations, and only rarely in the declaration
of variables. The most useful wildcard is the “extends” wildcard.
Examples for the use of this wildcard can be found in the J2SE 1.5 platform
libraries; an example is the method
boolean addAll(Collection<? extends ElementType> c)
of interface java.util.List. It
allows addition of elements to a List of element type ElementType,
where the elements are taken from a collection of elements that are of
a subtype of ElementType.
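A short sketch shows what the “extends” wildcard in addAll() buys us. The class AddAllDemo and the method merge() are hypothetical; addAll() itself is the method of java.util.List described above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: collecting elements of different subtypes into a List<Number>.
class AddAllDemo {
    static List<Number> merge(List<Integer> ints, List<Double> doubles) {
        List<Number> all = new ArrayList<Number>();
        all.addAll(ints);       // List<Integer> matches Collection<? extends Number>
        all.addAll(doubles);    // List<Double> matches it as well
        return all;
    }
}
```

Had the parameter been declared as Collection<Number> instead, neither call would compile, because List<Integer> and List<Double> are not subtypes of Collection<Number>.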
Summary of Java Generics Language Features

Now we have discussed all major language features related to Java generics: generic types, bounds on type variables, generic methods, and wildcard instantiations. There are many more details not covered here. We want to use the remainder of the article to explore some of the underlying principles of Java generics, in particular the translation of parameterized types and methods into Java byte code. While this sounds pretty technical and mainly like a compiler builder’s concern, an understanding of these principles aids understanding of many of the less obvious effects related to Java generics.

Implementation of the Java Generics Language Features

How are Java Generics implemented? What does the Java compiler do with our Java source code that contains definitions and usages of parameterized types and methods? Well, as usual the Java compiler translates the Java source code into Java byte code. In the following, we intend to take a look under the hood of the compilation process in order to understand the effects and side effects of Java generics.

Translation of Generics

A compiler that must translate a parameterized type or method (in any language) has in principle two choices: code specialization, where the compiler generates a separate representation for every instantiation of a parameterized type or method, and code sharing, where the compiler generates code for only one representation and maps all instantiations onto it. Code specialization tends to multiply nearly identical code.
This is particularly wasteful in cases where the elements in a collection are references (or pointers), because all references (or pointers) are of the same size and internally have the same representation. There is no need for generation of mostly identical code for a list of references to integers and a list of references to strings. Both lists could internally be represented by a list of references to any type of object. The compiler just has to add a couple of casts whenever these references are passed in and out of the generic type or method. Since in Java most types are reference types, it seems natural that Java chooses code sharing as its technique for translation of generic types and methods. [C#, by the way, uses both translation techniques for its generic types: code specialization for the value types and code sharing for the reference types.]

One downside of code sharing is that it creates problems when primitive types are used as parameters of generic types or methods. Values of primitive type are of different size and require that different code is generated for a list of int and a list of double, for instance. It is not feasible to map both lists onto a single list implementation. There are several solutions to this problem. Java prohibits primitive types as type arguments, so values of primitive type must be boxed before they are stored in a generic collection.
Type Erasure

In the following we want to look into the details of the code sharing implementation of Java generics. The key question is: how exactly does the Java compiler map different instantiations of a parameterized type or method onto a single representation of the type or method?

The translation technique used by the Java compiler can be imagined as a translation from generic Java source code back into regular Java code. The translation technique is called type erasure: the compiler removes all occurrences of the type variables and replaces them by their leftmost bound, or by type Object if no bound has been specified. For instance, the instantiations LinkedList<Integer> and LinkedList<String> of our previous example (see Listing 1) would be translated into a LinkedList<Object>, or LinkedList for short, and the methods <Integer>max() and <String>max() (from Listing 7) would be translated to <Comparable>max(). In addition to removing all type variables and replacing them by their leftmost bound, the compiler inserts a couple of casts in certain places and adds so-called bridge methods where needed.

The translation from generic Java code into regular Java code was deliberately chosen by the Java designers. One key requirement for all new language features in Java 1.5 is their compatibility with previous versions of Java. In particular it is required that a pre-1.5 Java virtual machine must be capable of executing 1.5 Java code. This is only achievable if the byte code resulting from a 1.5 Java source looks like regular byte code resulting from pre-1.5 Java code. Type erasure meets this requirement: after type erasure there is no longer any difference between a parameterized and a regular type or method.

For explanatory reasons we described type erasure as a translation from generic Java code into regular non-generic Java code. This is not exactly true; the translation is from generic Java code directly to Java byte code. Despite that, we will refer to the type erasure process as a translation from generic Java to non-generic Java in the subsequent explanations.
Listing 8 below illustrates the translation by type erasure; it shows
the type erasure of our previous example of generic types from Listing
1.
As you can see, all occurrences of the type variable A are replaced by type Object . The implementation of our generic collection is now exactly like an implementation that uses the traditional Java technique for genericity, namely implementation in terms of Object references. The sample code also gives an example of an automatically inserted cast: in the main() method, where a linked list of strings is used, the compiler added a cast from Object to String .
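The effect of erasure can also be observed in a running program. The sketch below is an illustration only; the class ErasureDemo is hypothetical, and the method erasedEquivalent() is a hand-written approximation of what the compiler conceptually produces for genericSource().

```java
import java.util.LinkedList;

// Hypothetical demo of code sharing by type erasure.
class ErasureDemo {
    static boolean sameRuntimeClass() {
        LinkedList<Integer> ints = new LinkedList<Integer>();
        LinkedList<String> strings = new LinkedList<String>();
        // Both instantiations share one runtime class, java.util.LinkedList.
        return ints.getClass() == strings.getClass();
    }

    static String genericSource() {
        LinkedList<String> list = new LinkedList<String>();
        list.add("abc");
        return list.get(0);              // the compiler inserts the cast to String
    }

    @SuppressWarnings({"rawtypes", "unchecked"})
    static String erasedEquivalent() {
        LinkedList list = new LinkedList();
        list.add("abc");
        return (String) list.get(0);     // the same cast, written out by hand
    }
}
```

The two methods behave identically, which is the point of the translation: the generic version merely moves the cast from the programmer to the compiler.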
Listing 9 below shows the type erasure of our parameterized max() method
from Listing 7.
Again, all occurrences of type variables are replaced by either type Object (in the Comparable interface) or the leftmost bound (type Comparable in method max()). Again, we see the inserted cast from Object to Byte in the main() method where the generic method is invoked for a collection of Bytes. And we see an example of a bridge method in class Byte.
The compiler inserts bridge methods in subclasses to ensure overriding
works correctly. In the example, class Byte implements interface
Comparable<Byte> and must therefore override the superinterface’s
compareTo() method. The compiler translates the compareTo() method of the generic
interface Comparable<A> to a method that takes an Object,
and translates the compareTo() method in class Byte to
a method that takes a Byte. After this translation, method
Byte.compareTo(Byte) no longer overrides method
Comparable<Byte>.compareTo(Object),
because the two methods have different signatures as a side effect
of translation by erasure. In order to enable overriding, the compiler
adds a bridge method to the subclass. The bridge method has the same
signature as the superclass’s method that must be overridden and delegates
to the other method in the derived class that was the result of translation
by erasure.
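Bridge methods are invisible in the source code, but they can be made visible through reflection. The following sketch assumes a hypothetical class Version that implements Comparable<Version>; it relies on java.lang.reflect.Method.isBridge() to tell the two compareTo() methods apart.

```java
import java.lang.reflect.Method;

// Hypothetical class that implements a parameterized interface.
final class Version implements Comparable<Version> {
    private final int number;

    Version(int number) { this.number = number; }

    // After erasure this method takes a Version, not an Object.
    public int compareTo(Version other) {
        return Integer.compare(number, other.number);
    }
}

class BridgeDemo {
    // Counts the compareTo() methods declared in Version that are (or are not) bridges.
    static int countCompareTo(boolean bridge) {
        int count = 0;
        for (Method m : Version.class.getDeclaredMethods()) {
            if (m.getName().equals("compareTo") && m.isBridge() == bridge) count++;
        }
        return count;
    }

    @SuppressWarnings({"rawtypes", "unchecked"})
    static int compareViaRawInterface(Version a, Version b) {
        Comparable raw = a;
        // Invokes compareTo(Object), i.e. the bridge, which delegates to compareTo(Version).
        return raw.compareTo(b);
    }
}
```

The class declares one compareTo() method in its source code, yet reflection reports two: the method we wrote and the bridge method that the compiler generated.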
Summary

In this article we gave an overview of all major language features related to parameterized types and methods. Naturally, coverage of a fairly complex language feature such as Java generics in an article like this cannot be exhaustive. There are many more details to be explored and understood before Java generics can be used in a reliable and effective manner (see for instance the articles on wildcards / PRO2 /). The greatest difficulties in using and understanding Java generics stem perhaps from the type erasure translation process, by which the compiler elides all occurrences of the type parameters. This leads to quite a number of surprising effects. Just to name one: arrays of parameterized types cannot be created in Java, that is, new Comparable<String>[10] is illegal, while new Comparable[10] is permitted. This is surprising at best and turns out to be quite a nuisance in practice. It boils down to the fact that arrays are best avoided and replaced by collections as soon as the element type is a parameterized type. This realization and many other tips and techniques demand thorough exploration before the new language feature can be exploited to its capacity.

Despite the rough edges here and there, the addition of Java generics adds substantial expressive power to the Java programming language. Our own experience is: once you’ve been using generics for a while you’ll miss them badly if you have to return to non-generic Java with its unspecific types and countless casts and runtime checks.
© Copyright 1995-2012 by Angelika Langer. All Rights Reserved. URL: < http://www.AngelikaLanger.com/Articles/JavaPro/01.JavaGenericsIntroduction/JavaGenerics.html> last update: 4 Nov 2012 |