Monday, July 30, 2007

Tips on Collection in java

Let's start with a quiz
As is often the case, I am going to begin this lesson with a little quiz just to make sure that you are awake. Is the following statement True or False?
The TreeSet class is a direct implementation of the Collection interface.The answer is False. The TreeSet class is not a direct implementation of the Collection interface. Rather, the TreeSet class is a direct implementation of the SortedSet interface. The SortedSet interface extends the Set interface, and the Set interface extends the Collection interface.
The root of the collection hierarchy
The Collection interface is the root of the collection hierarchy. JDK 1.3 doesn't provide any direct implementations of the Collection interface. All of the implementations of the interfaces in the Java Collections Framework implement one of the subinterfaces of the Collection interface.
What does Sun say about this?
Here is what the Sun documentation has to say on the topic of the Collection interface:
"The SDK does not provide any direct implementations of this interface: it provides implementations of more specific subinterfaces like Set and List. This interface is typically used to pass collections around and manipulate them where maximum generality is desired."
The Sun documentation also states:
"Bags or multisets (unordered collections that may contain duplicate elements) should implement this interface directly."
However, JDK 1.3 does not provide any implementations for bags or multisets. If you need collections of these types, you will need to define the implementation classes yourself.
What about duplicate elements?
Some implementations of Collection allow duplicate elements, and others do not. Implementations of the List interface (such as ArrayList) allow duplicate elements. Implements of Set and SortedSet (such as TreeSet) do not allow duplicate elements. This was illustrated in a previous lesson entitled Data Structures in Java: Part 5, The Core Collection Interfaces.
A sample program in that lesson created two collection objects and applied the polymorphic add() method to add the same elements to each collection. One of the collection objects was of type ArrayList, and the other collection object was of type TreeSet. The elements added to each collection contained one pair of duplicate elements. The duplicate element was automatically excluded from the TreeSet object, but was retained in the ArrayList object.
So, what is a set?
According to Sun, a Set is a "collection that contains no duplicate elements ... this interface models the mathematical set abstraction." An object of type Set is typically used to model collections such as Social Security numbers where duplicates are not allowed.
And, what is a list?
Also according to Sun, a List is "An ordered collection (also known as a sequence). The user of this interface has precise control over where in the list each element is inserted. The user can access elements by their integer index (position in the list), and search for elements in the list."
Ordered is not the same as sorted
Note that an ordered collection is not the same as a sorted collection. The fact that the collection is ordered derives from the fact that each element in the collection has a specific position specified by an index. In a sorted collection, the position of each element is determined by its value relative to the values of its predecessors and successors.
Sun goes on to say, "Unlike sets, lists typically allow duplicate elements. More formally, lists typically allow pairs of elements e1 and e2 such that e1.equals(e2), and they typically allow multiple null elements if they allow null elements at all."
Is ascending sort order always required?
Not all implementations of the Collection interface maintain the elements in ascending sort order. Some may, and others do not. For example, as discussed above, implementations of the List interface (such as ArrayList) do not maintain their elements in sorted order at all. In other words, the position of an element in an ArrayList does not depend on the value of the element.
On the other hand, implementations of the interfaces named SortedSet (such as TreeSet) and SortedMap do maintain their elements in sorted order. However, that order is not necessarily ascending. When an object is instantiated from a class that implements the SortedSet interface, the sorting order for that object can be established by providing an object instantiated from a class that implements the Comparator interface. In that case, the author of the implementing class determines the order imposed on the elements in the collection.
Does case matter in String objects?
For example, if your SortedSet object contains references to String objects, the natural ascending sort would take the difference between upper case and lower case characters into account. However, you might prefer that case be ignored when establishing the sorted order. You can accomplish this by providing an object of a class that implements the Comparator interface and which defines the compare() and equals() methods in such a way as to eliminate case considerations for comparisons of String objects. (The lesson entitled Swing from A to Z: Analyzing Swing Components, Part 1, Concepts contains a sample program named Introspect03 that implements the Comparator interface for exactly this purpose.)
Subinterfaces have more stipulations
As you progress down the inheritance hierarchy, you find that additional stipulations apply at each level of inheritance. As an example, according to Sun, "The Set interface places additional stipulations, beyond those inherited from the Collection interface, on the contracts of all constructors and on the contracts of the add, equals and hashCode methods."
The important point is that specific subinterfaces of the Collection interface can define requirements that do not apply to all subinterfaces of the Collection interface. For example, the add() method of the Set interface stipulates the following:
"Adds the specified element to this set if it is not already present."On the other hand, the add() method of the Collection interface simply states:
"Ensures that this collection contains the specified element."Thus, the contract for the add() method of an object of a class that implements the Set interface is more specialized than the contract for the add() method of an object of a class that implements Collection interface.
An additional stipulation on the constructor for a Set object is that all constructors must create a set that contains no duplicate elements.
Stipulations on SortedSet
The SortedSet interface extends the Set interface. The SortedSet interface contains the following stipulation that makes it more specialized than a Set.
"A set that further guarantees that its iterator will traverse the set in ascending element order, sorted according to the natural ordering of its elements (see Comparable), or by a Comparator provided at sorted set creation time."
Let's end with a quiz
I'm going to finish this lesson with several questions in the form of a quiz. To ensure that this is a learning experience, I will provide an explanation in addition to the answer for each question.
Q1 True or False? A collection that implements the List interface maintains its elements in ascending alphanumeric order.
The answer to question 1 is false. Unlike collections that implement the SortedSet interface, the order of the elements in a collection that implements the List interface is not based on the values of the objects referred to by the elements in the list.
Q2 True or False? A collection that implements the List interface is an unordered collection.
The answer to question 2 is also false. A collection that implements the List interface is an ordered collection (also known as a sequence). According to Sun, "The user of the interface has precise control over where in the list each element is inserted." Elements can be inserted and retrieved on the basis of their integer index (position in the list) using the following methods:
add(int index, Object element) get(int index)
Valid index values are positive integers that begin with zero. When the add method is used to insert an element at a specific position in the sequence, the element currently at that position (if any) and any subsequent elements are shifted toward higher index values to make room for the new element.
Another version of the add method takes a reference to an object as an incoming parameter and appends the specified element to the end of the collection.
The get method simply returns the element at the specified position in the collection.
The List interface also declares various other methods that can be used to manipulate the contents of the collection.
Q3 True or False? A collection that implements the List interface is allowed to contain duplicate values.
The answer to question 3 is true. Unlike a collection that implements the Set interface, a collection that implements the List interface is typically allowed to contain duplicate values. More formally, according to Sun, "lists typically allow pairs of elements e1 and e2 such that e1.equals(e2), and they typically allow multiple null elements if they allow null elements at all."
Q4 True or False? The contracts of the methods in the List interface are the same as the contracts of the methods inherited from the Collection interface.
The answer to question 4 is false. According to Sun, "The List interface places additional stipulations, beyond those specified in the Collection interface, on the contracts of the iterator, add, remove, equals, and hashCode methods."
For example, the iterator() method (for both the List and Collection interfaces) returns an iterator over the elements in the collection. For the Collection interface, there are no guarantees concerning the order in which the elements are returned by the methods of the Iterator object.
On the other hand, the iterator() method for the List interface returns an iterator over the elements in the collection in proper sequence, where the sequence is determined by the numeric index. In other words, when you invoke the methods of the Iterator object on a List, the elements will be returned in the proper sequence as determined by a numeric index.
Similarly, according to Sun, the SortedSet interface "guarantees that its iterator will traverse the set in ascending element order, sorted according to the natural ordering of its elements (see Comparable), or by a Comparator provided at sorted set creation time."
SummaryIn this lesson you learned that all of the implementations of the interfaces in the Java Collections Framework implement one of the subinterfaces of the Collection interface. You learned that a Set object cannot contain duplicate elements, but a List object can contain duplicate elements.
You learned about the difference between ordered collections and sorted collections. You also learned about ascending order and the natural ordering of objects. In addition, you learned how more specialized stipulations are placed on interfaces as you progress down the interface inheritance hierarchy of the Java Collections Framework.

0 comments: