Wednesday, June 18, 2014

Java 8 and Data Parallelism

By definition, parallelism reduces the running time of a task by breaking it down into smaller pieces and executing them at the same time. Now that we are in the age of Big Data, data parallelism is going to play a big role. To get anything meaningful out of Big Data, we need to analyse large volumes of data coming from different sources, and that calls for algorithms that process data faster by using all available processing units. Data parallelism is one technique for analysing a lot of data quickly.

Let’s first define the term ‘data parallelism’. Data parallelism means splitting the data into chunks and assigning each chunk to a processing unit. This works really well when you want to perform the same operation on a lot of data.
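To make the definition concrete, here is a minimal hand-written sketch of data parallelism, done without streams. The class name ChunkedSum and its sum method are hypothetical names made up for this illustration; the sketch assumes the number of workers is at least 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper class, for illustration only.
class ChunkedSum {

    // Sums the array by handing each worker its own contiguous chunk.
    static long sum(int[] data, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            int chunkSize = (data.length + workers - 1) / workers; // ceiling division
            List<Future<Long>> partials = new ArrayList<>();
            for (int w = 0; w < workers; w++) {
                final int from = Math.min(data.length, w * chunkSize);
                final int to = Math.min(data.length, from + chunkSize);
                // The same operation (summing) is applied to every chunk.
                partials.add(pool.submit(() -> {
                    long s = 0;
                    for (int i = from; i < to; i++) {
                        s += data[i];
                    }
                    return s;
                }));
            }
            // Combine the partial results into the final answer.
            long total = 0;
            for (Future<Long> f : partials) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("chunk computation failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling ChunkedSum.sum(data, 4) splits the array into four chunks, sums each on its own thread, and adds up the four partial sums. The splitting, scheduling and combining all have to be written by hand here.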

What does Java provide to realize ‘Data Parallelism’?

Java 8 introduced a number of new libraries, and the streams library is one of them. Making an operation execute in parallel using the streams library is a matter of changing a single method call. You can execute a stream serially or in parallel. When a stream executes in parallel, the Java runtime partitions the stream into multiple substreams. Aggregate operations iterate over and process these substreams in parallel and then combine the results.
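As a small sketch of the "single method call" point, the two methods below compute the same sum of squares, once serially and once in parallel. The class and method names are invented for this illustration; only LongStream and its methods come from the JDK.

```java
import java.util.stream.LongStream;

// Hypothetical class, for illustration only.
class SerialVsParallel {

    // Serial version: elements are processed one after another.
    static long sumOfSquaresSerial(long n) {
        return LongStream.rangeClosed(1, n)
                .map(x -> x * x)
                .sum();
    }

    // Parallel version: the only difference is the call to parallel().
    static long sumOfSquaresParallel(long n) {
        return LongStream.rangeClosed(1, n)
                .parallel()
                .map(x -> x * x)
                .sum();
    }
}
```

Both methods return the same value for the same n, because the partial sums of the substreams are combined into a single result. Whether the parallel version is actually faster depends on the amount of data and the work done per element.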

A parallel stream can be obtained in two ways:
· If you already have a Stream object, you can call its parallel() method to make it parallel.
· If you’re creating a Stream from a Collection, you can call the parallelStream() method to create a parallel stream.
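The two options above can be sketched as follows. The class and method names are hypothetical; isParallel() is used here only to check which mode the stream would run in.

```java
import java.util.List;
import java.util.stream.Stream;

// Hypothetical class, for illustration only.
class ParallelStreamCreation {

    // Option 1: start from an existing Stream and switch it to parallel mode.
    static boolean viaParallelMethod(List<String> items) {
        Stream<String> stream = items.stream();
        return stream.parallel().isParallel();
    }

    // Option 2: ask the Collection for a parallel stream directly.
    static boolean viaParallelStream(List<String> items) {
        return items.parallelStream().isParallel();
    }
}
```

For the standard list implementations both methods return true. Note that the Collection documentation describes parallelStream() as returning a "possibly parallel" stream, so a custom collection is allowed to hand back a sequential one.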

The following example calculates the total price of the orders for a collection of products. It transforms each product into its component orders, then maps each order to its price, and then sums the prices:

public int getTotalPrice() {
    return products.stream()
            // getOrders must return a Stream<Order> for flatMap to flatten it
            .flatMap(Product::getOrders)
            .mapToInt(Order::getPrice)
            .sum();
}

We can perform this operation in parallel by calling the parallelStream() method instead of stream():

public int getTotalPrice() {
    // Only the call that creates the stream changes
    return products.parallelStream()
            .flatMap(Product::getOrders)
            .mapToInt(Order::getPrice)
            .sum();
}
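The snippets above leave out the surrounding classes. Here is one self-contained sketch of how they might look. The fields, the constructors, the Catalog class and the getTotalPriceParallel name are all assumptions made for this illustration; the only thing the original code implies is that getOrders returns a Stream<Order> and getPrice returns an int.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

// Assumed shape of Order: just an integer price.
class Order {
    private final int price;

    Order(int price) {
        this.price = price;
    }

    int getPrice() {
        return price;
    }
}

// Assumed shape of Product: a fixed list of orders, exposed as a Stream.
class Product {
    private final List<Order> orders;

    Product(Order... orders) {
        this.orders = Arrays.asList(orders);
    }

    Stream<Order> getOrders() {
        return orders.stream();
    }
}

// Hypothetical holder for the products collection used in the snippets.
class Catalog {
    private final List<Product> products;

    Catalog(List<Product> products) {
        this.products = products;
    }

    // Serial version.
    int getTotalPrice() {
        return products.stream()
                .flatMap(Product::getOrders)
                .mapToInt(Order::getPrice)
                .sum();
    }

    // Parallel version, renamed here so both can live in one class.
    int getTotalPriceParallel() {
        return products.parallelStream()
                .flatMap(Product::getOrders)
                .mapToInt(Order::getPrice)
                .sum();
    }
}
```

With a catalog holding one product with orders priced 10 and 20 and another product with a single order priced 5, both methods return 35. Summing is associative, which is why the parallel version can safely combine partial results from the substreams.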
