Streams à la carte: Extensible stream pipelines with object algebras.
The StreamAlg repository contains the source code artifact that accompanies the Streams à la carte: Extensible Pipelines with Object Algebras paper, to appear at the 29th European Conference on Object-Oriented Programming (ECOOP'15).
We address extensibility shortcomings in libraries for lazy-streaming queries with a new design. The architecture underlying this design borrows heavily from Oliveira and Cook's object algebra solution to the expression problem, extended with a design that exposes the push/pull character of the iteration, and an encoding of higher-kinded polymorphism.
In this library we apply our design to Java and show that the addition of full extensibility is accompanied by high performance, matching or exceeding that of the original, highly-optimized Java streams library.
In this repository we present a fundamental set of sequential operators map
,
filter
, reduce
, count
, take/limit
and iterate
.
Additionally we present the behaviors that are discussed in the paper: push, pull, fused pull, logging, id (for blocking terminal operators), future (for non-blocking terminal operators).
The project runs with Java 8.
Clone the project:
git clone git@github.com:biboudis/streamalg.git
The project is built with maven and its dependencies are automatically resolved: Guava, JMH and JUnit. To run the test suite simply run:
mvn test
The tests cover all examples included in the paper (operators, behaviors) and cases used as motivation as well. The streams
package is covered at: 87% classes, 90% methods, 91% lines.
Benchmarks are reproduced by executing:
sh run_benchmarks.sh
The basic packages of this artifact are the following:
take
combinator.iterate
terminal operator.take
combinator.The following factories implement different combinations of behaviors:
The types that participate in higher-kinded polymorphism scenarios are: Future
, Id
, Pull
and Push
.
The encoding appears in the gadt.evaluator
package.
In streamalg/fluent/Stream.cs
and streamalg/fluent/Stream.scala
we describe a
possible solution of obtaining fluent APIs of Streams a la carte in C# and Scala.
The run_benchmarks.sh
script simply builds the JMH benchmarks über-jar and then uses the command line interface
of JMH to pass the arguments of the experiments. The script will run all benchmarks in
streamalg/src/main/java/benchmarks/*
and their description is included in the paper.
The user can run the benchmark script as is or by passing a regular-expression for filter to select only some of them. The
micro-benchmark suite can be passed the number of elements N
for map, count, operations with large number of elements, N_small
that is used for cart and limit/take examples. N_limit
is used as the parameter for limit
and F
is used for
benchmarks with fused pipelines.
For more information on JMH the user can run it directly,
e.g., to get the help dialog java -jar target/microbenchmarks.jar -h
.
We omitted baseline tests from the paper (although we included them in the repo) as the focus of the paper is not on comparing hand-optimized tight loops with streaming pipelines. We have investigated this in previous work (http://arxiv.org/abs/1406.6631) and it is something that we would like to investigate about the OpenJDK specifically in the immediate future.
We include three basic categories of benchmarks: basic pipelines with various combinations about both Pull and Push algebras,
fused pipelines to exercise map and filter fusion and help with the comparison between the non-fused pipelines,
iterator pipelines to demonstrate differences of the Pull algebra and the obtaining of an iterator from Java 8 Streams
and take pipelines (the take
operator is the same as the limit
operator in Java 8 Streams).
Aggelos Biboudis (@biboudis), Nick Palladinos (@NickPalladinos), George Fourtounis (@gf0ur) and Yannis Smaragdakis (@YSmaragdakis).