import speed._
What?
speed is a compile-time only macro library that provides functionality similar to Scala collection's views but with an efficient implementation that inlines the complete processing into a (mostly) single while-loop at the use site.
As an example see this:
array.speedy.map(x => x * x).foldLeft(0)(_ + _)
will be converted into code roughly equivalent to this:
var acc = 0
var i = 0
while (i < array.length) {
val x = array(i)
acc = acc + x * x
i += 1
}
acc
Installation
resolvers += Opts.resolver.sonatypeReleases
// only needed at compile-time
libraryDependencies += "net.virtual-void" %% "speed" % "15" % "provided"
Usage
There are two principle ways of using speed. Either, you explicitly request optimization by
"converting" a collection into a speedy one using .speedy
. Or, you can also import automatic
conversions in which case supported collections will be auto-optimized (automatic conversion
currently only supported for Range
and Array
).
Similar to Java 7 Streams, a speed expression must start with a "Generator" like a Range
or
an Array
, then apply some non-terminal processing steps, and finally end with a terminal result.
Intermediate results are usually never materialized into a collection and therefore cannot be put
into a variable. Instead, use to[Coll]
as a terminal step to create a result collection if needed.
Explicit
Add an import to speed in the source file:
import speed._
and then, convert a collection to its speedy variant by adding .speedy
(similar as you would add .view
):
array.speedy.map(_ + 2).sum // this will run fast...
Automatic
Ranges and arrays can also be automatically optimized by import speed.auto._
at the top of a
source file:
import speed.auto._
array.map(_ + 2).sum // this will also run fast...
Semantics
Semantics are similar to what to expect from Scala collection's views. However, if in question we'll take the liberty to divert from the known behavior to allow more aggressive optimizations.
The basic optimization approach is based on aggressive inlining the whole processing pipeline into a single while-loop.
Apart from that, speed aggressively tries to make use of static information in several ways:
- special cased range loops for the common cases (step = +/- 1)
- static optimizations over the operations structure to avoid work where possible
- a final run of constant-folding tries to gather static information
Note: function arguments to map
, filter
, etc. are assumed to be pure and to execute no side-effects
to be able to reorder operations for optimizations. You cannot rely on side-effects in these functions
as they may not be executed for speed optimized code if the result of the function call isn't needed
for the final result.
Features
Supported collection types
Range
Array
IndexedSeq
List
Supported operations
speed currently supports a subset of interesting and important collection operations from the Scala collections.
Non-Terminal
See the NonTerminalOps type:
- flatMap
- map
- filter
- withFilter
- reverse
- take
- (more to come)
Terminal
See the TerminalOps type:
- foldLeft
- foreach
- reduce
- sum
- min
- max
- count
- size
- mkString
- to[Coll[_]]
- forall
- (more to come)
Optimizations
speed already employs some optimizations on the operations structure like these:
- try to push selection of elements down to the generator (like
take
orreverse
) - implement array operations in terms of range
- generate three different kind of basic loops for ranges depending on static information
about
start
,end
,step
andisInclusive
if available
Planned
- for-comprehension pattern extraction support (avoid tuple generation)
- replace well-known
Numeric
andOrdering
instances by primitive operations
How?
In the words of the master of enrichment:
Debugging Tools
- Wrap an expression with
speed.impl.Debug.show
to show the generated code
Known issues
It is known that speed still optimizes side-effects in user code too aggressively away in some cases. If you encounter such an issue please report it on the issue tracker.
Related work
Benchmarks
Runtime (smaller is better, reference is an equivalent while loop, "Scala" uses the same expression with views)
Description | While | Speedy | Scala |
---|---|---|---|
foreach counting | 100 % | 102.89 % | 430.06 % |
nested counting | 100 % | 98.63 % | 389.60 % |
filtered summing | 100 % | 99.40 % | 861.21 % |
mapped summing | 100 % | 100.15 % | 3985.92 % |
array foreach counting | 100 % | 99.24 % | 450.01 % |
array summing | 100 % | 100.23 % | 625.30 % |
array filtered summing | 100 % | 103.45 % | 1137.95 % |
array mapped summing | 100 % | 101.55 % | 2101.77 % |
size of filtered ref array | 100 % | 98.30 % | 929.55 % |
array map - filter - to | 100 % | 95.01 % | 291.28 % |
list foreach counting | 100 % | 100.07 % | 100.87 % |
list summing | 100 % | 104.40 % | 202.86 % |
list filtered summing | 100 % | 104.42 % | 627.96 % |
list mapped summing | 100 % | 100.08 % | 769.58 % |
nested list summing | 100 % | 99.52 % | 1133.47 % |
(Don't be fooled, values < 100 % are most likely not significant but I'm including them here
anyways just for the giggles.
(Don't be fooled2, I obviously selected benchmarks in the favor of speed, if you'd like to see other comparisons please PR.)
Benchmark results can be regenerated by running the PerformanceSpecs
in the code base.
Foreach counting
var counter = 0
for (i ← 1 to 1000) counter += i * i
counter
Nested counting
var counter = 0
for {
i ← 1 to 100
j ← 1 to 100
} counter += i * j
counter
Filtered summing
(1 to 1000).filter(_ % 3 == 0).sum
Mapped summing
(1 to 1000).map(i ⇒ i * i).sum
Array foreach counting
var counter = 0
for (x ← array) counter += x * x
counter
Array summing
array.sum
Array filtered summing
array.filter(_ % 3 == 0).sum
Array mapped summing
array.map(x ⇒ x * x).sum
Size of Filtered ref Array
class Ref(var num: Int = 0)
val N = 1000
val refs = (0 until N).map(i ⇒ new Ref(i)).toArray
refs
.filter(_.num % 5 == 0)
.filter(_.num % 7 == 0)
.size
Array Map - Filter - To
array.map(x ⇒ x * x).filter(_ % 3 == 0).to[List]
List foreach counting
var counter = 0
for (x ← list) counter += x * x
counter
List summing
list.sum
List filtered summing
list.filter(_ % 3 == 0).sum
List mapped summing
list.map(x ⇒ x * x).sum
Nested list summing
(for (x ← list1; y ← list2) yield x * y).sum