Desugaring Scala
Alex Westphal · 19 Oct 2014
Many newcomers to Scala find its syntax bewildering. This usually comes down to not understanding the large amount of syntactic sugar Scala provides. In this article we explore and de-sugar Scala’s complex syntax and show how it relates to Java.
Method Invocation
At its core, Scala has the same method invocation syntax as Java. That is, the invocation target followed by `.`, then the method name followed by `(`, then the comma-separated argument list followed by `)`. Described symbolically:
<target>.<method>([<arg0>, [<arg1>, ...]])
Concrete examples of this invocation syntax (valid in both Java and Scala):
str.hashCode() // No arguments
str.concat(str2) // One Argument
str.substring(2,4) // Multiple Arguments
str.concat(str2).concat(str3) // Chained Invocation
There are two ways in which basic method invocation in Scala differs from Java. The first is that in the no-argument case the parentheses can be left off, yielding `str.hashCode`, which looks like a field access. Note: if the method was defined without parentheses (eg `def hashCode: Int = ...`), this form must be used.
The second way in which Scala differs is that it allows invocation of single-argument methods without the `.` or parentheses at all. In many cases this reduces clutter and makes the code more readable. The following example is arguably clearer than the previous one:
str concat str2 // One Argument
str concat str2 concat str3 // Chained Invocation
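As a small runnable sketch of the forms described above (the object name `InvocationDemo` and the sample strings are made up for illustration), each pair below should evaluate to the same value:

```scala
object InvocationDemo {
  val str = "sugar"

  // No-argument call, with and without parentheses
  val withParens = str.hashCode()
  val withoutParens = str.hashCode

  // Single-argument calls: 'Java style' versus infix
  val dotted = str.concat("ed").concat("!")
  val infix = str concat "ed" concat "!"
}
```

Note that newer Scala versions may warn about infix use of alphanumeric methods, so treat the infix form as a matter of style.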
Symbolic Operators
For library writers, one extremely useful feature of Scala is the ability to define symbolic operators for custom types. Symbolic operators are realised through allowing symbols as class, field, and method names. Eg:
class Matrix {
// Defined exactly as with any other method
def +(m: Matrix): Matrix = ...
}
m1.+(m2) // 'Java style' invocation
m1 + m2 // Typical usage
m1 + m2 + m3
For some this may be reminiscent of Java’s String concatenation operator `+`, where `"ab" + "cd"` is syntactic sugar for `"ab".concat("cd")`. The difference is that in Java it is a compiler special case, while in Scala it can be implemented in user code. In fact Scala provides identical behaviour in Predef.StringAdd.
Some refer to the concept of symbolic operators using the term “operator overloading”, which has rather negative connotations from its misuse in C++. Scala’s difference is that operators are implemented as methods and as such don’t allow many of the dangerous forms of overloading that C++ provides. For example, Scala has no mechanism to overload `=` because there is no safe and meaningful way to do so. Scala also encourages certain conventions around symbolic operators, specifically that they should only be used where the symbol is already understood in the domain (eg `+` for matrix addition, and `++` for concatenation). Despite this, some libraries push the conventions, such as Predef using `+` for String concatenation and Scalaz using `@\?/` for disjunction validation.
The Scala compiler is built to target JVM bytecode, so its output has to follow much the same naming rules as Java. One of these rules is that the primitive symbols are not allowed as method, class or field names. To get around this the compiler uses the name of the operator instead of the symbol (eg `+` becomes `$plus` and `+=` becomes `$plus$eq`).
This has the benefit that symbolic operators can be called from Java:
m1 + m2 // Scala Usage
m1.$plus(m2) // Java Usage
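The encoding can be observed directly. The sketch below assumes a hypothetical `Vec` type (standing in for the article’s `Matrix`) and uses the standard library’s `scala.reflect.NameTransformer` plus plain Java reflection to look at the generated names:

```scala
import scala.reflect.NameTransformer

object SymbolicOperatorDemo {
  // Hypothetical 2D vector type, used only for illustration
  final case class Vec(x: Int, y: Int) {
    // Defined exactly as with any other method
    def +(that: Vec): Vec = Vec(x + that.x, y + that.y)
  }

  val sugared = Vec(1, 2) + Vec(3, 4)      // typical usage
  val desugared = Vec(1, 2).+(Vec(3, 4))   // 'Java style' invocation

  // How the compiler spells operator names for the JVM
  val encodedPlus = NameTransformer.encode("+")
  val encodedPlusEq = NameTransformer.encode("+=")

  // Method names as Java reflection (and therefore Java callers) see them
  val jvmMethodNames = classOf[Vec].getMethods.map(_.getName).toSet
}
```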
In most cases the aforementioned binary operators are excellent, but some domains need unary operators. Unary operators come in two forms: prefix (eg `-x`) and postfix (eg `e?`). Prefix operators are defined as special unary methods (eg `def unary_- = ...`) and are restricted to just four operators (`+`, `-`, `!` and `~`). Postfix operators are effectively just parameter-less symbolic methods (eg `def ? = ...`).
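A minimal sketch of a prefix operator, again on a hypothetical `Vec` type that is not part of any library:

```scala
object UnaryOperatorDemo {
  // Hypothetical 2D vector type, used only for illustration
  final case class Vec(x: Int, y: Int) {
    // The space before ':' keeps the colon from being read as part of the name
    def unary_- : Vec = Vec(-x, -y)
  }

  val v = Vec(1, -2)
  val sugared = -v            // prefix form
  val desugared = v.unary_-   // what the prefix form expands to
}
```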
Operator Associativity and Precedence
An area of Scala’s syntax that often trips people up is operator associativity and precedence. In general the rules (when operators are used without dots and parentheses) are similar to Java and most other C family languages: operators are left associative and precedence is what you learned in secondary school algebra. Precedence is determined by the first character of the operator. Scala’s exact precedence rules are as follows:
First Char | Precedence (higher binds tighter) |
---|---|
(any letter) | 1 |
`\|` | 2 |
^ | 3 |
& | 4 |
= ! | 5 |
< > | 6 |
: | 7 |
+ - | 8 |
* / % | 9 |
(other symbols) | 10 |
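There is one exception to the precedence rules with regard to so-called ‘assignment’ operators, that is operators that end with `=` (`+=`, `/=`, etc). ‘Assignment’ operators always have the lowest possible precedence regardless of the other rules. The comparison operators `==`, `!=`, `<=` and `>=` (and any operator that also starts with `=`) are not treated as assignment operators and follow the first-character rule as usual. For example, `3 + 4 == 1 + 2 * 3` is equivalent to `(3 + 4) == (1 + (2 * 3))`.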
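To see the first-character rule at work on user-defined operators, here is a sketch with a hypothetical `Term` type whose operators simply record how the expression was grouped. The `<+>` operator is invented for the example:

```scala
object PrecedenceDemo {
  // Hypothetical type: every operator returns a Term showing the grouping
  final case class Term(text: String) {
    private def combine(op: String, that: Term): Term =
      Term(s"($text $op ${that.text})")

    def |(that: Term): Term = combine("|", that)
    def ^(that: Term): Term = combine("^", that)
    def &(that: Term): Term = combine("&", that)
    def +(that: Term): Term = combine("+", that)
    def *(that: Term): Term = combine("*", that)
    def <+>(that: Term): Term = combine("<+>", that) // precedence comes from '<'
  }

  val a = Term("a")
  val b = Term("b")
  val c = Term("c")
  val d = Term("d")

  val bitwise = (a | b ^ c & d).text     // & binds tighter than ^, ^ tighter than |
  val arithmetic = (a + b * c + d).text  // * binds tighter, + is left associative
  val byFirstChar = (a <+> b + c).text   // '<' binds looser than '+'
}
```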
By far the largest confusion comes with right associative operators, that is operators that end with `:` (eg `::`, `+:`, etc). Unlike any other operator, they associate to the right and are invoked on their right-hand operand. Thus `x :: xs` is equivalent to `xs.::(x)` and `x :: y :: xs` is equivalent to `x :: (y :: xs)`.
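The same rule applies to any user-defined method whose name ends in `:`. In the sketch below, `Trail` and its `->:` operator are hypothetical names chosen only to illustrate the point:

```scala
object RightAssocDemo {
  // Standard library: the sugared and fully expanded forms build the same list
  val sugared = 1 :: 2 :: Nil
  val desugared = Nil.::(2).::(1)

  // Hypothetical type with its own right-associative operator
  final case class Trail(steps: List[String]) {
    def ->:(step: String): Trail = Trail(step :: steps)
  }

  // Parsed as "a" ->: ("b" ->: Trail(Nil)), i.e. Trail(Nil).->:("b").->:("a")
  val trail = "a" ->: "b" ->: Trail(Nil)
}
```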
Assignment Operator
‘Assignment’ operators (operators that end with `=`) can provide two different kinds of operation. The first is an update operation that utilises the functionality described in the Symbolic Operators section. This is achieved by explicitly declaring the desired operator. For example, `scala.collection.mutable.ListBuffer` has an ‘append to’ method defined (approximately) as:
def ++=(xs: Traversable[A]): ListBuffer[A] = ...
The behaviour of this particular operator is to update the `ListBuffer` by appending the elements of the `Traversable`. Usage looks like:
import collection.mutable.ListBuffer
val xs = ListBuffer(1,2) // xs is now [1,2]
xs ++= ListBuffer(3,4) // xs is now [1,2,3,4]
The other kind of ‘assignment’ operator is the re-assignment operator. It applies only if there is no exactly matching assignment operator, the associated operator is defined, and the target is a `var` rather than a `val`. That is, for the expression `xs ++= ys`, it is a re-assignment operator if and only if `++=` is not defined on `xs`, `++` is defined on `xs`, and `xs` was defined using `var`. The exact compiler behaviour is a translation that should be recognisable to C family programmers. This allows for the apparent updating of immutable types:
var xs = List(1,2) // xs is now [1,2]
xs ++= List(3,4) // xs is now [1,2,3,4]
// Translates to
xs = xs ++ List(3,4)
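The difference between the two kinds is easiest to see by keeping a second reference to the original value. A small sketch (the names are made up for the example):

```scala
import scala.collection.mutable.ListBuffer

object AssignmentDemo {
  // Update operation: ++= is a real method on ListBuffer, so it mutates in place
  val buffer = ListBuffer(1, 2)
  val sameBuffer = buffer
  buffer ++= List(3, 4)

  // Re-assignment: List has no ++= method, so this becomes list = list ++ List(3, 4)
  var list = List(1, 2)
  val original = list
  list ++= List(3, 4)
}
```

After this runs, `sameBuffer` sees the appended elements because it is the same object, while `original` still refers to the untouched immutable list.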
Function Application
Until Java 8 the JVM didn’t have a mechanism for native functions, so to simulate a function a class is defined that is a subtype of one of the Scala Function traits. (You don’t actually have to do this explicitly because of the Function Literal syntax described later, and other compiler magic.) An explicit example:
// Function Definition
val f = new Function1[Int,Int] {
def apply(x: Int): Int = 2 * x
}
// Function application
f.apply(2) // or
f(2)
The last line above is the actual ‘Function Application’ syntactic sugar and is exactly equivalent to the preceding line. This syntax is also used for accessing collections, since sequences and Maps can be considered partial functions from indices to values and from keys to values respectively. For example:
val list = List(1,2,3,4)
val map = Map("a" -> 1, "b" -> 2, "c" -> 3)
list(2) // Returns 3
map("b") // Returns 2
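Nothing restricts `apply` to functions and collections: any object with an `apply` method can be ‘called’. As an illustrative sketch, the `Polynomial` class below is a hypothetical type that is not part of the standard library:

```scala
object ApplyDemo {
  // Hypothetical polynomial; coefficients are listed from the constant term upwards
  final class Polynomial(coefficients: Int*) {
    // Horner's method: c0 + x * (c1 + x * (c2 + ...))
    def apply(x: Int): Int =
      coefficients.foldRight(0)((c, acc) => c + x * acc)
  }

  val p = new Polynomial(1, 2, 3) // 1 + 2x + 3x^2
  val viaSugar = p(2)             // function application sugar
  val viaApply = p.apply(2)       // expanded form
}
```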
Updates
For mutable data structures, Scala provides a means to update them in a manner familiar to Java or C family programmers. This is provided via a special `update` method that serves as a dual to the `apply` method of mutable collections.
array(2) = "hello" // Syntactic Sugar
array.update(2, "hello") // Expanded Form
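The same sugar is available to user-defined types, and it works with more than one index. A sketch, assuming a hypothetical `Grid` class:

```scala
object UpdateDemo {
  // Hypothetical fixed-size grid of Ints, stored row by row
  final class Grid(width: Int, height: Int) {
    private val cells = Array.fill(width * height)(0)

    def apply(x: Int, y: Int): Int = cells(y * width + x)

    // All arguments but the last are the indices; the last is the new value
    def update(x: Int, y: Int, value: Int): Unit = {
      cells(y * width + x) = value
    }
  }

  val grid = new Grid(3, 2)
  grid(1, 1) = 42       // syntactic sugar
  grid.update(2, 0, 7)  // expanded form

  val centre = grid(1, 1)
  val corner = grid(2, 0)
}
```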
Type Sugar
Scala provides a useful sugar for function types that helps make them feel more natural. `Function1[T1, R]` (the one-argument function) can otherwise be written as `T1 => R`, and similarly `Function2[T1, T2, R]` (the two-argument function) can be written as `(T1, T2) => R`. This pattern is available for all function types up to `Function22`.
Much the same as for the function types, the tuple types have a useful and natural feeling sugar. `Tuple2[T1,T2]` can be written as `(T1,T2)` and similarly `Tuple3[T1,T2,T3]` can be written as `(T1,T2,T3)`. This pattern is available for all tuple types up to `Tuple22`. (`Tuple1[T1]` has no such shorthand, because `(T1)` is just `T1` in parentheses.)
In addition to the syntactic sugar for function and tuple types, there is a special rule regarding types that allows what at first glance would appear to be syntactic sugar but is in fact just a cleverly named type. The rule is that any type with exactly two type parameters can be written using infix rather than prefix notation. This allows types such as `Tuple2[A,B]` to be written as `A Tuple2 B`. While doing so with an alphabetic type tends to look weird, this form really shines when using symbolic class names. Scalaz makes extensive use of this feature, providing types such as `\/` (disjunction), `@@` (type tagging), and `<~<` (Liskov substitutability).
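The sketch below checks these equivalences in code. The `~>` class is a hypothetical two-parameter type invented for the example. The `=:=` evidence values only compile if the two spellings really denote the same type:

```scala
object TypeSugarDemo {
  // Function type sugar: both annotations name the same type
  val f1: Function1[Int, String] = (n: Int) => n.toString
  val f2: Int => String = f1
  val functionTypesAgree = implicitly[(Int => String) =:= Function1[Int, String]]

  // Tuple type sugar
  val pair: Tuple2[Int, String] = (1, "one")
  val samePair: (Int, String) = pair
  val tupleTypesAgree = implicitly[(Int, String) =:= Tuple2[Int, String]]

  // Infix notation for any type with two type parameters
  val either: Int Either String = Right("ok")

  // Hypothetical symbolic type name, where infix notation reads best
  final case class ~>[A, B](from: A, to: B)
  val edge: String ~> Int = new ~>("a", 1)
}
```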
Function Literals
Scala is considered a functional language and as such provides a fair bit of syntactic sugar relating to functions that improves significantly over the old Java style of defining a class for each function (up to Scala 2.12, this is still what happens under the hood). A simple function declared using the syntactic sugar looks like:
val f = (x: Int) => x + 1 // Type inferred as (Int) => Int
val g = (x: Int, y: Int) => x * y // Type inferred as (Int,Int) => Int
The type of the argument can be left off if it can be inferred from the context:
def foo(f: Int => Int) = ...
def bar(f: (Int, Int) => Int) = ...
foo(x => x + 1) // Compiler knows type should be (Int) => Int
bar((x, y) => x * y) // Compiler knows type should be (Int,Int) => Int
In cases such as the one above, each parameter is used exactly once, in order. This is actually a fairly common situation, so Scala allows the argument list to be dropped entirely and the usage of each argument to be replaced with `_`. Hence the previous example reduces to:
foo(_ + 1)
bar(_ * _)
(Not quite as succinct as Haskell’s `foo (1+)` and `bar (*)`, but close.)
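Putting the three spellings side by side as a runnable sketch. The bodies of `foo` and `bar` are elided in the article, so the ones below are assumptions chosen only to make the results observable:

```scala
object PlaceholderDemo {
  // Assumed bodies: each simply applies the function to fixed arguments
  def foo(f: Int => Int): Int = f(10)
  def bar(f: (Int, Int) => Int): Int = f(6, 7)

  val fooTyped = foo((x: Int) => x + 1)  // explicit parameter type
  val fooInferred = foo(x => x + 1)      // type inferred from foo's signature
  val fooPlaceholder = foo(_ + 1)        // placeholder syntax

  val barInferred = bar((x, y) => x * y)
  val barPlaceholder = bar(_ * _)        // each _ stands for the next parameter
}
```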
Fully expanding all syntactic sugar the multiply function passed to bar becomes:
// Define the function type as an inner class
class $anonfun$1 extends Function2[Int, Int, Int] {
def apply($a0: Int, $a1: Int): Int = $a0 * $a1
}
// Use the function
bar(new $anonfun$1)
Another case which at first glance appears special, but is actually just a trivial application of the ability to leave off a parameter list, is that of code blocks. A code block is essentially just a no-argument function, and combined with Scala’s method invocation rules this allows the definition of things that look like new control structures:
def doStuff(f: => Unit) = ...
doStuff {
// Do stuff
}
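As a concrete sketch of such a ‘control structure’, the hypothetical `repeat` method below takes its body as a by-name parameter (`=> Unit`) and evaluates it once per iteration:

```scala
object ByNameDemo {
  // Hypothetical control structure: evaluates `body` the given number of times
  def repeat(times: Int)(body: => Unit): Unit =
    (1 to times).foreach(_ => body)

  var counter = 0

  // Looks like built-in syntax, but is an ordinary method call
  repeat(3) {
    counter += 1
  }

  // The same kind of call written with ordinary parentheses
  repeat(2)(counter += 10)
}
```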
Tuple Literals
Tuple literals are probably the simplest form of syntactic sugar, yet they are still extremely useful. For a 3-tuple type such as `(Int, String, Int)` we can create a value of that type as `(2, "abc", 5)`, which minus the sugar translates to `new Tuple3[Int, String, Int](2, "abc", 5)`.
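A short sketch confirming that the literal and the explicit constructor call produce equal values, and that the `->` used in the earlier Map example is just another way to build a 2-tuple:

```scala
object TupleLiteralDemo {
  val sugared = (2, "abc", 5)
  val desugared = new Tuple3[Int, String, Int](2, "abc", 5)

  // -> comes from Predef and builds an ordinary Tuple2
  val pair = "a" -> 1

  // Elements are accessed positionally
  val second = sugared._2
}
```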
Extractors
No modern functional programming language would be complete without pattern matching and Scala is no exception. Scala’s pattern matching is provided using a special case for tuples and syntactic sugar around the `unapply` and `unapplySeq` methods. The special case in relation to tuples is that the values in a tuple can be extracted with convenient syntax:
val (a,b,c) = (2, "abc", 5)
Scala’s general pattern matching is implemented using the concept of extractors, that is, objects that have an appropriately typed `unapply` and/or `unapplySeq` method. For example, a simple extractor that matches multiples of ten and returns the multiplier:
object Ten {
def unapply(x: Int): Option[Int] =
if(0 == x % 10) Some(x/10)
else None
}
val Ten(x) = 30 // x is set to 3
val Ten(y) = 32 // throws scala.MatchError
If more than one value is to be extracted, then the return type of `unapply` is an appropriately sized tuple wrapped in an option (eg `Option[(Int, Int)]`). A useful special case exists for extractors that produce exactly two values. For this case, the extractor can be written using either prefix or infix notation, leading to a variety of useful extractors. For example, the List extractor `::` is approximately defined and used as follows:
// Defining the :: extractor
object :: {
def unapply[A](list: List[A]): Option[(A, List[A])] =
if(list.isEmpty) None else Some((list.head, list.tail))
}
val ::(x, xs) = List(1,2,3) // x is 1, xs = List(2,3)
val y :: ys = List(4,5,6) // y is 4, ys = List(5,6)
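Extractors are used most often inside `match` expressions. The sketch below defines a hypothetical `Email` extractor that yields two values, so it can be written in either prefix or infix position:

```scala
object ExtractorDemo {
  // Hypothetical extractor: splits "user@domain" into its two parts
  object Email {
    def unapply(address: String): Option[(String, String)] =
      address.split("@") match {
        case Array(user, domain) => Some((user, domain))
        case _                   => None
      }
  }

  def prefixStyle(s: String): String = s match {
    case Email(user, domain) => s"$user at $domain"
    case _                   => "no match"
  }

  // Same extractor, written in infix position
  def infixStyle(s: String): String = s match {
    case user Email domain => s"$user at $domain"
    case _                 => "no match"
  }
}
```

Unlike the `val` patterns above, a `match` with a fallback case does not throw when the extractor returns `None`.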
It should be obvious that `unapply` is sort of the reverse of `apply`, but if it’s not clear, here’s a better example:
val List(x,y,z) = List(1,2,3)
// is roughly equivalent to
val (x,y,z) = List.unapply(List.apply(1,2,3)).get
(The `List` extractor is actually defined using `unapplySeq` rather than `unapply`, which I will cover in more detail in a future post.)
Case Classes
One common feature of modern functional programming languages is algebraic data types. Rather than have a special syntax like many other languages, Scala provides them through the syntactic sugar detailed in this article. Even though you can implement algebraic data types by hand, the authors of Scala realised that the implementation is often mostly the same, so they provided case classes. Just by putting `case` in front of a class declaration we get a whole bunch of useful methods for free. For example:
case class Colour(r: Byte, g: Byte, b: Byte)
provides the equivalent of:
class Colour(val r: Byte, val g: Byte, val b: Byte) extends Product {
def canEqual(that: Any): Boolean
def equals(that: Any): Boolean
def hashCode: Int
def productArity: Int
def productElement(n: Int): Any
}
object Colour {
def apply(r: Byte, g: Byte, b: Byte): Colour
def unapply(c: Colour): Option[(Byte, Byte, Byte)]
}
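A brief sketch exercising the generated members on the `Colour` class from above (wrapped in a demo object so the example is self-contained):

```scala
object CaseClassDemo {
  final case class Colour(r: Byte, g: Byte, b: Byte)

  val red = Colour(127, 0, 0)            // companion apply, no `new` needed
  val alsoRed = Colour.apply(127, 0, 0)  // expanded form

  // Generated unapply lets the class be used as a pattern
  val Colour(r, g, b) = red

  val arity = red.productArity
  val text = red.toString
}
```

Structural `equals` and `hashCode` are generated too, so `red` and `alsoRed` compare equal even though they are distinct instances.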
Context Bounds
Context bounds, the most recent feature covered in this post, were added in Scala 2.8 to support the use of type classes. As with everything else in this post, they are simply syntactic sugar. Under the covers they are just an unnamed implicit parameter. For example, to be able to sort a list we require the elements to have some kind of implicit ordering. Therefore we can define the signature of a List sort as:
def sort[A: Ordering](xs: List[A]): List[A]
// which expands to
def sort[A](xs: List[A])(implicit $evidence0: Ordering[A]): List[A]
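Because the parameter introduced by a context bound has no usable name, the usual way to get hold of it is `implicitly`. The sketch below gives both signatures a body by delegating to the standard `sorted` method. The bodies and the name `sortExpanded` are assumptions, since the article only shows signatures:

```scala
object ContextBoundDemo {
  // Context bound form: the Ordering is summoned with implicitly
  def sort[A: Ordering](xs: List[A]): List[A] =
    xs.sorted(implicitly[Ordering[A]])

  // Expanded form: the same evidence as a named implicit parameter
  def sortExpanded[A](xs: List[A])(implicit ord: Ordering[A]): List[A] =
    xs.sorted(ord)

  val viaBound = sort(List(3, 1, 2))
  val viaImplicit = sortExpanded(List(3, 1, 2))

  // The implicit parameter can also be supplied by hand
  val viaExplicit = sortExpanded(List(3, 1, 2))(Ordering.Int.reverse)
}
```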
Final Words
This article has detailed twelve different kinds of syntactic sugar in a reasonable amount of detail. If you look at these examples closely you’ll notice that underneath all the sweetness, Scala is actually fairly Java-like. What this means is we get Java compatibility while still having nice things.