Monday 29 April 2013

Secrets of the Scala Lexer 2: Blank Lines in Comments

Consider the following Scala snippet:
@tailrec
/*
 Comment 1
 Comment 2
 */
def foo: Int = ???
This compiles correctly. Consider what happens if we insert a blank line between Comment 1 and Comment 2:
@tailrec
/*
 Comment 1

 Comment 2
 */
def foo: Int = ???
This time, we get a compile error ("expected start of definition"). So why is it that we can get syntax errors based solely on whether there is a blank line inside a multi-line comment?

The gory details can be found in the Scala Language Specification §1.2, but the summary is:
  • To support semicolon inference, newline characters are sometimes interpreted as a special newline token called "nl". The rules for when this occurs are moderately complex, but end up working quite intuitively in practice.
  • Two newline tokens can be inserted by the compiler in the following case: "if two tokens are separated by at least one completely blank line (i.e a line which contains no printable characters), then two nl tokens are inserted."
  • At certain places in the syntax, a single optional newline token is accepted; this includes the position after an annotation. This is also done to support semicolon inference.
  • However, two newline tokens are not permitted in some of those places (including after an annotation). I believe the intention is that a blank line is a clear sign that the code after it should be separated from the code before it.
So by adding a completely blank line in the comment, two newline tokens are inserted instead of one, as per the above rule, and two newline tokens are not permitted by the syntax after an annotation.
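
To make the rule concrete, here is a minimal sketch that models how many nl tokens the gap between two tokens produces. It is a simplification, not the compiler's actual scanner code: `NewlineModel` and `nlTokens` are hypothetical names, and the model assumes we are already at a position where newlines are significant (the preceding token can end a statement and the following token can begin one). The key point it illustrates is that the gap is examined as raw text, so a blank line inside a comment counts just like a blank line outside one.

```scala
object NewlineModel {
  // Hypothetical helper: `gap` is the raw source text between two tokens,
  // including any comments, exactly as it appears in the file.
  // Returns the number of nl tokens this simplified model would insert.
  def nlTokens(gap: String): Int = {
    // Keep trailing empty segments so that "\n\n" yields three segments.
    val segments = gap.split("\n", -1)
    if (segments.length < 2) 0 // no line break in the gap at all
    else {
      // Segments strictly between the first and the last are whole lines.
      val wholeLines = segments.slice(1, segments.length - 1)
      // A line with only whitespace has no printable characters.
      if (wholeLines.exists(_.trim.isEmpty)) 2 else 1
    }
  }

  def main(args: Array[String]): Unit = {
    // The text between `@tailrec` and `def` in the two snippets above.
    val withoutBlank = "\n/*\n Comment 1\n Comment 2\n */\n"
    val withBlank    = "\n/*\n Comment 1\n\n Comment 2\n */\n"
    println(nlTokens(withoutBlank)) // prints 1
    println(nlTokens(withBlank))    // prints 2
  }
}
```

Under this model the first snippet yields a single nl token, which the grammar accepts after an annotation, while the second yields two, which it does not.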

Maybe this behaviour should be changed to ignore completely blank lines inside comments?
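
In the meantime, one way to sidestep the problem is to make sure no line inside the comment is completely blank, for example by starting every comment line with an asterisk. The sketch below assumes that a line containing only ` *` counts as having a printable character, which follows from the wording of the rule quoted above; `Workaround` and `countdown` are made-up names used purely for illustration.

```scala
import scala.annotation.tailrec

object Workaround {
  @tailrec
  /*
   * Comment 1
   *
   * Comment 2
   */
  // The visually empty comment line above still contains a `*`,
  // so only a single nl token should separate the annotation from `def`.
  def countdown(n: Int): Int =
    if (n <= 0) 0 else countdown(n - 1)
}
```

Moving the comment above the annotation should avoid the issue as well, since the annotation is then followed directly by the definition.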

Updated (1/May/2013): I've raised this as issue SI-7434.

