Thursday, Sep 11, 2008

Raw and Escaped String Literals

Start up GHCi, the interactive interpreter of the Glasgow Haskell Compiler, and try to display a typical Windows file path:

Prelude> putStrLn "C:\test\x41.txt"
C:      estA.txt


The result has been mangled.  The problem is that, by default, Haskell interprets escape codes in string literals.  The '\t' has been interpreted as a TAB character and the '\x41' as the character with hexadecimal code 41, which is 'A'.  This interpretation can even produce compile-time errors, because '\w' is not a recognized escape code:

Prelude> putStrLn "C:\work\x41.txt"
<interactive>:1:14: lexical error in string/character literal at character 'o'


In standard Haskell the only way to stop your strings from being mangled is to mangle them yourself, putting an extra backslash in front of each backslash so that it is not interpreted as the start of an escape code:

Prelude> putStrLn "C:\\test\\x41.txt"
C:\test\x41.txt
Prelude> putStrLn "C:\\work\\x41.txt"
C:\work\x41.txt


This is all rather messy and unsatisfactory.  There must be a better way.


In Python, strings suffer from similar problems:

>>> print "C:\test\x41.txt"
C:      estA.txt


Python is not as strict as Haskell about unrecognized escape codes.  It leaves the unrecognized '\w' in the output untouched while still interpreting the '\x41':

>>> print "C:\work\x41.txt"
C:\workA.txt


But in Python you can prevent the interpretation of escape codes by putting an 'r', for 'raw string', immediately in front of the opening quote:

>>> print r"C:\test\x41.txt"
C:\test\x41.txt
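
As a quick check of what the 'r' prefix buys you, the sketch below compares the raw literal with the hand-escaped and the unescaped forms. It uses Python 3 syntax (print as a function) rather than the Python 2 print statement of the transcripts, and the variable names are made up for illustration:

```python
raw = r"C:\test\x41.txt"        # raw literal: backslashes kept exactly as written
doubled = "C:\\test\\x41.txt"   # ordinary literal with every backslash escaped by hand
mangled = "C:\test\x41.txt"     # ordinary literal: \t and \x41 are interpreted

print(raw == doubled)           # True: both spell the same 15-character path
print(raw == mangled)           # False: the TAB and the 'A' replaced part of the path
print(repr(mangled))            # shows the TAB as \t
```

The raw and doubled literals produce identical strings, so the prefix changes only how the literal is read, not what kind of string you get.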


Unfortunately, the underlying escaped-string implementation still leaks out: a backslash in front of a quote stops that quote from ending the literal even in a raw string, so a raw string cannot end in a single backslash:

>>> print r"C:\test\"
 Line 1
   print r"C:\test\"
                   ^
SyntaxError: EOL while scanning single-quoted string


Python guru Fredrik Lundh gives more explanation of this problem and how to work round it here.
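
For reference, here are three commonly used work-arounds, sketched in Python 3. They are not necessarily the ones given in the linked explanation, and the variable names are made up for illustration:

```python
# 1. Put the trailing backslash in an ordinary literal; adjacent string
#    literals are joined into one string when the code is compiled.
path_a = r"C:\test" "\\"

# 2. Pad the raw literal with a space so it no longer ends in a backslash,
#    then slice the padding off.
path_b = r"C:\test\ "[:-1]

# 3. Give up on the raw literal and escape every backslash by hand.
path_c = "C:\\test\\"

print(path_a == path_b == path_c)   # True
print(path_a)                       # C:\test\
```

All three produce the same eight-character string; they differ only in how much of the path can still be written without doubling backslashes.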

However, there would be no need for all these explanations and work-arounds if raw, uninterpreted strings were the default and you had to indicate explicitly when a string was to have its escape characters interpreted.  In Haskell the latter could be implemented as a function esc :: String -> String.  Then the simple unadorned string would do what was expected:

Prelude> putStrLn "C:\test\x41.txt"
C:\test\x41.txt


And to deal with those special cases where you really need escape characters you would explicitly call the function to do it:

Prelude> putStrLn (esc "She said \"Hi!\"")
She said "Hi!"
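
To make the proposal a little more concrete, here is a rough sketch of what such an esc function might do. It is written in Python 3 rather than Haskell so that a raw literal can stand in for the proposed uninterpreted default. The small set of recognized codes, the helper names, and the choice to raise ValueError on unknown codes are all assumptions of this sketch, not part of the proposal:

```python
import re

# Simple escape codes this sketch recognizes (an assumed, deliberately small set).
_SIMPLE = {"n": "\n", "t": "\t", "\\": "\\", '"': '"', "'": "'"}

# A backslash followed by either xHH (two hex digits) or any single character.
_ESCAPE = re.compile(r"\\(x[0-9A-Fa-f]{2}|.)", re.DOTALL)


def esc(raw):
    """Interpret the escape codes in an otherwise uninterpreted string."""

    def replace(match):
        code = match.group(1)
        if len(code) == 3:
            # Only the xHH alternative can match three characters.
            return chr(int(code[1:], 16))
        if code in _SIMPLE:
            return _SIMPLE[code]
        # Reject unknown codes, as Haskell does, rather than guessing.
        raise ValueError("unknown escape code: \\" + code)

    # A lone backslash at the very end has nothing after it and is left as is.
    return _ESCAPE.sub(replace, raw)


print(esc(r'She said \"Hi!\"'))   # She said "Hi!"
print(esc(r"C:\test\x41.txt"))    # the \t becomes a TAB and the \x41 an 'A'
```

Because interpretation is now an ordinary function call, it can be applied to any string, not only to literals in the source code.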


Explicit is better than implicit, especially when it leads to less confusion.
