An Evil Scheme

In the winter of 2013, I took a course at UMass Boston called CS 450: The Structure of Higher Level Languages. I had no choice; it is a required course for getting a CS degree. I hated, hated, hated that course. And I am hardly alone; of all my classmates that I talked to, not one liked it. The best one could say is that others hated it less than me.

The main reason for my hatred is because it is a Scheme course. I know that there are many programmers who love Scheme, but I have no idea why. I also know that most programmers have never written a single line of Scheme code in their entire lives. But quite a few of them will be forced to eventually; UMass is not the only university that has a Scheme requirement.

If you’re one of those programmers, then this article is for you. By reading this article, you will hopefully be a little more prepared for your inevitable pain. On the flip side, if you read this article and say, “Hey, that actually sounds kinda cool” – well, then, Scheme is the language for you. Good for you; you’ll save lots of money on therapy.

So, without further ado, here is my introduction to the Scheme programming language, with all the negative bias you could ever want. To preserve the appearance of objectivity, I’ve occasionally included some rebuttals from a Scheme advocate, who is naturally fictitious.

Scheme is not used outside of academia

There are exactly two reasons to learn Scheme:
1. You are a student taking a required Scheme course
2. You are professor teaching a required Scheme course

There is absolutely no other reason to learn Scheme. To my knowledge, not one non-academic application has been written in Scheme. A quick glance at the TIOBE Software Index shows that Scheme is hardly used anywhere; it is even more unpopular than Haskell, Ada, Fortran, or COBOL. If you’re looking for a language that will help you in the real world, this ain’t it.

Scheme is a dialect of the Lisp programming language, which is slightly more popular – but not much. I know of two web applications that were written in Common Lisp… and then re-written from scratch in other languages. There is also Emacs Lisp, which is useful if you really love Emacs (which I do not). But if that’s what you’re after, just learn whichever Lisp dialect you need and skip Scheme altogether.

The fact that it’s not used outside of academia means there is almost no support for Scheme in any widely-used IDE. When you take the Scheme course, you’ll end up using the Racket IDE. It’s a servicable IDE, but it’s really limited – it has no code folding, autocomplete, code generation, and so on. It does not handle any language other than Scheme and its dialects, so once you pass your course, you’ll end up uninstalling it.

Rebuttal: There’s a reason Scheme is used in academia: Scheme will teach you concepts that you can apply to other languages.

No, it won’t, because there is not a single commonly used language that works like Scheme. In reality, what you mean is that SICP that will teach you how to do things in other languages. Which brings me to my next point.

Scheme courses uses SICP

“SICP” stands for Structure and Interpretation of Computer Programs, by Hal Abelson, Jerry Sussman, and Julie Sussman. It is commonly called “the wizard book” because of the wizard on the cover. The entire book is released under a CC-BY-NC license, and you can read the full text of the book at its website. You can also download ePub or PDF copies at this SICP WordPress site.

But, even though it’s free, you’ll likely pay for a copy. This is because the problem sets are not all in one place, so you’ll need the page numbers of the hard copy to understand the professor’s homework assignments. Besides, you’ll be paying for it with your financial aid money, so what the hell.

The book was last updated in 1996, which, in computer years, is roughly when dinosaurs ruled the Earth. It is still used because it deals with a lot of fairly useful subjects: environments, delayed evaluation, message dispatching, machine language, garbage collection, etc. Here’s the problem: you’ll only learn how to do these things in Scheme. Most of what you learn cannot be ported to other languages. Take, for example, the section on their version of “assembly language.” If you already know assembly language, you’ll just be confused; and if you do well with SICP, you’ll be totally lost when you actually try to program assembly language. (This was my experience: I had already used assembly, both the Intel and AT&T versions, and that section of the book was still confusing.) I’ll have more to say about this.

This is not the only problem with the book. None of their code is commented, which wouldn’t be so bad, except that modifying their code will be about 90% of your assignments. You will not write a single Scheme program from start to finish. To make things worse, no solutions are available for the problems given in the book – not even to professors. (At least, my professor had to write his own.) This means that when you fail at one of your problems – and you will – you’ll never know how to solve it.

The book’s authors are MIT faculty, and a somewhat alarming factoid is that the course that is based on the book used to be the introductory CS course at MIT. Personally, I think this is the reason for MIT’s notoriously high suicide rates. But even MIT doesn’t offer that course any more; they switched to Python.

Scheme is dynamically typed

When a programming language is dynamically typed, it means the type of its function parameters are determined at runtime, not at compile time. Is a parameter supposed to accept a string? A number? A list? With dynamic typed languages, there is no way to know. Or, actually, there is only one way to know – your program will crash. In Racket, this is called a “contract violation.”

Of course, Scheme is not the only dynamically typed language. You know all those annoying is_array checks you have to do in PHP? Get used to them.

This gets even worse when dealing with “compound types,” which in Scheme means “tagged lists.” I’ll explain this a little later in the article.

Scheme is a purely functional language (except when it’s not)

This one fact is the source of most PITA’s that you must put up with when programming in Scheme.

Pure functions are functions that accept parameters, and return a value that is solely determined by those parameters, and nothing else. Their return values are not determined by anything other than their parameters (like user input or global state); they do not alter the value of anything passed as parameters; and they do not alter anything residing in memory that is outside their scope (like system output or global variables). All these things are called side effects, and if a function has any, it is an impure function. A purely functional programming language is a language that does nothing but call pure functions.

This is in contrast with procedures, which are called precisely for their side effects. Hence the rule of thumb in C and C++ that procedures are “void functions.” While it’s a good rule of thumb, it glosses over the reason for the distinction: if a function returns void, it must have some sort of side effect, or else there’s no reason to call it (or to write it in the first place).

Unfortunately, Scheme uses the term “procedure” for functions. In Scheme, all “procedures” must return a value; if it is called for its side effects, its return type is “unspecified.” I am not going to fall into that trap, and will use the term “function,” whether the function is pure or impure.

Of course, most programming languages do not have only pure functions, and Scheme is no exception. The primary exception in Scheme is define, which has the side effect of altering the environment to associate an identifier with a constant or a function. (Scheme, unlike Lisp, makes no distinction between defining constants and defining functions.) Unlike other languages, Scheme allows functions to be defined inside the body of other functions, and in fact this is very common in Scheme.

One other thing to consider is that in Scheme, functions are first-class objects. They can be passed as parameters to other functions, and returned as values from other functions. A very common way to do this is through the use of lambda functions. Lambda functions are functions that do not have an identifier associated with them. For those who paid attention in C class, lambdas could be considered similar to function pointers. If you’ve ever used anonymous event listeners in Java, you have used something similar to a lambda. When you define a function in Scheme, what you are actually doing is associating an identifier with a lambda function.

The fact that Scheme attempts to be purely functional has one huge disadvantage to programmers: Scheme does not have “steps.” The value of the last expression in a function’s body is the return value of that function. There is no way to return from the middle of the body of a function. And all expressions in a function, other than the last one, are completely ignored. (For the most part, those other expressions are impure functions that are called for their side effects.)

This makes it extraordinarily difficult to implement algorithms in Scheme code. Most algorithms are like cooking recipes: a series of steps that perform a task. Since Scheme does not have steps, there’s no real order of execution. You need to make each of those steps a function call, then pass the results of that function call as a parameter to a function representing the next step.

Let’s see an example of some abstract algorithm and its implementation in a purely functional language. Here’s our algorithm:

Task: Do something with A and B.
Step 1: Do operation 1 on A.
Step 2: Do operation 2 on B.
Step 3: Take the results of Steps 1 and 2, and do operation 3 with them.
Step 4: Do operation 4 on the result from step 3, and return it.
Done.

Let’s say our programming language has a C-style function syntax. Here’s how you implement that algorithm in our language:

return operation4(operation3(operation1(A), operation2(B))));

That’s much easier to understand, right?

Which brings me to my next point.

Scheme’s syntax is almost unreadable

This is mainly because Scheme is a dialect of Lisp, whose syntax is based on Cambridge Polish notation.

Take a look at the C-style code above. What would you do to make it more readable? If you were the creators of Lisp, you would answer “what that code really needs is more parentheses.”

In Lisp, every expression is enclosed in parentheses. Hence, Lisp’s alternate full form, “Lots of Irritating Silly Parentheses.” The long string of closing parentheses makes it nearly impossible to spot mistakes with the naked eye. And misplacing one closing parentheses (out of, say, twelve) can be a syntax error, result in a stack overflow, or make the program mysteriously misbehave at runtime.

Here’s how coding in Lisp usually goes:
1. Try to write some Lisp code
2. Oops, there are too many/too few/out-of-place parentheses
3. Add or remove a closing parentheses at some random line of code
4. Go to step 2

Like Polish notation, Lisp and Scheme also use prefix notation. There is no infix notation in either Lisp or Scheme. This makes sense if you think of “operations” (plus, minus, and so forth) as functions; the first identifier after the parentheses is the function name. So, for example, this C code:

x = 2 * 3 + 4 / 5;

…would be written in Lisp and Scheme as:

(define x (+ (* 2 3) (/ 4 5)))

Scheme does not have iteration, only recursion

This is another by-product of attempting to be a purely functional language. If the only thing you can do is call pure functions, then there are no control structures such as for, while, or do while. Syntactically, iteration is impossible, and you have to use recursion. For that reason, the Scheme specification states that all Scheme implementations must implement optimized tail recursion (such that nothing is added to the call stack).

This makes iteration much harder to implement. Compare this C++ code:

int factorial(int x)
{
    int f = 1;
    for (int i = 1; i <= x; i++)
        f *= i;
    return f;
}

…with the equivalent Scheme code:

(define (factorial x)
  (define (fact-iter i f)
    (if (equal? (+ x 1) i)
        f
        (fact-iter (+ i 1) (* i f)))
  (fact-iter 1 1)))

Yeah, it’s always better to teach CS students to prefer recursion over iteration. Nothing could possibly go wrong, right?

By the way: the Scheme code above has a typo. Can you spot it? No? Didn’t think so.

Rebuttal: Scheme does too have iteration. The R5RS spec has “named let” and do.

You will not be allowed to use either in any Scheme course. I certainly wasn’t allowed to use either in mine. Teaching students to think only in terms of recursion is, apparently, part of the point.

Scheme does not have local variables

Scheme, being an attempt at a purely functional language, discourages altering the contents of memory. And that’s what a variable is: an identifier for a mutable memory location. So Scheme doesn’t have any.

Well, that’s not quite true. As we saw before, define can associate an identifier with a memory location. But define is meant to define constant identifiers; once defined, an identifier cannot be re-defined.

There are special forms that will set the contents of already-defined memory, but they are discouraged. All of those commands are suffixed by an exclamation point, like you’re yelling at a disobedient dog. (In the case of define, that special form is set!.) You probably won't even be told about them in the first month of the course.

The closest thing you’ll get to a “variable” is a function parameter. Of course, in purely functional languages, function arguments cannot be altered, so function parameters aren’t supposed to be modified either.

Rebuttal: Now you’re just making stuff up. Scheme has the let special form, and it’s hardly discouraged.

This is true, until you realize that let is just syntactic sugar for a lambda function. So, the “variables” you’re creating are actually the parameters of that lambda function. This means that you can’t reference one “variable” with another within the same scope, just as you can’t define one parameter in terms of another in the parameter list of C or Java functions.

Let’s see an example. Here’s some C code:

int x = 5;
int y = x;
...do something with x and y;

You simply can’t do this in Scheme. Instead, you need to use nested let forms. Here’s the equivalent Scheme code:

(let ((x 5))
  (let ((y x))
    (...do something with x and y)))

Here’s that same Scheme code, without let‘s syntactic sugar:

((lambda (x)
  ((lambda (y)
     (...do something with x and y))
   x))
  5)

What’s happening is that you’re creating a lambda function with parameter x, and calling it with the literal value 5. Within the body of that lambda function, you’re creating another lambda function with parameter y, and calling it with the value of x (the outer lambda’s parameter).

And, because these are parameters, they can’t be assigned values. You could, of course, use another let special form with the same “variable” name, and this would shadow the value of the outer let “variable.” (You would, of course, incur all of the overhead of creating another environment for the additional lambda function.) Or you could use the set! special form… which, if the variable isn’t a primitive, would alter the “variable’s” value outside of the function.

Which is to say, it’s all a huge mess.

Scheme uses only one compound data type

…and that type is a pair. A pair is simply a type that holds two values; since Scheme is dynamically typed, those two values can be anything – including other pairs. The technical term for a pair is an S-expression. A pair is created using the cons special form, which associates two values together. The first of those values is accessed using car, and the second of those values is accessed using cdr. No, the terms don’t make any particular sense; they are named that way for legacy reasons. (If you’re desperately curious, you can read the explanation on Wikipedia’s page on CAR and CDR.)

Because pairs can reference other pairs, the terms car and cdr have shortened forms with different numbers of “a” and “d” in them. For example, (cadr x) means “the car of the cdr of x.” Remember that Scheme is dynamically typed, so there’s no particular guarantee that the cdr is a pair at all; if it’s not, cadr will result in a “contract violation.” Yes, it’s as bad as it sounds.

Rebuttal: Uh, say, remember those things called lists?…

Yes, but I don’t count them as a separate compound data type – because they’re not.

Lists in Scheme (like in Lisp) are defined recursively. A list is “a pair whose cdr is a list.” Because lists can’t go on forever, the cdr of the last element in a list is a literal called the empty list. In Scheme, the empty list is signified by '(). (In Common Lisp, it’s nil, which also represents Boolean false.) The empty list is essentially a “null pointer,” so it’s appropriate that you test for the empty list with Scheme’s null? function.

So, in Scheme, lists are just syntactic sugar for a “chain” of pairs. The following two Scheme expressions are exactly equivalent:

(list a b c d)
(cons a (cons b (cons c (cons d '()))))

Of course, in the above code, a, b, c, and d can also be lists… or pairs, or “atomic” values (integers, Booleans, symbols, etc). You can have lists of lists, and lists of lists of lists, ad infinitum. Not that you’ll be able to tell, of course, since Scheme is dynamically typed.

Naturally, lists are traversed recursively. You define some function that takes a list as a parameter, do something with the car of that list, then recursively call that same function with the cdr of the list. Of course, since the list is passed as a parameter, you are not allowed to change the car of that list. Instead, you will be re-building the list on each recursive call, eventually returning a cons of the modified car value with the cdr of the parameter.

For example, here’s the code to multiply everything in a list by two:

(define (mult-by-2 list-param)
    (if (null? list-param)
        '()
        (cons (* 2 (car list-param))
              (mult-by-2 (cdr list-param)))))

Anyone who has taken an elementary data structures course will recognize Scheme’s list as a singly-linked list. And that’s the only compound data types you’ll use in Scheme. There are no classes, structs, enums, or arrays. Remember all of those algorithms that require an array’s O(1) access time to be efficient? Well, in Scheme, you’re screwed.

Rebuttal: That’s not true; Scheme has a vector type.

And, like Scheme’s do form, you will not be allowed to use them in the course. In all likelihood, you won’t even be told about Scheme vectors until the last couple weeks of the course… if you’re told about them at all. SICP doesn’t even mention them outside of the section on garbage collection, near the end of the book, and we didn’t study that section in our course.

Scheme is ill-fitted to study advanced programming concepts

If you want to study things like objects, message dispatching, environments, parsing, or compilation, Scheme is the wrong language for you, for all the reasons I just talked about.

But SICP does actually study these things using Scheme. The fact that it talks about these concepts is the very reason the book is still worth reading. So, how do they do it?

The answer is that they shoehorn all of these programming concepts into Scheme’s limited scope, despite the fact that it’s an ill fit. Rather than actually do any of the things I mentioned, they fake it using Scheme syntax.

For custom data types, SIPC uses “tagged lists.” A tagged list is a list whose car is a symbolic literal (a “tag”), and whose cdr is some kind of “object” (that is, a list, or list of lists…). In order to tell whether a list is a certain “type,” you check to see whether that list’s car matches that tag.

For example, here’s Scheme code that handles the “types” foo and bar:

(define (make-foo x)
  (list 'foo x))
(define (is-foo? x)
  (if (equal? (car x) 'foo) #t #f))
(define (make-bar x)
  (list bar x))
(define (is-bar? x)
  (equal? ((car x) bar ) #t #f))
(define (fubar x)
  (cond ((is-foo? x)
         (...do something with foo type))
        ((is-bar? x)
         (...do something with bar type))))

For “objects,” SICP uses lists, closures, and functions that return functions. “Invoking a method” is actually calling a function that is returned from a dispatch on a tag, the tag being the “method name.”

Here’s an example of an “object” that creates a bank account. This example is taken verbatim from SICP.

(define (make-account balance)
  (define (withdraw amount)
    (if (>= balance amount)
        (begin (set! balance (- balance amount))
               balance)
        "Insufficient funds"))
  (define (deposit amount)
    (set! balance (+ balance amount))
    balance)
  (define (dispatch m)
    (cond ((eq? m 'withdraw) withdraw)
          ((eq? m 'deposit) deposit)
          (else (error "Unknown request -- MAKE-ACCOUNT"
                       m))))
  dispatch)
; Create a bank account "object":
(define my-account (make-account 100))
; "Invoking" the "methods" of that bank account "object:"
((my-account 'withdraw) 50)   ; returns 50
((my-account 'deposit) 40)    ; returns 90
((my-account 'withdraw) 100)  ; returns "Insufficient funds"

As data types get more complicated (lists whose car is list…), keeping track of what’s actually in them becomes near-impossible. To make things managable, you need to define additional functions to handle the sub-lists of each data type. This means more opportunities for “contract violations” (runtime type errors). It also means that there’s a proliferation of functions, all of which interact in somewhat unpredictable ways, located in different parts of the program. The programming term for this is spaghetti code.

SICP uses similar fake-outs for other forms of data: environments, tables, assembly language commands, “pointers,” and so forth. In other words, it’s all horrifically complicated and unintuitive.

Rebuttal: These things are complicated in SICP because they’re complicated subjects. Learning them in Scheme will help you understand how other languages actually implement things like objects or environments.

This assumes that other languages implement these things in the same way that SICP does. They don’t.

Take SICP’s “objects,” for example. How many languages implement objects as tagged linked lists? None that I know of. Futhermore, SICP will not help you understand anything having to do with inheritance, type conversion, virtual methods, and so forth. If anything, SICP will simply confuse students who don’t already know about these subjects.

Besides, most people won’t learn much about these subjects. They’ll be way too busy trying to figure out Scheme’s syntax, tracking down “contract violations” and other runtime errors, figuring out why recursive functions don’t work, or tracing the program’s execution through dozens of uncommented functions scattered throughout the code.

If that all sounds fun to you, then you’ll really enjoy Scheme. Personally, I’ll be only too happy to uninstall Racket, delete my project files, and never, ever look at Scheme again in my entire life.

Advertisements

About Karl

I live in the Boston area, and am currently studying for a BS in Computer Science at UMass Boston. I graduated with honors with an AS in Computer Science (Transfer Option) from BHCC, acquiring a certificate in OOP along the way. I also perform experimental electronic music as Karlheinz.
This entry was posted in Programming, Scheme and tagged , . Bookmark the permalink.

One Response to An Evil Scheme

  1. Ben says:

    Did you know someone wrote a spreadsheet in Scheme and X? http://siag.nu/index.shtml

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s