A Programmer’s Tour Of JavaScript

JavaScript’s Dynamic Type System

JavaScript, like nearly every other programming language, has variables. A variable consists of an identifier (its name) and a value (the data that the name refers to). To declare a variable, you use the var keyword, followed by the variable identifier. To initialize a variable, you assign a value to it using the assignment operator (=). As in other C-like languages, the result of an assignment is the assigned value, so you can chain assignments.

Variables that are declared but not initialized are undefined. I’ll talk about the undefined value a little later.
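Here is a minimal sketch of declaration, initialization, and chained assignment. The variable names are made up for illustration.

```javascript
// Declared but not initialized: the value is undefined.
var count;

// Declared and initialized in one statement.
var width = 10;

// An assignment evaluates to the assigned value, so assignments chain
// from right to left: height receives 20 first, then depth receives 20.
var height, depth;
depth = height = width * 2;
```

After this runs, count is still undefined, and both height and depth hold 20.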

Like other languages with a C-like syntax, variable names are case-sensitive. But unlike those other languages, JavaScript has extraordinarily relaxed rules for naming identifiers. Naturally, reserved words cannot be used as identifiers. In addition, the identifier must start with a letter, dollar sign, or underscore; the remaining characters can also include digits, but not whitespace, mathematical operators, or punctuation (including brackets, braces, and parentheses). But keep in mind that a “letter” can be almost any Unicode letter. This means that “p”, “π”, and “$” are all perfectly valid JavaScript identifiers. In fact, many widely-used JavaScript libraries (notably jQuery and Prototype) use “$” as a function name. If you want details, read Valid JavaScript variable names by Mathias Bynens.
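To make the naming rules concrete, here is a small sketch with a few legal identifiers. All of the names are hypothetical examples, not anything a library requires.

```javascript
var $ = "dollar sign";            // a lone dollar sign is a legal name
var _hidden = "underscore";       // so is a leading underscore
var π = 3.14159;                  // a Unicode letter works too
var item2 = "digits are fine after the first character";

// Identifiers are case-sensitive, so these are two separate variables.
var total = 1;
var Total = 2;
```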

Also unlike most other programming languages, JavaScript variables do not have a declared type. A variable may hold the value of any type of data, and that data can be changed at runtime. So, this is perfectly valid JavaScript:

var x = "Hello, world!";
x = 5;

Like most managed languages, JavaScript uses garbage collection to remove unused data from memory. You don’t have to worry about memory management, and in fact, you’re encouraged not to. If you’re really curious, you can read the Memory Management chapter of the JavaScript Guide from the Mozilla Developer Network.

JavaScript variables may not be typed, but their values are. To determine the type of data to which a variable refers, JavaScript provides the typeof operator. This operator returns a string representation of the type. I’ll specify what those strings actually are as the need arises.
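A quick sketch of typeof following the value rather than the variable (the variable names are only for illustration):

```javascript
var x = "Hello, world!";
var typeBefore = typeof x;   // "string"

x = 5;                       // same variable, new kind of value
var typeAfter = typeof x;    // "number"

x = true;
var typeLast = typeof x;     // "boolean"
```

Note that typeof is an operator, not a function, so no parentheses are needed around its operand.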

There are two general “categories” of types: primitive types and object types. This distinction is not as cut-and-dried in JavaScript as it is in statically typed languages like C++ or Java. In JavaScript, the main difference is that primitive types behave like value types, and object types behave like reference types.

Primitive Types

JavaScript has just three primitive types that hold ordinary data (undefined and null, which I cover in the next section, are special cases):

number
This type represents all numeric values. JavaScript does not distinguish between numeric sizes (e.g. int vs. long), signed and unsigned numbers, or integer and floating-point numbers.

Numeric literals can use decimal notation (1.555), exponential notation (3.14159E-27), and hex notation (0x5FF). Most interpreters also accept a legacy octal notation with a leading zero (0775), but it was never part of the core language standard and strict mode rejects it, so it should be avoided.

JavaScript also supplies the Infinity numeric value, which (surprise!) represents positive infinity. It has a negated version, so -Infinity represents negative infinity. These values work exactly like you would expect if you were a mathematician, but programmers may not be used to them. If you divide any non-zero number by zero (+0 or -0), you get either Infinity or -Infinity. When you divide zero by zero, you instead get NaN (which I discuss below). When you try to subtract Infinity from itself, you also get NaN.
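The literal forms and the Infinity rules can be checked with a short sketch like this one (variable names are illustrative):

```javascript
var hex = 0x5FF;                       // 1535 in decimal
var tiny = 3.14159E-27;                // exponential notation
var oneType = typeof 1 === typeof 1.5; // true: integers and floats are both "number"

var posInf = 1 / 0;                    // Infinity
var negInf = -1 / 0;                   // -Infinity
var viaNegZero = 1 / -0;               // -Infinity
var zeroByZero = 0 / 0;                // NaN
var infMinusInf = Infinity - Infinity; // NaN
```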

boolean
A Boolean type can only be true or false. Other types are coerced to a Boolean value when they are used in Boolean expressions. Boolean type coercion is fairly complicated in JavaScript, so I’ll go into more detail later.

string
The string type represents an ordered series of 16-bit values (UTF-16 code units), each of which usually corresponds to one Unicode character. JavaScript does not distinguish between string and character types; there is no char type, just a string with a single character. JavaScript also does not distinguish between single-quoted and double-quoted string literals.

JavaScript strings can access their individual characters through square brackets, as if they were C-style character arrays. However, JavaScript strings are immutable, so you can’t actually change the character, just read it.
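Here is a small sketch of reading characters and of what immutability means in practice. The try/catch is my own precaution: depending on whether the code runs in strict mode, writing to a character position is either silently ignored or throws a TypeError.

```javascript
var word = "cat";
var first = word[0];            // "c"
var firstType = typeof first;   // "string": there is no separate char type

try {
  word[0] = "b";                // has no effect on the string either way
} catch (e) {
  // only reached in strict mode
}
// word is still "cat" here

// To "change" a string, build a new one instead.
var replaced = "b" + word.slice(1);   // "bat"

var sameQuotes = 'cat' === "cat";     // true: quote style makes no difference
```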

All primitive types behave like value types. This means that they are copied upon assignment, are passed and returned by value, and that equality operators compare their contents (and not their memory locations). This is exactly how primitive types behave in other programming languages, so I won’t go into detail here.

Using typeof on a primitive type will return a string with that type’s name: “number”, “boolean”, or “string”.

“None” values and types: undefined, null, and NaN

Most computer languages have one specific value that means “no value.” What it is called varies by language: C uses zero or NULL, C++11 uses nullptr, Java uses null, Lisp uses nil, Python uses None, and so forth.

Well, JavaScript has three of these values. A variable can be assigned any one of these values. Additionally, two are also types themselves. Let’s get into the details.

undefined
This is the most common “none” value. A variable is undefined when it is declared, but no value was ever assigned to it. A function that does not return any explicit value will return undefined. You also get undefined when you attempt to use a property or method of an object, and no such property or method exists. Note, however, that accessing an undeclared variable will usually cause a ReferenceError, not return an undefined value. (In most cases, an “undeclared variable” is really just a typo.)

The undefined literal represents both a value and a primitive type. Using typeof on a variable with the value undefined, as well as on the undefined value itself, will return “undefined”.
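The different ways of ending up with undefined can be sketched like this. Every name below, including notDeclaredAnywhere, is hypothetical.

```javascript
var declared;                        // never assigned
function noReturn() {}               // no explicit return value
var obj = { name: "example" };

var fromVariable = declared;         // undefined
var fromFunction = noReturn();       // undefined
var fromProperty = obj.missing;      // undefined
var typeName = typeof undefined;     // "undefined"

// Reading an undeclared variable does not yield undefined; it throws.
var errorName = null;
try {
  notDeclaredAnywhere;
} catch (e) {
  errorName = e.name;                // "ReferenceError"
}
```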

null
This could be considered the JavaScript equivalent of a “null pointer” in other languages. It is a special value that represents “no object.” In JavaScript, variables must be explicitly assigned the null value; uninitialized variables are undefined instead.

The null literal represents both a value and a primitive type, but unlike undefined, typeof does not report that type. Using typeof on null itself, or on a variable whose value is null, will return “object”. To test for null, compare against it directly with the strict equality operator (===). Just go with it.
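The easiest way to see what typeof reports for null is to try it. This is a minimal sketch with made-up variable names:

```javascript
var nothing = null;

var literalType = typeof null;        // "object"
var variableType = typeof nothing;    // "object"

// A direct comparison is the reliable test for null.
var isNull = nothing === null;        // true

var neverAssigned;
var alsoNull = neverAssigned === null; // false: it is undefined, not null
```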

NaN
This stands for “Not a Number.” You usually get this because you tried to convert something to a number type, and it can’t be converted. It is also the result if you divide zero by zero, or if you subtract Infinity from itself. Note that this value is contagious: using almost any arithmetic operator with NaN will result in NaN. The NaN value is defined by the IEEE 754 floating-point standard, which unfortunately is not freely available online.

Unlike the other two values, NaN is not a type. Using typeof on a variable with the value NaN, as well as on NaN itself, will return “number”. It is a special value, in that it is not equal to itself. To test if a variable has the value NaN, you need to use the isNaN() function. I’ll talk about this later.
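A short sketch of how NaN behaves. One thing the text above does not cover, and which I am adding as an aside: the global isNaN() coerces its argument to a number first, while the newer Number.isNaN() (added in ES2015) does not.

```javascript
var notANumber = 0 / 0;                         // NaN

var nanType = typeof notANumber;                // "number"
var equalsItself = notANumber === notANumber;   // false
var detected = isNaN(notANumber);               // true
var spreads = notANumber * 2 + 1;               // NaN again

// isNaN() converts "abc" to a number (NaN) before testing it...
var looseCheck = isNaN("abc");                  // true
// ...whereas Number.isNaN() is true only for the NaN value itself.
var exactCheck = Number.isNaN("abc");           // false
```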

Primitive Type Coercion

If you try to do operations on different primitive types, JavaScript will automatically convert the operands to compatible types before evaluation. The technical term for this is type coercion. The creators of JavaScript wanted this to “just work;” they wanted the language to behave how humans expected it to behave. Of course, human expectations are not exactly rational, so the type coercion rules ended up being a bit wonky.

When other types are coerced into the number type:

  • If it’s a boolean, then true evaluates to 1, and false evaluates to 0. This is about as straightforward as we’re going to get.
  • If it’s a string, JavaScript attempts to parse it as if it were a numeric literal. (An empty or whitespace-only string evaluates to zero.) If it can’t do that, it evaluates to NaN. Because NaN spreads through arithmetic, any further numeric operations will also result in NaN.
  • If it’s null, it evaluates to zero.
  • If it’s undefined, it cannot be evaluated as a number at all; the result will be NaN.
  • If it’s an object, that object is first converted to a primitive value (by implicitly calling the valueOf() method). Then, that primitive value is coerced to a number using the rules above. (For most objects, this will result in NaN.)
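The rules in the list above can be checked by calling Number() as a plain function, which applies the same conversion explicitly. This is only a sketch; the object with a custom valueOf() is a made-up example.

```javascript
var fromTrue = Number(true);              // 1
var fromFalse = Number(false);            // 0
var fromNumeric = Number("42.5");         // 42.5
var fromGarbage = Number("42 apples");    // NaN
var fromNull = Number(null);              // 0
var fromUndefined = Number(undefined);    // NaN
var fromPlainObject = Number({});         // NaN

// An object whose valueOf() returns a primitive converts through that value.
var fromValueOf = Number({ valueOf: function () { return 7; } });  // 7
```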

Generally speaking, JavaScript does numeric type coercion whenever it encounters one of the relational operators, the mathematical operators, or their assignment variations: +, +=, -, -=, /, /=, *, *=, >, >=, <, or <=. But there are two huge “gotchas,” both involving strings.

In JavaScript, the binary addition operator is also the string concatenation operator. This means that if you “add” a string and a number, JavaScript will coerce the number to a string, not the other way around. Thus 0 + "1" will yield "01", not the number 1.

This also applies to the addition assignment operator: +=. On the other hand, the unary plus operator does not do string concatenation; it always coerces its operand to a number. So, 0 + +"1" will yield the number 1.

The second “gotcha” is that the relational operators also operate on strings – but only if both sides of the operator are string types. If that’s the case, then it does a lexicographical comparison of the strings, and returns true or false depending upon the operator. This is essentially the same thing C++ does to overload the relational operators for string objects. The JavaScript algorithm is specified by The Abstract Relational Comparison Algorithm in the ECMAScript Language Specification.

On the other hand, if one side of a relational operator is a string, but the other isn’t, then JavaScript will coerce both sides to number types, and then compare them.
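Both string “gotchas” fit in one short sketch (the variable names are illustrative):

```javascript
// Gotcha 1: + concatenates as soon as one operand is a string.
var concatenated = 0 + "1";     // "01"
var added = 0 + +"1";           // 1, because unary plus converts first
var subtracted = "5" - 1;       // 4, because - has no string meaning

// Gotcha 2: two strings are compared lexicographically.
var lexical = "10" < "9";       // true: "1" sorts before "9"
var numeric = "10" < 9;         // false: compared as the numbers 10 and 9

// A string that cannot be parsed becomes NaN, and every
// relational comparison involving NaN is false.
var lessThan = "ten" < 9;       // false
var atLeast = "ten" >= 9;       // false
```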

When other types are coerced into the string type, the result is generally a human-readable form of the value. So, for example, if the type is a number and its value is 3.14159, then the resulting string will be “3.14159”. Converting boolean types will result in “true” or “false” depending upon the value.

What is not quite as straightforward is what happens when the “none types” are coerced. The result is actually their value as a string: “NaN”, “undefined”, or “null”. This may be unexpected, especially from people coming from other languages, who expect those values to be coerced to the empty string.

Speaking of which, a quick-and-dirty way to convert anything to a string is just to concatenate it with the empty string:

> x = "" + null;
  "null"

The most confusing part of JavaScript’s type system involves Boolean coercion. This happens in the conditional expression of an if statement, while or do/while loop, or for loop. It also happens when you use any of the logical operators: !, &&, or ||. (The equality operators == and != look like they should coerce to Boolean too, but they follow rules of their own, which I cover below.)

Other than false itself, there are six different values that evaluate to a Boolean false. Among JavaScript pedants, these values are called “falsy” or “falsey.” All other values are called “truthy.” (This colloquialism has been around for many years, so suck it, Stephen Colbert.)

These are the six falsy values:

  • undefined
  • null
  • NaN
  • “” (the empty string)
  • 0 (the number zero)
  • -0 (the number negative zero)

Unfortunately, the == (equality) and != (inequality) operators do not simply coerce both sides to Boolean and compare the results, so there are a lot of “gotcha” rules when you use them:

  • Both undefined and null are considered equal to themselves and to each other, but not to any other falsy value:
    > false == null;
      false
    > false == undefined;
      false
    > null == undefined;
      true
    
  • NaN is never considered equal to any other value, not even itself:
    > false == NaN;
      false
    > NaN == NaN;
      false
    

    This is why you need to use the isNaN() function to test if a variable has the special NaN value.

  • A string that contains only whitespace is considered equal to false and to 0 (all of them are coerced to the number zero), unless it is compared with another string:
    > false == " \t\n";
      true
    > false == " ";
      true
    > " \t\n" == 0;
      true
    > " \t\n" == " ";
      false
    

Making matters worse, these “gotchas” only happen when you use the equality operators. Conditionals and the logical operators behave exactly as expected, so any non-empty string (even one that is all whitespace) is truthy:

> if (" " && " \t\n") true; else false;
  true
> if (" " && "") true; else false;
  false

This is because the equality operators follow The Abstract Equality Comparison Algorithm from the ECMAScript specification.

JavaScript also provides the === (strict equality) and !== (strict inequality) operators. These operators follow the Strict Equality Comparison Algorithm. They do not do any type coercion at all, so two expressions will be equivalent only if they are exactly the same value and type.

As a general rule, you should always use the strict equality operators instead of their non-strict versions.
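Here is a side-by-side sketch of the two kinds of equality, using comparisons that are a common source of bugs:

```javascript
// Non-strict equality coerces before comparing.
var looseEmpty = 0 == "";                 // true
var looseDigit = "1" == 1;                // true
var looseNone = null == undefined;        // true

// Strict equality compares type and value with no coercion.
var strictEmpty = 0 === "";               // false
var strictDigit = "1" === 1;              // false
var strictNone = null === undefined;      // false

// NaN is the one value that fails even the strict test against itself.
var strictNaN = NaN === NaN;              // false
```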

Object Types

Any variable that refers to any kind of object will be an object type. Object types behave like reference types in other languages: they are not copied upon assignment, are passed and returned by reference, and equality operators compare their memory locations (not their values).

Generally speaking, using typeof on an object type will simply return "object". There is one exception: function objects. In JavaScript, functions are first-class objects, so using typeof on a function name will return "function".
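A minimal sketch of reference behavior and of what typeof reports for objects. The object and function names are hypothetical.

```javascript
var original = { count: 1 };
var alias = original;            // copies the reference, not the object
alias.count = 2;
var seenThroughOriginal = original.count;      // 2

var lookalike = { count: 2 };
var sameObject = original === alias;           // true: one object, two names
var sameContents = original === lookalike;     // false: two distinct objects

// A function receives a reference too, so it can modify the caller's object.
function increment(counter) {
  counter.count += 1;
}
increment(original);
var afterCall = original.count;                // 3

var objectType = typeof original;              // "object"
var arrayType = typeof [];                     // "object"
var functionType = typeof increment;           // "function"
```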

Objects have members, which can either be properties (data) or methods (functions). The members of an object are (usually) accessed by the “dot operator,” more formally the member access operator: a single period between the object on the left, and its member on the right. An object inherits all of the members of its prototype, so nearly all objects (even functions) inherit the methods of Object.prototype. The most useful methods are toString() and valueOf().
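As a rough sketch of member access and inheritance from Object.prototype (the point object is invented for the example):

```javascript
var point = {
  x: 1,
  y: 2,
  describe: function () { return "a point"; }
};

var viaDot = point.x;                  // 1
var viaBrackets = point["y"];          // 2: the other way to access a member
var fromMethod = point.describe();     // "a point"

// toString() and valueOf() are not defined on point itself; they are inherited.
var ownsToString = point.hasOwnProperty("toString");   // false
var asString = point.toString();                       // "[object Object]"
var asValue = point.valueOf() === point;               // true

// Functions are objects, so they inherit from Object.prototype as well.
var functionsToo =
  point.describe.hasOwnProperty === Object.prototype.hasOwnProperty;  // true
```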

Both objects and functions are big topics, so I won’t go into any great detail here. Each will get extensive coverage later in the article. But it is necessary to know about objects in general in order to understand wrappers and regular expressions.

Wrapper Objects and Wrapper Functions

A wrapper object is an object that represents a primitive type, and has methods that do useful operations on those types. With the exception of the “none” types (null and undefined), each primitive type has a corresponding function that can construct a wrapper object. For reference, they are Number, Boolean, and String.

Wrapper objects are usually temporary objects that are created when you apply a method to a primitive type. For example, the String object has a convenient method called trim(), which trims the whitespace from the beginning and end of a string, and returns the new string. You can use this method on a string literal, like so:

> "  Hello, world!  ".trim();
  "Hello, world!"

The technical term for this is autoboxing. What happens is that the JavaScript engine creates a temporary String object out of the " Hello, world! " literal; invokes the trim() method on that object, returning the result; then destroys the temporary object. (Or, at least, it behaves as if it does; how it actually accomplishes this is not specified in the standard.)

This also works for variables that represent primitive types. You can invoke the trim() method on a variable that represents a primitive string, and the type of the variable won’t change. In fact, this is probably how you’ll use autoboxing 99% of the time.
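Here is a small sketch of autoboxing on variables. The toFixed() and toString() calls are extra examples of my own, showing that numbers and booleans get the same treatment.

```javascript
var greeting = "  Hello, world!  ";
var trimmed = greeting.trim();          // "Hello, world!"

// The variable itself is untouched and is still a primitive string.
var stillPrimitive = typeof greeting;   // "string"
var originalLength = greeting.length;   // 17

var price = 3.14159;
var rounded = price.toFixed(2);         // "3.14" (note: a string)
var flagText = true.toString();         // "true"
```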

Wrappers can also be used to convert between types. Say that you want to convert a string to a number. You could use the Number function to do so:

> x = "3.1415";
  "3.1415"
> typeof x;
  "string"
> x = Number(x);
  3.1415
> typeof x;
  "number"

Be careful, however, because all of the wrapper functions are constructors for the wrapper objects. JavaScript uses the new keyword to construct objects using functions as constructors. If you put the new keyword before one of the wrapper functions, you’ll create an object, and not a primitive type.

This is very confusing, completely unnecessary, and can have a pretty severe performance impact. So, don’t do it. Just use autoboxing instead.
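To see why the new keyword causes trouble here, compare the results with and without it. This is a sketch with illustrative names:

```javascript
var converted = Number("5");           // primitive number 5
var boxed = new Number(5);             // wrapper object

var convertedType = typeof converted;  // "number"
var boxedType = typeof boxed;          // "object"

var looseMatch = boxed == 5;           // true: the object is converted first
var strictMatch = boxed === 5;         // false: an object is never a primitive

// Every object is truthy, even one that wraps false.
var boxedFalse = new Boolean(false);
var branch = boxedFalse ? "truthy" : "falsy";          // "truthy"

// Two wrapper objects are compared by reference, not by contents.
var twoBoxes = new String("a") === new String("a");    // false
```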

Regular Expression Objects

JavaScript was originally designed to manipulate documents (web pages), so it’s no surprise that its designers wanted to make text manipulation a big part of the language. Regular expressions are a huge part of text manipulation, so JavaScript provides them as RegExp objects with their own literal syntax.

To create a RegExp literal, you put a pattern between two forward slashes, followed by optional modifiers. It is also possible to create an object using the RegExp() constructor, passing in the pattern string as the first argument, and a modifier string as the second. Here’s an example of both:

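// RegExp literal
var r = /^[a-z_$][\w$_]*$/i;
// Using the RegExp constructor. Backslashes must be doubled inside a
// string literal; otherwise "\w" collapses to a plain "w".
var r2 = new RegExp("^[a-z_$][\\w$_]*$", "i");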

Generally speaking, it is preferable to use the literal syntax, because it has much better performance. The only time you should use the constructor is when your regular expression pattern is only known at runtime (for example, it comes from user input).

JavaScript only has three universally supported modifiers:

  • i: performs case-Insensitive matching
  • g: performs Global matching (finds all matches, not just the first)
  • m: performs Multiline matching

These modifiers are essentially flags, so they can be combined without issue.
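A sketch of what each modifier changes, using a made-up three-line string:

```javascript
var text = "one\nTwo\nthree";

// No modifiers: ^ only matches at the very start of the string.
var plain = text.match(/^t\w+/);            // null

// m: ^ also matches at the start of each line.
var multiline = text.match(/^t\w+/m);       // first match is "three"

// g, i, and m combined: every line starting with "t" or "T".
var combined = text.match(/^t\w+/gim);      // ["Two", "three"]

// The same modifiers passed to the constructor as a string.
var built = new RegExp("^t\\w+", "gim");
var flagsSet = built.global && built.ignoreCase && built.multiline;  // true
```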

Once you have a RegExp object, you can test that object against a string. There are two useful methods to do this. The test() method simply returns true if there is a match, and false otherwise. The exec() method returns an array describing the first match (the matched text is element 0, followed by any captured groups), or null if no match is found.

The String object also has methods that accept RegExp objects as parameters: search(), replace(), match(), and split(). In fact, it is probably more common to use these methods than it is to use the RegExp methods.
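Here is a compact sketch of the RegExp and String methods mentioned above. The patterns and sample strings are invented for the example.

```javascript
var identifier = /^[a-z_$][\w$]*$/i;
var valid = identifier.test("$scope");        // true
var invalid = identifier.test("9lives");      // false

// exec() returns an array: the whole match, then each captured group.
var range = /(\d+)-(\d+)/.exec("pages 10-20");
var whole = range[0];                         // "10-20"
var start = range[1];                         // "10"
var end = range[2];                           // "20"
var position = range.index;                   // 6
var noMatch = /\d+/.exec("no digits here");   // null

// The String methods that accept a RegExp.
var found = "Hello".search(/l/);                       // 2
var swapped = "Hello World".replace(/o/g, "0");        // "Hell0 W0rld"
var numbers = "a1b22c333".match(/\d+/g);               // ["1", "22", "333"]
var parts = "2015-01-31".split(/-/);                   // ["2015", "01", "31"]
```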

The pattern syntax used by JavaScript is based on the Perl syntax, which is the one used by the PCRE library, PHP, Java, the C++ regex library, etc. If you’ve used any of those to create regular expressions, then JavaScript’s syntax should be fairly straightforward. But if you haven’t, then it’s outside the scope of this article. For more information, I suggest reading the Regular Expressions chapter in the MDN JavaScript Guide.

About Karl

I live in the Boston area, and am currently studying for a BS in Computer Science at UMass Boston. I graduated with honors with an AS in Computer Science (Transfer Option) from BHCC, acquiring a certificate in OOP along the way. I also perform experimental electronic music as Karlheinz.