To recap, an HTML page is simply a plaintext file containing textual data. That data is given semantic meaning by enclosing it in HTML tags. When enclosed by an HTML tag, that data becomes the tag’s content. Tags may be nested, so that one tag’s content may include other tags and their contents.
Without those tags, the text will not be marked up at all. This means that extra whitespace, including line breaks, is collapsed into single spaces. If you just copy and paste a page of plaintext (like a README file) into an HTML document, it will all run together as a big wall of text.
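Here is a minimal sketch of that behavior. The README text is made up for illustration. The first fragment is pasted without markup; the second wraps each line in paragraph tags.

```html
<!-- Pasted plaintext: the line breaks and extra spaces are collapsed,
     so this is displayed as one run of text. -->
My Project
Version 1.0

Run    the installer to get started.

<!-- The same text marked up: each paragraph is displayed on its own line. -->
<p>My Project</p>
<p>Version 1.0</p>
<p>Run the installer to get started.</p>
```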
Each tag starts with a less-than character (<), and ends with a greater-than character (>). When used in this manner, they are usually called angle brackets. (The term angle braces is also used, but is less common.) The opening angle bracket is followed immediately by the tag name, which is a keyword used to identify the tag’s type.
The vast majority of HTML tags have content, and those tags must include a closing tag signifying the end of that content. This is another HTML tag with the same tag name, but between the opening angle bracket and the tag name, there is a forward slash (/). The slash cannot have any whitespace around it. If tags are nested, then the closing tags must occur in the proper order: <a><b></b></a> is OK, but <a><b></a></b> is wrong. If you do this, you are creating tag soup.
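The same nesting rule can be sketched with real tags. The paragraph and emphasis tags here are just my choice for illustration; any pair of nested tags follows the same rule.

```html
<!-- Correct: the inner tag is closed before the outer tag. -->
<p>This sentence has <em>emphasized</em> text.</p>

<!-- Tag soup: the closing tags are in the wrong order. -->
<p>This sentence has <em>emphasized text.</p></em>
```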
There are some HTML tags that do not contain content (like the tag for an image or a line break). These tag types are usually called empty tags; the W3C calls them void tags. According to the XHTML standard, empty tags do not have closing tags; instead, the “start” tag is terminated with a forward slash directly before the closing angle bracket. There can’t be a space between the slash and the closing bracket, but there can be spaces before the slash. This syntax is required for an empty tag to be valid XHTML, so I recommend that you always use it.
However, HTML is more forgiving, and allows you to simply write an empty tag as an opening tag, with neither a closing tag nor a trailing slash. (This includes HTML5.) Tags written this way are sometimes called self-closing tags, though that term is more often applied to the slash-terminated form. I do not recommend that you use the unterminated form; instead, all empty tags should be properly terminated.
It should go without saying, but tags that are not empty should not be terminated by a forward slash. You cannot magically create an empty tag out of a non-empty tag; the browser will consider this tag soup, or possibly ignore the tag altogether. Even if the tag’s content is empty, always include both the opening and closing tags.
Here’s an example that includes both types of tags:
<p>This is a paragraph with two sentences.<br />A line break will precede this sentence.</p>
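The rules for empty tags can be sketched a little further. The image file name and its alt text below are placeholders, not real resources.

```html
<!-- Empty tags, terminated with a slash directly before the closing bracket. -->
<br />
<img src="photo.jpg" alt="A placeholder photo" />

<!-- A non-empty tag whose content happens to be empty still gets a closing tag. -->
<p></p>

<!-- Wrong: a non-empty tag cannot be turned into an empty tag. -->
<p />
```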
Behind the scenes, the browser converts each HTML element into a DOM node object.
Each HTML element is rendered by the browser with a default style, though these styles are often overridden with CSS. There are two general categories of element rendering styles: block level and inline. Block level elements are displayed with newlines before and after their contents, and some are also indented. Inline elements are not; surrounding text will flow around their contents, without spacing or line breaks. For example, paragraph tags are block level elements, while anchor tags (which define hyperlinks) are inline elements.
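As a rough illustration, assuming the browser’s default styles, the fragment below is displayed as two separate paragraphs, with the link flowing inside the first one. The URL is a placeholder, and the href attribute it sits in is one of the attributes covered later.

```html
<!-- Two block level elements: each paragraph starts on a new line. -->
<p>The first paragraph contains a <a href="https://example.com/">link</a>, which flows with the surrounding text.</p>
<p>The second paragraph starts on its own line.</p>
```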
When nesting HTML elements, most coding standards say that block level elements should be separated by line breaks, and indented either two or four spaces. Inline elements are usually included in the text without line breaks. This vaguely mirrors how the tags will look in the browser. Of course, any organization can adopt whatever coding standard it wants, but this is a good rule of thumb.
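A sketch of this convention, using two-space indentation (four would be equally acceptable). The list tags are only there to give the paragraphs something to nest inside.

```html
<ul>
  <li>
    <p>Block level elements sit on their own lines, indented one step per level.</p>
  </li>
  <li>
    <p>Inline elements, like <em>this one</em>, stay within the text.</p>
  </li>
</ul>
```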
Like other computer languages, HTML also allows you to comment the code. Comments are included for designers and programmers who will look at the source code, so they are not displayed in the browser (and may be stripped by the web server, if it minifies the code). HTML comments open with the <!-- sequence, and close with the --> sequence. Since HTML ignores line breaks, comments can span multiple lines. You can include any other HTML tags inside comments, but comments cannot be nested.
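For instance, a short sketch with one single-line comment, and one multi-line comment that hides a paragraph:

```html
<!-- A single-line comment. -->

<!--
  A comment that spans several lines.
  <p>This paragraph is inside the comment, so it is not displayed.</p>
-->

<p>This paragraph is displayed.</p>
```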
There will be times when you want to display characters that the browser would normally interpret as HTML, or remove as whitespace. These characters are called special characters, and HTML provides a syntax for specifying them using entity references. An entity reference starts with an ampersand character. This is followed by either an entity name, or Unicode value preceded by a pound sign. It ends with a semicolon. Here are the ones you will need:
- &amp;: Ampersand
- &lt;: Less-than symbol
- &gt;: Greater-than symbol
- &nbsp;: Non-breaking space
These four entity references are technically the only ones you need. But entity references can be used for more than just special characters. For example, diacritical marks (grave, umlaut, etc.) can be added to a letter by appending the relevant entity reference to the letter. Entity references can also be used to represent hard-to-remember symbols; for example, &pound; will produce the symbol for a British pound (£). Many people find it easier to use the entity reference than to remember how to produce that symbol using their keyboard.
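Here is a small sketch of entity references in use. The sentences are made up for illustration. The last line shows the numeric form, which is the pound sign followed by the character’s Unicode value in decimal.

```html
<!-- The angle brackets are displayed as text instead of being read as a tag. -->
<p>Use &lt;p&gt; to start a paragraph.</p>

<!-- An ampersand and a named symbol. -->
<p>Fish &amp; chips costs &pound;5.</p>

<!-- A non-breaking space keeps the number and its unit on the same line. -->
<p>10&nbsp;km</p>

<!-- The British pound symbol again, this time by its Unicode value. -->
<p>&#163;5</p>
```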
The HTML tag may also include attributes, which provide information about the element defined by the tag. I’ll talk about them next.