A Guide to HTML

HTML Syntax

To recap, an HTML page is simply a plaintext file containing textual data. That data is given semantic meaning by enclosing it in HTML tags. When enclosed by an HTML tag, that data becomes the tag’s content. Tags may be nested, so that one tag’s content may include other tags and their contents.

Without those tags, the text will not be marked up at all. This means that runs of whitespace, including line breaks, are collapsed into a single space. If you just copy and paste a page of plaintext (like a README file) into an HTML document, it will all run together as a big wall of text.
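
To make this concrete, here is a minimal sketch of the wall-of-text effect. The sample text is made up for illustration. The browser collapses the line breaks and extra spaces in the first fragment, while the paragraph tags in the second fragment keep the two lines apart.

```html
<!-- No markup: the browser renders this as one run-on line. -->
Installation
Unpack the archive.     Then run the installer.

<!-- With markup: each paragraph is rendered as its own block. -->
<p>Installation</p>
<p>Unpack the archive. Then run the installer.</p>
```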

Each tag starts with a less-than character (<), and ends with a greater-than character (>). When used in this manner, they are usually called angle brackets. (The term angle braces is also used, but is less common.) The opening angle bracket is followed immediately by the tag name, which is a keyword used to identify the tag’s type.

The vast majority of HTML tags have content, and those tags must be followed by a closing tag signifying the end of that content. This is another HTML tag with the same tag name, but with a forward slash (/) between the opening angle bracket and the tag name. The slash cannot have any whitespace around it. If tags are nested, then the closing tags must occur in the reverse order of the opening tags: <a><b></b></a> is OK, but <a><b></a></b> is wrong. If you close tags in the wrong order, you are creating tag soup.
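
Here is the same nesting rule sketched with real tags. The choice of the paragraph and emphasis tags is mine, purely for illustration; any pair of nested tags follows the same pattern.

```html
<!-- Correct: <em> opens inside <p>, so it must close before </p> does. -->
<p>Tags can be <em>nested</em> inside one another.</p>

<!-- Wrong (tag soup): the closing tags are in the wrong order. -->
<p>Tags can be <em>nested inside one another.</p></em>
```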

There are some HTML tags that do not contain content (like the tag for an image or a line break). These tag types are usually called empty tags; the W3C calls them void elements. According to the XHTML standard, empty tags do not have closing tags; instead, the “start” tag is terminated with a forward slash directly before the closing angle bracket. There can’t be a space between the slash and the closing bracket, but there can be spaces before the slash. This syntax is required for an empty tag to be valid XHTML, so I recommend that you always use it.
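
As a quick sketch of the spacing rule, here are two empty tags (a line break and a horizontal rule, chosen only as examples) written in the XHTML style, followed by one form that breaks the rule.

```html
<!-- A line break, with a space before the slash. -->
<br />

<!-- A horizontal rule, with no space before the slash. Also fine. -->
<hr/>

<!-- Wrong: there is a space between the slash and the closing bracket. -->
<br / >
```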

However, HTML is more forgiving, and allows you to write an empty tag as just an opening tag, with no trailing slash and no closing tag. (This includes HTML5, where the trailing slash is optional.) Empty tags written with the trailing slash are commonly called self-closing tags. I do not recommend the unterminated form; instead, all empty tags should be properly terminated.

It should go without saying, but tags that are not empty should not be terminated by a forward slash. You cannot magically create an empty tag out of a non-empty tag. When the page is parsed as HTML, the browser ignores the slash and treats the tag as an ordinary opening tag that was never closed, which leaves you with tag soup. Even if the tag’s content is empty, always include both the opening and closing tags.
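
A short sketch of the difference, using a div purely as an example of a non-empty tag:

```html
<!-- Right: an element with no content still gets both tags. -->
<div></div>
<p>This paragraph comes after the empty div.</p>

<!-- Wrong: div is not an empty tag, so the slash does not close it. -->
<div />
<p>An HTML parser may treat this paragraph as content of the unclosed div.</p>
```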

Here’s an example that includes both types of tags:

<p>This is a paragraph with two sentences.<br />
A line break will precede this sentence.</p>

A tag together with its content (and its closing tag, if it has one) is called an HTML element. Behind the scenes, the browser converts each HTML element into a DOM node object.

Each HTML element is rendered by the browser with a default style, though these styles are often overridden with CSS. There are two general categories of element rendering styles: block level and inline. Block level elements are displayed with newlines before and after their contents, and some are also indented. Inline elements are not; surrounding text will flow around their contents, without spacing or line breaks. For example, paragraph tags are block level elements, while anchor tags (which define hyperlinks) are inline elements.
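
The paragraph and anchor case can be sketched like this. The link address is a placeholder, and the href attribute that carries it is only there because an anchor needs a destination.

```html
<p>This paragraph is a block level element. The
<a href="https://example.com/">link in the middle</a>
is inline, so the sentence keeps flowing around it.</p>

<p>This second paragraph starts on its own line, below the first.</p>
```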

When nesting HTML elements, most coding standards say that block level elements should be separated by line breaks, and indented either two or four spaces. Inline elements are usually included in the text without line breaks. This vaguely mirrors how the tags will look in the browser. Of course, any organization can adopt whatever coding standard it wants, but this is a good rule of thumb.
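
Here is one way that rule of thumb might look in practice, assuming a two-space indent. The content is made up; the div, list, and emphasis tags are just convenient examples of block level and inline elements.

```html
<div>
  <p>Block level elements each get their own line, indented
  to show nesting. An <em>inline element</em> stays in the text.</p>
  <ul>
    <li>First item</li>
    <li>Second item</li>
  </ul>
</div>
```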

Like other computer languages, HTML also allows you to comment the code. Comments are included for designers and programmers who will look at the source code, so they are not displayed in the browser (and may be stripped by the web server, if it minifies the code). An HTML comment opens with the <!-- sequence, and closes with the --> sequence. Everything between them is part of the comment, so comments can span multiple lines. You can include any other HTML tags inside comments, but comments cannot be nested.
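
A brief sketch of these rules, with made-up comment text:

```html
<!-- A single-line comment. -->

<!--
  A comment can span several lines, and can hide markup:
  <p>This paragraph is not displayed.</p>
-->

<!-- Wrong: <!-- comments cannot be nested --> this text leaks onto the page. -->
```

In the last line, the comment ends at the first closing sequence, so everything after it is treated as ordinary page content.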

There will be times when you want to display characters that the browser would normally interpret as HTML, or remove as whitespace. These characters are called special characters, and HTML provides a syntax for specifying them using entity references. An entity reference starts with an ampersand character. This is followed by either an entity name, or a pound sign (#) and the character’s numeric Unicode value. It ends with a semicolon. Here are the ones you will need:

&lt;      Less-than symbol
&gt;      Greater-than symbol
&amp;     Ampersand
&nbsp;    Non-breaking space

These four entity references are technically the only ones you need. But entity references can be used for more than just special characters. For example, a letter with a diacritical mark (grave, umlaut, etc.) can be written with an entity whose name is the letter followed by the name of the mark: &eacute; produces é, and &uuml; produces ü. Entity references can also be used to represent hard-to-remember symbols; for example, &pound; will produce the symbol for a British pound. Many people find it easier to use the entity reference than to remember how to produce that symbol using their keyboard.
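
The following sketch puts several entity references to work. The sample sentences are invented, and the hexadecimal form in the fourth example (a pound sign followed by an x) goes slightly beyond what is described above.

```html
<!-- Displays: Use <br /> for a line break. -->
<p>Use &lt;br /&gt; for a line break.</p>

<!-- Displays: Tom & Jerry -->
<p>Tom &amp; Jerry</p>

<!-- The non-breaking space keeps "10 kg" together on one line. -->
<p>The parcel weighs 10&nbsp;kg.</p>

<!-- One named reference and two numeric ones, all producing the pound symbol. -->
<p>&pound;5, &#163;5, &#xA3;5</p>

<!-- A letter with a diacritical mark: e with an acute accent. -->
<p>caf&eacute;</p>
```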

The HTML tag may also include attributes, which provide information about the element defined by the tag. I’ll talk about them next.

About Karl

I live in the Boston area, and am currently studying for a BS in Computer Science at UMass Boston. I graduated with honors with an AS in Computer Science (Transfer Option) from BHCC, acquiring a certificate in OOP along the way. I also perform experimental electronic music as Karlheinz.

4 Responses to A Guide to HTML

  1. Ben says:

    You probably know this, but the self-closing-ness of your tags is likely to be totally ignored by the browser unless you set the MIME type to XHTML! For example, in most browsers this
    This is a paragraph
    will render exactly the same as this:
    This is a paragraph

    A common gotcha is thinking that you can use a self-closing script tag, e.g.

    instead of

    More on this on stackoverflow:
    http://stackoverflow.com/questions/69913/why-dont-self-closing-script-tags-work

    So there can be arguments for doing this for style or tool support, but the browser really doesn’t care.

    • Ben says:

      wow, looks like wordpress doesn’t escape tags. Sorry, but hopefully the stackoverflow article explains well enough

      • Karl says:

        Also, about the tags – nope, WordPress does not escape HTML; you can use it to mark up comments (as I did just now), so it really can’t do that. For HTML tags, you have to use the &lt; and &gt; escape sequences. I also had to do this when I wrote the article, so I know how much of a PITA it is.

    • Karl says:

      Ben: First of all, thanks for taking the time to read the article. I need all the help I can get…

      The “self-closingness” issue applies only to tags that do not represent empty elements, and that includes the <script> tag. It’s supposed to contain text data (the actual JavaScript code). Trying to make these tags self-closing is not valid XHTML, and browsers will consider it “tag soup.”

      If the tag is actually self-closing – like the <br /> or <input /> tags – then the XHTML standard demands that they be properly terminated, or they won’t validate. The HTML standard (even HTML5) does not, but since XHTML is the one that has been used for years, I think it’s better to include the terminating slash.

      However, the Stack Overflow post did show something else that I wasn’t aware of: the <p> tag cannot contain other block-level tags, like <div>. (Inline tags are fine.) If you try to do this, the browser will consider it “tag soup” and automatically treat it as a tag with empty content. In other words, <p><div>Hello, world!</div></p> will turn into <p></p><div>Hello, world!</div><p></p>.

      I’ll update the article with this info. So, thank you for pointing this out. Please let me know if you find anything else in the article that needs work.
