A Guide to HTML

Text Markup

This section will go over HTML tags that semantically mark up parts of the text. These are the kind of HTML tags that most people are already familiar with. Most of these tags (but not all) are inline; text outside the tags will flow around their contents, without spacing or line breaks. There are a ton of these tags, and I tried to separate them according to use case.

Text information

<a>
Anchor. This tag defines a hyperlink to another document, or to another part of the same document. The contents of the tag, when clicked on, will take the user to the target location of the hyperlink.

Attributes:

href
A hyperlink reference. This can be a URL, either fully-qualified or relative. It can also be just a fragment identifier: the ID of another tag on the page, preceded by a pound sign. In this case, the location would be that specific tag in the document. You could use fragments to handle footnotes, for example.

Since an anchor is an interactive element, some HTML designers use it only to trigger a JavaScript function, and not as a link to another document. (This is probably not a good idea, since users may disable JavaScript in their browsers.) If they do this, a common trick is to use an empty fragment identifier (just a pound sign) as the value for the href attribute. If the user clicks on the hyperlink, the browser has no other document to load, so it stays on the current page (though it may scroll back to the top).

target
The target of that link. The values of the target attribute can be:

_self
The same browser window as the current HTML page. This is the default.
_blank
A new window or tab in the browser.
_parent
The window of the parent document, if the HTML page is embedded in an <iframe> element. (I will talk about the <iframe> element in the section on embedded content.)
_top
The topmost browser window. Unless the HTML page is in an <iframe>, it is the same as _self.
<iframe> name
A matching name attribute of an <iframe> element. (The target is matched against the frame’s name, not its id.) If you use this target, the contents of the <iframe> will be reloaded using the href value of the <a> tag.
rel
The relationship between this document and the linked document. Unlike the <link> tag, this attribute is not required, and in fact has few use cases. One of those use cases, however, is important: it can tell web crawlers not to follow this hyperlink. (This attribute is usually specified on links in comments, to prevent SEO spam.) To do this, use the nofollow value.
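
As a rough sketch of the attributes above, here are a few anchors. The URLs, the footnote-1 ID, and the link text are all made up for illustration.

```html
<!-- Fully-qualified URL, opened in a new window or tab -->
<a href="http://www.example.com/" target="_blank">An example site</a>

<!-- Relative URL, resolved against the current page's location -->
<a href="about.html">About this site</a>

<!-- Fragment identifier: jumps to the tag with id="footnote-1" on this page -->
<a href="#footnote-1">[1]</a>
<p id="footnote-1">1. The text of the footnote.</p>

<!-- A link in a user comment that web crawlers are asked not to follow -->
<a href="http://www.example.com/" rel="nofollow">A commenter's link</a>
```
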
<abbr>
An abbreviation or acronym. The W3C recommends that you specify the expansion (what the letters stand for) as the value of the tag’s title attribute. Most browsers display this in a tooltip.

You do not need to use this tag every time you write an abbreviation, unless you have a specific reason to (e.g. to style all abbreviations using CSS). But it is common practice to use this tag, and its title attribute, when readers first encounter the abbreviation. Because of this, it is often used with the <dfn> tag (see below), with the <abbr> tag nested inside the <dfn> tag.
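
Here is a minimal sketch of that nesting, assuming the abbreviation is being defined where it first appears. The sentence itself is only an example.

```html
<p>
  This page is written in
  <dfn><abbr title="HyperText Markup Language">HTML</abbr></dfn>,
  the markup language used for web pages. Later mentions of HTML
  can be left unmarked.
</p>
```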

<address>
This tag is used to specify the contact information for the author of the page or article. It should not be used to mark up a postal address; this is a common mistake.

This is a block-level element, so it will start on a new line. Most browsers render its contents in italics by default.
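
As an illustration, this is roughly how the contact information for a page’s author might be marked up. The name, e-mail address, and URL are placeholders.

```html
<address>
  Written by <a href="mailto:author@example.com">A. N. Author</a>.<br />
  Contact me on <a href="http://www.example.com/contact">my contact page</a>.
</address>
```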

<bdi> (discouraged)
Bi-directional isolation. This tag is new in HTML5. Like <bdo> (see below), it deals with text whose direction may differ from the surrounding text. The difference is that <bdi> isolates its contents instead of forcing a direction, and its dir attribute defaults to auto, so it is used where the text direction is unknown. At the time of writing, it is supported only by modern versions of Chrome and Firefox.

I recommend that you use another element instead, and specify its dir attribute as auto. If the text direction is known, you should use the <bdo> element, which is supported by all browsers.

<bdo> (uncommon)
Bi-directional override. This tag is used to specify a string of text that is written in a different direction from the rest of the document (for example, Arabic script). This tag is not used very often, because the text direction can be set on any textual element by specifying its dir attribute.

Attributes:

dir
The text direction. This attribute is required. Its value must be either rtl (right-to-left) or ltr (left-to-right); unlike other tags, it cannot accept a value of auto.
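
To make the difference concrete, here is a small sketch of the dir attribute on <bdo> and on ordinary elements. The sample text and the lang value are my own choices for illustration.

```html
<!-- Force right-to-left rendering of a run of text -->
<p>Reversed: <bdo dir="rtl">This text is displayed right-to-left.</bdo></p>

<!-- Text of unknown direction (e.g. a user name): let the browser decide -->
<p>Posted by <span dir="auto">إيان</span> at 12:30.</p>

<!-- Known direction on an ordinary element; no <bdo> needed -->
<p dir="rtl" lang="ar">مرحبا بالعالم</p>
```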
<br />
Break. This will put a single line break in the text, not a whole empty line (like block-level elements). It is an empty element, so it has no closing tag; in XHTML, it must be written in self-closing form.
<dfn>
Definition. You should use this tag when you are first defining a term, and the text surrounding the <dfn> element should contain the definition of this term.

This tag should be used for terms that are defined inline, with the rest of the document. If you are making a list of defined terms (like a glossary), then you should use a definition list instead. See the section on structured text for details.

<em>
Emphasis. The HTML5 standard defines this as “stress emphasis,” meaning an emphasis that would be in an alternate (louder) voice if the words were spoken. By default, emphasized text is rendered using italics.
<hr />
Horizontal rule. This puts a “line” or “beam” across the document, and acts to separate the content into visual sections. With the advent of CSS, it is not as common as it used to be, but it is still used. It is an empty tag.

The W3C HTML5 standard redefines this tag to mean a paragraph-level thematic break, e.g. a scene change in a story, or a transition to another topic within a section of a reference book. It recommends that you not use the <hr> tag if a header or <section> tag can serve the same purpose.

<mark> (HTML5 only)
Highlighted (“marked up”) text. The HTML5 standard says this element represents a run of text in one document marked or highlighted for reference purposes, due to its relevance in another context. For example, you could use it to highlight a part of a quotation that you want to focus on. Or if a user performs a search, server-side software could use this tag to mark up search results in the page. By default, supporting browsers render the text with a light yellow background.
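
For instance, highlighting part of a quotation might look something like this. The quoted text is invented.

```html
<p>
  The review called the album
  <q>uneven, but <mark>worth hearing for the final track alone</mark></q>.
  (Highlighting added.)
</p>
```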
<p>
Paragraph. This tag wraps a paragraph in a document. It is a block-level tag, so it will be displayed with a blank line before and after its contents.

This is not an empty tag. A closing tag is required for valid XHTML, and leaving it off has been discouraged since HTML 4.01 (at least). Designers should always provide the closing </p> tag.

However, back when HTML was being standardized, it inherited SGML’s concept of an “implicitly closed” tag. This is a tag that may have its closing tag omitted, if it is followed by certain other tags. The paragraph tag is an implicitly closed tag, and HTML (even HTML5) inherits this behavior.

This means that you cannot nest most block-level tags inside the <p> tag. If you do this, browsers will implicitly add the closing </p> tag. For example, this:

<p><div>Hello, world!</div></p>

…will be implicitly converted to this:

<p></p><div>Hello, world!</div><p></p>

This can lead to some very unexpected behavior, especially when using CSS or JavaScript.

Most HTML tags are inline tags, and those can be nested inside a <p> tag without issue. The ones that cannot be nested inside it are: <address>, <article>, <aside>, <blockquote>, <div>, <dl>, <fieldset>, <footer>, <form>, <h1> through <h6>, <header>, <hgroup>, <hr>, <main>, <nav>, <ol>, <p>, <pre>, <section>, <table>, and <ul>.

<pre>
Pre-formatted text. The contents of this tag will be rendered without stripping any whitespace; all spaces, line breaks, tab characters, etc. will be intact. Most browsers will render pre-formatted text in a fixed-width font.

This tag can be used to render lots of different kinds of content, such as ASCII art or post office addresses. It is also used quite often to display programming code. See the <code> tag for details.

<small> (discouraged)
Small text. This is a presentational element, so it is recommended that you use CSS instead.

This was never formally deprecated in the HTML 4.01 standard, though its use was discouraged. The HTML5 standard redefined it to mean side comments such as small print. I would still avoid using it if possible.

<span>
This is a tag that defines some generic span of text. It is an inline element, so surrounding text will flow around its contents. By itself, it has no semantics, and browsers render its contents the same as the surrounding text. You should define its semantics using its class attribute, so that its contents can be targeted with CSS or JavaScript. It should only be used as a last resort, when no other tag is appropriate.

But there are many cases where no other tag would be appropriate, so it is still widely used. For example, if you’re displaying an artist’s discography, you could use <span class="album-format"> to define the text representing an album format (CD, LP, digital download, or whatever). It is also one of the ways that the W3C recommends marking up “subheadings” (subtitles, alternative titles, or taglines) in a headline.

If you want to define block-level text, rather than inline text, use a <div> tag instead.
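
The discography example above could be sketched like this. The album titles are placeholders, and album-format is just the class name I used in the text, not anything standard.

```html
<ul>
  <li><cite>First Album</cite> <span class="album-format">LP</span></li>
  <li><cite>Second Album</cite> <span class="album-format">CD</span></li>
  <li><cite>Third Album</cite> <span class="album-format">digital download</span></li>
</ul>
```

A stylesheet or script can then target the album-format class, without caring which tag carries it.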

<strong>
Strongly emphasized, important text. This may be used with text that would be highly emphasized when spoken (hence, a more emphatic version of <em>), or it may be text that is typographically emphasized in the page (like a warning). By default, it is rendered in bold type by the browser.
<sub>
Subscript. This will be rendered by the browser in a smaller font size, and slightly lowered relative to the surrounding text.
<sup>
Superscript. This will be rendered by the browser in a smaller font size, and slightly raised relative to the surrounding text.

In printed text, footnote numbers are often printed as superscript, but this is not the consensus for HTML footnotes (probably because they’re harder to click on). Instead, put the footnote number inside square brackets. To link the footnote number (and brackets) to the footnote’s location on the page, use the <a> tag.
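
A bracketed footnote link could look roughly like this. The fn1 and fnref1 IDs are hypothetical names; any unique IDs would do.

```html
<p>This claim needs a source.<a href="#fn1" id="fnref1">[1]</a></p>

<!-- ...the rest of the article... -->

<p id="fn1">[1] The source for the claim. <a href="#fnref1">Back to the text</a></p>
```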

Quotations

These tags should be used when you are quoting another source in your document.

The convention for abbreviating quotes is to put your own alterations (including ellipses) in square brackets. For example, let’s say the full quote was “When Fred came home after working at the car wash, he said he had a great day, before eating dinner.” You could shorten the quote in this way: “When Fred came home […] he said he had a great day[.]” If you emphasize text that is not emphasized in the original, make sure you indicate this as well, saying something like “(emphasis added).” Make sure your abbreviations don’t fundamentally alter the meaning of the quote. This is not part of the HTML standard; it is just good practice.

<blockquote>
Quotation block. This tag should be used to display a lengthy block of quoted text. It is a block element, so it will be displayed with empty lines before and after the content. Browsers usually display the contents with indented margins on the left and right.

The W3C suggests that you put source citations in the <cite> tag, possibly nested in a <footer> tag (if you’re using HTML5). These can be put inside or immediately following the <blockquote> tag, but it probably makes more sense to put them inside.

<cite>
Citation. The HTML 4.01 standard defines this as a citation or a reference to other sources, while the HTML5 standard defines this as a reference to a creative work. In practice, this means either the title of the quoted work, or the author of the quotation (whether written or spoken). It should not be used to mark up the actual quotation; use <q> or <blockquote> for that.

Because many sources are available online, it is fairly common to use an <a> tag to link to the cited source. Convention dictates that the <a> tag be nested inside the <cite> tag.

This tag should not be confused with the cite attribute, which will not be displayed in the browser. (See below.)

<q>
Quotation. This is used to mark up a short quotation; it will appear inline with the surrounding text. For longer quotations, use the <blockquote> tag, which will display the quotation as a separate block of text. Browsers will add quotation marks around the contents of the <q> tag, so you don’t need to include them.

Quotation Attributes

cite
The source of the citation. The value should be a URL to a resource where the quotation is taken from; this can be a relative URL, but more likely will be an absolute URL to a page on a different website. It is meant to be read by machines, and will not be displayed in a web browser.
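
Putting the quotation tags and the cite attribute together, a sketch might look like this. The URL and the title of the quoted work are invented; the quote is the Fred example from earlier.

```html
<blockquote cite="http://www.example.com/interview.html">
  <p>When Fred came home […] he said he had a great day[.]</p>
  <footer>
    <cite><a href="http://www.example.com/interview.html">An Interview About Fred</a></cite>
  </footer>
</blockquote>

<!-- The browser supplies the quotation marks around the <q> contents -->
<p>
  According to the interview, Fred
  <q cite="http://www.example.com/interview.html">said he had a great day</q>.
</p>
```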

Displaying code

These tags serve the special purpose of displaying computer code in an HTML document. In most cases, the <code> tag will be all you need; the others are not widely used.

There are two things to keep in mind about these tags. The first is that any HTML inside them is not escaped; if the code you’re presenting is HTML code, you will need to use HTML entity references for the angle brackets. The second is that they are inline tags, and do not retain formatting. So if you’re displaying a block of code, you will need to use a <pre> tag as well. The W3C standard suggests that you put the code-related tags inside the <pre> tag.

Nowadays, it is very common to use a syntax highlighter to mark up code. A syntax highlighter will parse the code and surround different things (like function or variable names) in different HTML tags, which are then targeted with CSS to display the code in different colors, like you would see in an IDE. The vast majority use client-side JavaScript to do so, and are packaged as a library. Personally, I recommend Prism, because it is the most semantically correct. Other popular syntax highlighters include SyntaxHighlighter, Prettify, highlight.js, and Ace. There are many, many more. If you use one of these, you should mark up your HTML according to what the library expects.

If your code should be treated like a “figure” (as it is in many textbooks), you should consider wrapping everything in a <figure> tag, and give it a caption using the <figcaption> element.

<code>
Computer code: JavaScript, CSS, C++, Java, SQL, assembly language, etc.

There is no standardized way to specify what programming language the code is written in. The W3C suggests you use the class attribute, and a value string having a language- prefix followed by the name of the programming language. No browser has ever done anything with this information, but some client-side syntax highlighters use this technique.

<kbd> (uncommon)
What the user would type at a keyboard; that is, the command-line input to a program.
<var> (uncommon)
A variable. This could be a variable in a programming language, but it could also be used to mark up a mathematical variable.
<samp> (uncommon)
“Sample output;” the command-line output from a program.
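
Here is one way the code-related tags might fit together, assuming the language- class convention and a <figure> wrapper as described above. The listing, caption, and file names are made up.

```html
<figure>
  <!-- The angle brackets of the displayed HTML are written as entity references -->
  <pre><code class="language-html">&lt;p&gt;Hello, world!&lt;/p&gt;</code></pre>
  <figcaption>Listing 1: A paragraph in HTML.</figcaption>
</figure>

<p>
  Type <kbd>ls</kbd> at the prompt, and the shell prints something like
  <samp>index.html  style.css</samp>. If the directory holds
  <var>n</var> files, it prints <var>n</var> names.
</p>
```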

Insertions and deletions

These tags deal with text that has been deleted from the document, and replaced with other text.

<del>
Deleted text. By default, it is rendered as strikethrough (crossed out) text.
<ins>
Inserted text. By default, most browsers render it as underlined text.

Insertion and Deletion Attributes:

cite
Specifies the URL of a document that explains the change. This won’t be rendered by browsers; it’s mainly for machine use (e.g. to gather statistics about a document).
datetime
Specifies the time and date when this edit took place. Like the cite attribute, this will not be rendered by browsers. The datetime string should have the format YYYY-MM-DD hh:mm:ssTZD (year, month, day; hour, minute, second; time zone designator). If you specify the date, the time is optional; the time zone is optional in any case. You can also use a “T” character instead of a space to separate the date and time.
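
As a sketch, an edit with both attributes might be marked up like this. The URL, dates, and time zone offset are placeholders.

```html
<p>
  The meeting is on
  <del cite="http://www.example.com/changes.html#meeting"
       datetime="2014-03-01T09:30:00-05:00">Tuesday</del>
  <ins cite="http://www.example.com/changes.html#meeting"
       datetime="2014-03-01T09:30:00-05:00">Thursday</ins>.
</p>

<!-- The time is optional when the date is given -->
<p>Price: <del datetime="2014-03-01">$10</del> <ins datetime="2014-03-01">$8</ins></p>
```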

Ruby annotations

Ruby annotations (sometimes spelled Rubi) are annotations that are usually used as pronunciation guides. They are most common in East Asian languages (Chinese, Japanese, Korean, or Vietnamese), where the pronunciation of a more complex script (like Japanese Kanji) is spelled out syllable by syllable in a phonetic script (as in Japanese Furigana). The annotations are usually displayed above the main characters, but they may be displayed on the side if the text runs top-to-bottom.

Ruby annotations are not widely used outside of East Asian countries. At the time of writing, they are not supported by Firefox, or by Android browsers earlier than Android 3.0. Surprisingly, they have been supported by Internet Explorer since at least version 5.5.

It goes without saying that you should not confuse Ruby annotations with the Ruby programming language. Both originated in East Asia, but that’s about all they have in common.

<ruby>
This is the root element (the “container” tag) for text that has Ruby annotations. The <rb>, <rt>, and <rp> tags should be nested inside it.
<rb>
Ruby base. This is the text to be annotated.
<rt>
Ruby text. This tag’s contents contain the actual Ruby annotations.
<rp>
Ruby parentheses. This tag exists for browsers that do not support Ruby annotations.

If a reader is using one of those browsers, then the annotations should be displayed inside parentheses. But if their browser does support Ruby annotations, you don’t want the parentheses to show. This tag is the solution to that problem. You wrap the opening and closing parenthesis characters in <rp> tags. Browsers that support Ruby annotations will recognize the <rp> tags, and hide the parentheses; other browsers will ignore the unrecognized <rp> tags, and show the parentheses as usual.

Here is some Ruby code showing you how to pronounce my name:

<ruby>
  <rb>Karl Giesing</rb>
  <rp>(</rp><rt>KAH rl GEE zing</rt><rp>)</rp>
</ruby>

Deprecated Tags

There are a lot of text markup tags that are deprecated. Rather than mix them in with the other tags, I decided to put them in their own section. A few of them were deprecated because their behavior overlapped other tags; but the majority were deprecated because the tags were presentational, and not semantic.

The W3C revived some of these tags in the HTML5 standard, redefining them in the process. This was probably done because they are still in common use, even though they shouldn’t be. I would personally treat those tags as still being deprecated, and avoid using them.

<acronym>
Acronym. Use the <abbr> tag instead.
<b>
Bold text. If you want to mark up important text, use the <strong> tag instead; it renders as bold type by default. Otherwise, use CSS.

The <b> tag was deprecated in HTML 4.01, but revived in the HTML5 standard. It was redefined to mean a span of text to which attention is being drawn for utilitarian purposes without conveying any extra importance and with no implication of an alternate voice or mood. (Yeah, OK then.) The W3C’s examples include key words in a document abstract, product names in a review, actionable words in interactive text-driven software, or an article lede.

<big>
Big text; that is, text rendered in a bigger font size. Use CSS instead.
<font>
This tag was used to display text in a different style. Use an appropriate semantic tag (or <span> if necessary), and style it using CSS.
<i>
Italic. The <em> tag is rendered as italic text by default, so you should use that tag instead.

The <i> tag was deprecated in HTML 4.01, but revived in the HTML5 standard. It was redefined to mean a span of text in an alternate voice or mood. The W3C’s examples are a taxonomic designation, a technical term, an idiomatic phrase from another language, transliteration, a thought, or a ship name in Western texts.

<s>
Strikethrough. If you want to mark up deleted text, use the <del> tag instead.

In HTML 4.01, this was a synonym for the <strike> tag (and thus deprecated). It was revived in the HTML5 standard, and redefined to mean contents that are no longer accurate or no longer relevant. Note that the <strike> tag was not revived.

<strike>
Strikethrough. If you want to mark up deleted text, use the <del> tag instead. (But, also see the <s> tag.)
<tt>
Teletype. This was rendered as a fixed-width font, and you should use CSS for this instead. In most cases, other tags would be more appropriate, like the <pre> or <code> tags.
<u>
Underline. There is no non-deprecated equivalent. Most things that should be underlined are emphasized or important, so should use either the <em> or <strong> tags. For edge cases, you could use a <span> tag with a specialized class attribute.

The <u> tag was deprecated in HTML 4.01, but revived in the HTML5 standard. It was redefined to mean a span of text with an unarticulated, though explicitly rendered, non-textual annotation. This could include text that is spelled wrong, or a proper name in Chinese script.
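
To illustrate the general idea, here is a before-and-after sketch of replacing presentational tags with semantic ones. The warning class name is hypothetical, and the red color would move into a CSS rule that targets it.

```html
<!-- Deprecated, presentational markup -->
<p><font color="red"><b>Warning:</b></font> <i>do not</i> unplug the drive.</p>

<!-- One possible semantic rewrite -->
<p><strong class="warning">Warning:</strong> <em>do not</em> unplug the drive.</p>
```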

About Karl

I live in the Boston area, and am currently studying for a BS in Computer Science at UMass Boston. I graduated with honors with an AS in Computer Science (Transfer Option) from BHCC, acquiring a certificate in OOP along the way. I also perform experimental electronic music as Karlheinz.

4 Responses to A Guide to HTML

  1. Ben says:

    You probably know this, but the self-closing-ness of your tags are likely to be totally ignored by the browser unless you set the mime type to xhtml! For example, in most browsers this
    This is a paragraph
    will render exactly the same as this:
    This is a paragraph

    A common gotcha is thinking that you can use a self-closing script tag, e.g.

    instead of

    More on this on stackoverflow:
    http://stackoverflow.com/questions/69913/why-dont-self-closing-script-tags-work

    So there can be arguments for doing this for style or tool support, but the browser really doesn’t care.

    • Ben says:

      wow, looks like wordpress doesn’t escape tags. Sorry, but hopefully the stackoverflow article explains well enough

      • Karl says:

        Also, about the tags – nope, WordPress does not escape HTML; you can use it to mark up comments (as I did just now), so it really can’t do that. For HTML tags, you have to use the &lt; and &gt; escape sequences. I also had to do this when I wrote the article, so I know how much of a PITA it is.

    • Karl says:

      Ben: First of all, thanks for taking the time to read the article. I need all the help I can get…

      The “self-closingness” issue applies only to tags that do not represent empty elements, and that includes the <script> tag. It’s supposed to contain text data (the actual JavaScript code). Trying to make these tags self-closing is not valid XHTML, and browsers will consider it “tag soup.”

      If the tag is actually self-closing – like the <br /> or <input /> tags – then the XHTML standard demands that they be properly terminated, or they won’t validate. The HTML standard (even HTML5) does not, but since XHTML is the one that has been used for years, I think it’s better to include the terminating slash.

      However, the Stack Overflow post did show something else that I wasn’t aware of: the <p> tag cannot contain other block-level tags, like <div>. (Inline tags are fine.) If you try to do this, the browser will consider it “tag soup” and automatically treat it as a tag with empty content. In other words, <p><div>Hello, world!</div></p> will turn into <p></p><div>Hello, world!</div><p></p>.

      I’ll update the article with this info. So, thank you for pointing this out. Please let me know if you find anything else in the article that needs work.
