HTML Headers and Structure
An HTML page is fundamentally divided into two sections: the header section, and the body. The header section specifies metadata about the HTML page itself. The body of the HTML page is the part of the HTML document that is displayed in the browser. This means that all visible markup tags go in the body section.
There are also tags that tell the browser to recognize the file as an HTML file (and not some other form of plaintext). These are technically not HTML tags, but they are needed at the very top of the document, so I’ll also cover them here. I’ll go through the tags in the order that they usually appear.
- XML declaration (XHTML only)
- This is an XML tag, which is needed at the start of all XML documents. It is also called the XML prolog. The tag is used to specify the XML version, and the character encoding you are using. (See below for a discussion about character encoding.) You only need this tag if you are writing XHTML; HTML documents should not use this tag at all.
Here is the tag for an XHTML 1.0 document, using XML version 1.0 and the UTF-8 character set:
<?xml version="1.0" encoding="UTF-8"?>
You can also use XML version 1.1, or another character set, but these are the most common. In fact, if you use the default UTF-8 character set, this tag isn’t necessary at all; the browser will pick up the fact that it is XHTML from the DOCTYPE tag (see below).
If you are not using XHTML, then you should supply the character encoding using a <meta> tag instead: with the charset attribute in HTML5, or with the http-equiv attribute in HTML 4.01. See below for information about the <meta> tag, and further below for a discussion about character encoding.
- DOCTYPE declaration
- This tag specifies the document type for the file. It is not an HTML tag, though it has a similar syntax. It tells the browser that the file is HTML, so all HTML documents must have this tag, and it must appear at the very beginning of the document (after the XML declaration, if there is one). Here are DOCTYPE tags for the most common versions of HTML:
- HTML5
- <!DOCTYPE html>
- XHTML 1.0 Strict
- <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
- HTML 4.01 (deprecated)
- <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
- <html>
- This HTML tag encloses the entire HTML document (except for the DOCTYPE tag, and the XML prolog if applicable). All other HTML tags, including the header and body, should be nested inside this one.
- xmlns
- This attribute determines the XML namespace. Unsurprisingly, it is only necessary if you are creating an XHTML document; HTML documents shouldn’t use this attribute. If you’re using XHTML, the attribute’s value should be http://www.w3.org/1999/xhtml.
- lang
- This attribute specifies the document’s language. It is actually a global attribute, but this is where it is most commonly used. If it is omitted, the web browser will try to determine the language by other means, but in practice it will probably default to English. On the other hand, if the attribute specifies a language that is different from the user’s, then browsers that integrate with translation services (like Google Chrome) will offer to translate the page. If your document is written in any language other than English, you should always use this attribute; even if it is in English, setting the language never hurts.
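As a quick illustration of the xmlns and lang attributes, here are the opening <html> tags for a page written in French (French is just an example language). The XHTML version gives both lang and xml:lang, which the XHTML 1.0 specification recommends for documents that may also be served as ordinary HTML.

```html
<!-- HTML5: a page written in French (opening tag only) -->
<html lang="fr">

<!-- XHTML 1.0: the namespace is required, and both lang and
     xml:lang are given for compatibility (opening tag only) -->
<html xmlns="http://www.w3.org/1999/xhtml" lang="fr" xml:lang="fr">
```

The value is a language code such as en, fr, or de; a region can be appended when it matters, as in en-GB.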
- <head>
- Encloses the entire HTML header; everything else should go in the body. The rest of the tags in this section should all go in the header section, before the closing </head> tag. (The possible exception is the <script> tag; see below.)
- <title>
- Specifies the title of the page. This tag is required, and there can be only one <title> tag in a valid HTML document. The title is not displayed on the page itself, but it is displayed in the browser’s tab bar (if there is one). The contents should match the title of the page as displayed to the user (for example, in a header tag; see the next section).
- <meta>
- The <meta> tag specifies various types of metadata about the HTML page to non-human readers (such as search engines). Keep in mind that providing metadata does not mean that anyone will actually use it. It is an empty tag; all the metadata is provided by its attributes.
- content
- This attribute supplies the metadata associated with the name or http-equiv attribute values (see below).
- name
- The name attribute specifies the type of metadata that the tag provides, when it is not an http-equiv tag. This is metadata that you want machines to read (though there’s no guarantee they will). The type of data is the value of this attribute; the actual data is the value of the content attribute. The most common values of the name attribute are:
- application-name
- The content value will be the name of the web application associated with the HTML page. This is usually put there by the application itself.
- author
- The content value will be the name of the author of the HTML page. If there are multiple authors, you should use a separate tag for each.
- description
- The content value will be a short description of the web page’s content. Search engines commonly display the description below a link to the web page.
- generator
- The content value will be the name and version of the software used to create the HTML file (e.g. Dreamweaver). This is usually put there by the application itself.
- keywords
- The content value will be a comma-separated list of keywords for the web page. In theory, if the user types some of these keywords into a search engine, your page should come out nearer to the top in search engine rankings. I say “in theory” because search engines all but ignore keywords nowadays. Unsurprisingly, this metadata was widely abused by unscrupulous websites seeking to game their search engine rankings. Even if the page author is not being abusive, the plain fact is that the user, not the website owner, determines which page is the best fit for certain keywords. Still, it doesn’t hurt to provide keywords, especially if those keywords are rare or topic-specific.
- robots
- This value is used to tell search engines to ignore the page in specific ways. The content value will be either noindex (to not index this page), nofollow (to not follow the page’s outgoing links), or both (separated by a comma). Reputable search engines will follow these directives; disreputable ones will not. Obviously, this only applies to the current page, so the same content will still be indexed if it is linked from another website.
This value applies to all search engines; specific search engines may use their own values. For example, Google Search uses googlebot, so you could make your website available to all search engines except Google, if for some reason you wanted to.
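Here is how the name values described above might look in a page header. The author names and text are made up for illustration. The robots and googlebot lines are two alternative options, so a real page would normally use at most one of them.

```html
<!-- Hypothetical example values throughout -->
<meta name="author" content="Jane Doe" />
<!-- A second author gets a separate tag -->
<meta name="author" content="John Smith" />
<!-- Often shown under the link in search results -->
<meta name="description" content="A step-by-step guide to baking sourdough bread at home." />
<!-- Comma-separated; most search engines give these little or no weight -->
<meta name="keywords" content="sourdough, bread, baking, starter" />

<!-- Option 1: ask all reputable crawlers not to index the page or follow its links -->
<meta name="robots" content="noindex, nofollow" />
<!-- Option 2: address only Google's crawler -->
<meta name="googlebot" content="noindex" />
```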
- charset
- In HTML5, this attribute specifies the character encoding. If you are using XHTML, then you should specify the character encoding in the XML declaration instead. You can also specify the character encoding using the http-equiv attribute, but this is outdated. See below for a discussion about this.
- http-equiv
- Specifies a string of “HTTP equivalent” data. It is not very useful, since anything this attribute can do is better accomplished by other means. The http-equiv attribute can have one of these values:
- content-type
- Specifies the content type (which will always be “text/html”) and character encoding, separated by a semicolon. For example, this tag specifies that this is an HTML document with UTF-8 character encoding:
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
This value is rarely used anymore. If you are using HTML5, then you should use the charset attribute instead. On the other hand, if you are using XHTML, then you should specify the character encoding in the XML declaration. See below for a discussion about this.
- default-style
- Specifies the stylesheet that will be used as the default. The value must be the name of a stylesheet imported using the <link> tag (see below). It is rarely needed; multiple stylesheets will be imported in the order they appear in the code, so it’s more common to simply use the first <link> tag for the default stylesheet.
- refresh
- If this value is used, the page is refreshed at regular intervals. The content value is the refresh delay, in seconds; it may be followed by a URL to go to upon refresh (separated with a semicolon). This is discouraged by the W3C, because it takes control of the page away from the user. It was usually used as a “hack” to do URL redirection, to prevent link rot (or for more nefarious reasons). It is also discouraged because it may break the browser’s back button. It is better to use URL rewriting, or to have the server send an HTTP redirection status code.
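For reference, this is what the refresh value looks like in practice, even though it is discouraged. These are two independent examples (a page would use only one), and the delays and the example.com address are placeholders. Note that the target address is written with a url= prefix after the semicolon.

```html
<!-- Example 1: reload the current page every 30 seconds -->
<meta http-equiv="refresh" content="30" />

<!-- Example 2: after 5 seconds, go to another address
     (a server-side HTTP redirect is the better way to do this) -->
<meta http-equiv="refresh" content="5; url=https://example.com/new-page.html" />
```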
There are a lot more <meta> attributes and values in use today, but these are the most common. You can also use an attribute starting with data- as a custom data attribute, though the cases where this is useful are rare. If you want the details, read “Embedding custom non-visible data with the data-* attributes” from the WHATWG HTML5 living standard.
- <link>
- This tag defines a link between the HTML page and an external resource. This other resource is commonly a CSS file. Multiple <link> tags are allowed in the header, so you may import more than one CSS file. These are imported in order, so if there are conflicts between rules of equal specificity, the last CSS file imported is the winner.
The most common use is to import CSS, but there are other uses as well. For example, this tag could be used to link to a translated version of the document, or to the same document in a non-HTML format. It is also used to define a link to a favicon, the small image that is displayed in a browser’s favorites menu.
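As an illustration, a header that uses <link> in several of these ways might contain the tags below. The file names are hypothetical. The rel, href, type, and media attributes are the ones described next; hreflang, which names the language of the linked document, is an extra attribute not otherwise covered here.

```html
<!-- Main stylesheet -->
<link rel="stylesheet" href="css/main.css" />
<!-- Extra rules used only when the page is printed -->
<link rel="stylesheet" href="css/print.css" media="print" />
<!-- Favicon: not CSS, so the type attribute gives the image format -->
<link rel="icon" href="favicon.png" type="image/png" />
<!-- The same document, translated into French -->
<link rel="alternate" href="index-fr.html" hreflang="fr" />
```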
- type
- This attribute specifies the MIME type of the linked document. The default value is text/css, which is the MIME type for a CSS file. This is almost always what you want, so this attribute is only specified if you’re linking to something other than CSS. For example, if you’re linking to a favicon, the type might be
- href
- “Hyperlink reference.” This is the URL of the file that you want to link to. Like any URL used in HTML documents, it can be either a relative or fully-qualified URL.
- rel
- This attribute is required. It defines the relationship between the HTML document and the linked file. The value will usually be stylesheet, since this tag is mostly used with CSS. If you’re defining a link to a favicon, you can use
- media
- This attribute can be used to include the linked resource only when the user is browsing on specific media. This is usually used to provide different CSS files for different viewing contexts, such as print or mobile versions. If this attribute isn’t specified, the default value is all (all media types). Other common values are
- <style>
- This tag is used to embed CSS styles in your HTML. Since HTML is a semantic language, it should never contain information about presentation. So, don’t use this tag. Instead, put your CSS in a separate file, and import it using the <link> tag.
- <script>
Whether it’s in the tag contents or imported, the script is parsed and executed when the
<script> tag at the bottom of the body, and not in the header. This is often done, but the vast majority of people still put the tag in the header section.
It is a fairly common mistake to treat the <script> tag as an empty tag, closing it with a slash instead of a closing tag. This does not work; you should always include the closing tag:
<script src="myscript.js"></script>
- defer
- If you use the defer attribute, you are telling the browser to defer execution of the script until the entire page has been parsed. This is a Boolean attribute, so if you give it a value, the value must also be defer.
The reason this attribute was uncommon is that browser support was spotty. It was not supported by Internet Explorer before version 10, Firefox before version 3.6, or Opera before version 15.0.
HTML5 also offers the async attribute. It tells the browser to download the script while the page continues to be parsed, and to run it as soon as it is available, rather than waiting until parsing has finished. It is also a Boolean attribute (so its value should be async), and support for it was just as spotty.
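The following sketch compares the ways of loading an external script discussed above; the file names are hypothetical. In short: a plain <script> blocks parsing while it downloads and runs; defer downloads the file in parallel and runs it after parsing, in document order; async downloads the file in parallel and runs it as soon as it arrives, in no guaranteed order.

```html
<!-- Blocks parsing while the file downloads and runs.
     Always write the closing tag: in HTML, <script src="js/app.js" />
     does not close the element, so the markup after it is treated
     as part of the script. -->
<script src="js/app.js"></script>

<!-- Downloads while the page is parsed; runs after parsing finishes,
     in the order the deferred scripts appear -->
<script src="js/menu.js" defer></script>

<!-- Downloads while the page is parsed; runs as soon as it arrives,
     possibly before or after other scripts -->
<script src="js/analytics.js" async></script>
```

In XHTML, attribute minimization is not allowed, so the same attributes would be written defer="defer" and async="async".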
- src
- This is the URL of an external source file. The value can be either a relative path or a fully-qualified URL.
You might be wondering what the difference is between src and the href attribute used in other tags. The answer is that replaced elements use the src attribute, while non-replaced elements use the href attribute. A replaced element can have intrinsic dimensions, determined by the external resource, so that resource must be parsed before the rest of the page.
- language
- This attribute was once used to specify the script’s programming language. Its value was never standardized, and it has been deprecated for quite some time. Instead, use the type attribute (if necessary).
- <base>
- This tag specifies the base address for all relative URLs in the document, and/or the default target for all links on the page.
There are two reasons it is not widely used. First, there are not many cases where it is useful; the default linking behavior usually works fine, and can be overridden on a per-link basis. Second, it is not consistently implemented across browsers. Internet Explorer requires a closing tag, but the official specification defines it as an empty element.
- href
- A hyperlink reference (URL) to be used as the base for all relative URLs in a document.
- target
- The default target for all links on a page. The values of the target attribute can be:
- _self
- The same browser window as the current HTML page. This is the default.
- _blank
- A new window or tab in the browser.
- _parent
- The parent of an <iframe> element. (It could have also been the name of a frame in a frameset, but framesets are deprecated.)
- _top
- The topmost browser window. Unless the HTML page is in an <iframe>, it is the same as _self.
- Frame ID
- The ID of an
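To show how <base> affects links, here is a small sketch; the example.com address and file names are placeholders. Relative URLs are resolved against the base address, a URL beginning with a slash is resolved against the root of the same site, and a target on an individual link overrides the default.

```html
<head>
  <!-- Relative URLs resolve against this address; links open in a new tab -->
  <base href="https://example.com/docs/" target="_blank" />
</head>
<body>
  <!-- Resolves to https://example.com/docs/intro.html -->
  <a href="intro.html">Introduction</a>
  <!-- A leading slash starts from the site root: https://example.com/about.html -->
  <a href="/about.html">About</a>
  <!-- A per-link target overrides the base target -->
  <a href="faq.html" target="_self">FAQ</a>
</body>
```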
After all of these header tags, you’re finally ready to present the body of the HTML to the user. This is done using the <body> tag.
A Note About Character Encoding
A character encoding is a mapping of machine-readable numbers to human-readable characters. A character set is the set of characters that can be represented by a specific encoding. These may be characters from English, Arabic, Chinese, Cyrillic, or any other alphabet (depending upon the encoding). Thus, the character encoding determines which set of languages can be successfully represented by a machine. In the context of an HTML page, the “machine” is the web browser.
Character sets have been growing as the Internet evolved to cover more of the world. The first websites used ASCII, which is incredibly limited, so other character sets were adopted almost immediately. From HTML 2.0 to HTML 4.01, the default character encoding was ISO-8859-1, and it is still occasionally used. However, the default character encoding used in XHTML and HTML5 is UTF-8. If you’re creating a web page today, you should use UTF-8 unless you have a specific reason not to.
As you can see above, the character encoding can be specified in multiple places in the HTML header. In fact, it can also be specified in the HTTP header, sent by the web server before the HTML page is transmitted. If there are conflicts, which one is used? According to the W3C, the browser will determine the character encoding in this order:
- The HTTP header sent by the web server
- The XML declaration (if the document is XHTML)
- The <meta> tag, using the charset attribute (or if HTML 4.01, the http-equiv attribute)
- The default character set is assumed (usually UTF-8)
If the above doesn’t work (i.e. you’re using an encoding other than UTF-8, but don’t specify what it is), then the browser will do whatever the hell it wants. This is almost certainly not what you want, so unless you’re using UTF-8, you should always specify the character encoding.
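For comparison, here are the three places in the document itself where UTF-8 can be declared, one for each kind of document discussed above. Use only the one that matches your document type. Whichever one you use, an HTTP header sent by the server takes precedence, as the list above shows.

```html
<!-- HTML5: inside <head>, ideally as its first element. The declaration
     must appear within the first 1024 bytes of the document. -->
<meta charset="UTF-8" />

<!-- HTML 4.01: the older http-equiv form, also inside <head> -->
<meta http-equiv="content-type" content="text/html; charset=UTF-8">

<!-- XHTML: the XML declaration, on the very first line of the file -->
<?xml version="1.0" encoding="UTF-8"?>
```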
If you’re writing HTML, you’re most likely writing the same version of HTML, with roughly the same data in all of the headers. So, just for you, I’ve created a couple of standard HTML templates. Just fill in your information, and put whatever you want into the body.
- HTML5 Template
- <!DOCTYPE html>
  <html>
    <head>
      <meta charset="UTF-8" />
      <title>YOUR_TITLE_HERE</title>
      <meta name="author" content="YOUR_NAME_HERE" />
      <meta name="description" content="YOUR_DESCRIPTION_HERE" />
    </head>
    <body>
      <!-- Content of web page -->
    </body>
  </html>
- XHTML 1.0 Template
- <?xml version="1.0" encoding="UTF-8"?>
  <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
  <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
      <title>YOUR_TITLE_HERE</title>
      <meta name="author" content="YOUR_NAME_HERE" />
      <meta name="description" content="YOUR_DESCRIPTION_HERE" />
    </head>
    <body>
      <!-- Content of web page -->
    </body>
  </html>
HTML framesets allow multiple HTML documents to be displayed in the same browser window. The frameset specifies a number of resizable frames, each of which has its own HTML document and is displayed in a different section of the screen.
Before we go any further: framesets are bad. They were officially retired in HTML5, were already discouraged in XHTML 1.0, and have not been used on websites since the 1990s. There are many good reasons for this; if you want details, read Jakob Nielsen’s 1996 article, Why Frames Suck (Most of the Time). (I personally would have left off the part in the parentheses.) If you really need to embed one HTML page in another, you can do that using the <iframe> tag. I will talk about that when we get to the section on including media.
Unfortunately, some people may still encounter framesets in specific situations. For example, code documentation generators (like Doxygen or Javadoc) may still output HTML with framesets.
If you’re not one of those people, then you should stop reading now, and skip ahead to the next section. The sooner people stop even thinking about framesets, the better.
- DOCTYPE declaration
- A website that uses frames must specify that it is a frameset, and not an HTML page. Here are the DOCTYPE tags for HTML 4.01 and XHTML 1.0 (which, again, are only for reference):
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" "http://www.w3.org/TR/html4/frameset.dtd">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
- <frameset>
- This tag encloses the entire frameset. In a frameset document, it takes the place of the <body> tag, so it goes inside the <html> tag, after the header.
- cols
- Specifies the size of each column in a frameset (and, thus, the number of columns).
- rows
- Specifies the size of each row in a frameset (and, thus, the number of rows).
Specifying one of these attributes is required. If the number of frames does not match the number of rows times the number of columns, then either the extra frames won’t be rendered, or the leftover cells will be rendered as blank HTML pages (depending on the mismatch).
- <frame>
- This tag specifies the data about a specific frame. It is an empty tag, so in XHTML it should be terminated with a slash before the closing angle bracket.
- name
- Specifies the name (that is, ID) of the frame. If the links of one frame want to target another frame, they would use this name. You could also use the standard, non-deprecated id attribute for this, but if you’re using framesets, you’re using deprecated HTML already.
- src
- Specifies the URL of the HTML file to be displayed in the frame.
- noresize
- Prevents the frame from being resized by the user. The value must be noresize.
- scrolling
- Determines whether the frame will have scrollbars. Values can be yes, no, or auto (the default).
There are more attributes, but they were not widely used even before framesets were deprecated, so I won’t go into them here.
- <noframes>
- The contents of this tag would be displayed to the user in the event that their browser couldn’t handle frames. When frames were around, it would usually be a “helpful” note to the users telling them to update their browser. You can probably guess how that went over.
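For reference only, here is a minimal XHTML 1.0 Frameset document that puts these tags together. The file names and title are hypothetical. In a frameset document, <frameset> takes the place of <body>, and the Frameset DTD requires the contents of <noframes> to be wrapped in a <body>.

```html
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Project Documentation</title>
  </head>
  <!-- Two columns: 25% and 75% of the window width -->
  <frameset cols="25%,75%">
    <!-- Links inside nav.html can use target="main" to load pages on the right -->
    <frame src="nav.html" name="nav" noresize="noresize" />
    <frame src="main.html" name="main" scrolling="auto" />
    <!-- Shown only by browsers that cannot display frames -->
    <noframes>
      <body>
        <p>This page uses frames. <a href="main.html">View the main content</a>.</p>
      </body>
    </noframes>
  </frameset>
</html>
```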