A Guide to HTML

Structured Text: Lists and Tables

In this section, I’ll go over the tags used to create structured text. By “structured text,” I mean a structure that encapsulates many records of the same type, where each record holds textual data. It is analogous to the data structures used in computer programming (like arrays, structs, or lists).

Basically, it’s a term I’m using to talk about HTML lists and tables.

Lists

HTML currently has three types of lists: description lists, ordered lists, and unordered lists. Each list item is a block-level element by default, though this is often overridden with CSS.

<dl>
Description list. The HTML 4.01 specification called this a definition list, and intended for it to be used to create a dictionary or glossary: a set of terms and their definitions. This intention was not widely respected.

The HTML5 standard redefined it as a description list, and allows pretty much any kind of related name/value grouping.

<dt>
Description term. This is the term in a term/description group (or name in a name/value group) from a description list. This term is associated with the data in any <dd> tags following it, until another <dt> tag is encountered. So, one term can be associated with many descriptions.

The HTML5 specification states that there cannot be two <dt> tags with the same content; terms must be unique. This means that a description list can’t be used to mark up conversations. If the term is literally being defined (not merely “described”), you may want to wrap the term’s text in a <dfn> tag as well.

<dd>
Description data. This is the description in a term/description group (or value in a name/value group, or definition in a term/definition group). This description is associated with the terms in any <dt> tags preceding it, until another <dd> tag is encountered. Description data does not have to be unique.

By default, browsers render the description data with a left indentation.
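
To make the term/description grouping concrete, here is a minimal sketch of a description list. The book details are invented for illustration; the point is that one term can carry several descriptions, and that a term that is literally being defined can be wrapped in a <dfn> tag.

```html
<dl>
  <!-- One term with one description -->
  <dt>Author</dt>
  <dd>Jane Doe</dd>

  <!-- One term with two descriptions -->
  <dt>Formats</dt>
  <dd>Paperback</dd>
  <dd>E-book</dd>

  <!-- A term that is actually being defined -->
  <dt><dfn>Colophon</dfn></dt>
  <dd>A note at the end of a book giving details of its production.</dd>
</dl>
```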

<ul>
Unordered list. This is a list that does not have any kind of order; it is the HTML equivalent of a “bullet-point” list in word processors. This is, in fact, how unordered lists are rendered by default.

One common use of unordered lists is to represent navigation links. If the list contains the primary navigation links, then you can use CSS to render it as a horizontal list.
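
As a rough sketch of the navigation use case, the list below is turned into a horizontal row of links with a few CSS rules. The class name main-nav and the link targets are placeholders I made up; there are other ways to lay the items out (flexbox, for example), and this is just one of the simplest.

```html
<style>
  /* Remove the bullets and the default indentation */
  .main-nav { list-style: none; margin: 0; padding: 0; }
  /* Put the items side by side instead of stacking them */
  .main-nav li { display: inline-block; margin-right: 1em; }
</style>

<ul class="main-nav">
  <li><a href="/">Home</a></li>
  <li><a href="/articles/">Articles</a></li>
  <li><a href="/contact/">Contact</a></li>
</ul>
```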

<ol>
Ordered list. This is the type of list you would use to represent a list that has a ranking, or represents a sequence of steps. Each item in the list will be prepended by a number, which is automatically incremented with each list item.

Attributes:

reversed (HTML5 only)
Specifies that the numbers should be in reverse order. This is a Boolean attribute: its mere presence reverses the numbering, and if a value is given at all (as XHTML requires), it should be reversed.
start (deprecated)
Starting number. This allows you to start with a number other than one, if e.g. you need to interrupt the list with some other HTML.

This attribute was deprecated in HTML 4.01, but the W3C HTML5 standard revived it.

type (deprecated)
The type of list marker; that is, the type of number. It can have these values:

1
Decimal numbers: 1, 2, 3, ... (the default)
a
Lowercase Latin alphabetic characters: a, b, c, ...
A
Uppercase Latin alphabetic characters: A, B, C, ...
i
Lowercase Roman numerals: i, ii, iii, iv, ...
I
Uppercase Roman numerals: I, II, III, IV, ...

This attribute was deprecated in HTML 4.01, but the HTML5 specification revived it. Even so, it is recommended that you use CSS to specify the number type; the equivalent values of the list-style-type property are decimal, lower-alpha, upper-alpha, lower-roman, and upper-roman.
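
Here is a short sketch of the ordered list attributes in use. The list contents are invented. The first two lists show start resuming the numbering after an interruption, the third shows reversed counting down, and the last sets the number type with the CSS list-style-type property rather than the type attribute.

```html
<ol>
  <li>Preheat the oven.</li>
  <li>Mix the ingredients.</li>
</ol>
<p>Let the dough rest for an hour, then continue:</p>
<!-- start="3" picks the numbering back up at 3 -->
<ol start="3">
  <li>Shape the loaf.</li>
  <li>Bake for 40 minutes.</li>
</ol>

<!-- reversed numbers the items 3, 2, 1 -->
<ol reversed>
  <li>Third place</li>
  <li>Second place</li>
  <li>First place</li>
</ol>

<!-- CSS chooses the marker type: i, ii, iii -->
<ol style="list-style-type: lower-roman">
  <li>Introduction</li>
  <li>Preface</li>
  <li>Acknowledgements</li>
</ol>
```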

<li>
List item. This element contains the text of each item in an unordered or ordered list. For description lists, use <dt> and <dd> instead.
<dir> (deprecated)
Directory list. This tag was used to represent a list of files in a directory. It was deprecated in HTML 4.01. Use an unordered list instead.

Tables

Tables are used to display information in rows and columns. Each row represents one entry in the table, and each column represents one type of data in an entry. There are usually column headers that name each type of data, and the table in general can have headers, footers, and a caption. Tables are rendered top-to-bottom, so each column is part of a row (and not the other way around). Ideally, each row should have the same number of columns; but if not, rows are filled from left to right.

Tables should be used for tabular data, and for no other purpose. As a rule of thumb, if it isn’t appropriate for a spreadsheet, it shouldn’t be in an HTML table. Specifically, HTML tables should not be used for layout. This is a non-semantic use of tables, and HTML is semantic; page layout is the job of CSS. Additionally, non-visual user agents will choke on tables used for layout; this includes web crawlers and screen readers.

When browsers didn’t support CSS positioning, designers had little choice but to use tables for page layout. This hasn’t been true for well over a decade. You should not use tables to lay out anything: not the sections of a page, not horizontal navigation links, not contact forms. A table may not even be displayed as a table. For example, if your table has a lot of columns, and it needs to be viewed on a mobile phone, each table row could be displayed as a separate list.

<table>
This is the root element of an HTML table. All other table tags should be nested inside this one. Child tags must be table-related tags (the kind in this section), and cannot be text markup tags, page section tags, form tags, etc. (though these can be contained in a table cell).

Attributes:

sortable (HTML5 only)
This attribute signifies that the table should be sortable. In other words, the user should be able to click on a column header, and sort the table entries according to the entries of that column. It is a Boolean attribute, so its presence alone means that the table can be sorted; if a value is given, it should be sortable.

This attribute was proposed in HTML5 drafts, but it was never implemented in any browser and was later dropped from the specification, so you’ll need to use JavaScript to sort the table.
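
Since sorting has to be done in JavaScript, here is a minimal sketch of one way to do it. The table id (scores) and the data are made up for illustration, and the script only sorts ascending; a real page would probably want to toggle the direction and mark the sorted column.

```html
<table id="scores">
  <thead>
    <tr><th>Name</th><th>Score</th></tr>
  </thead>
  <tbody>
    <tr><td>Carol</td><td>72</td></tr>
    <tr><td>Alice</td><td>95</td></tr>
    <tr><td>Bob</td><td>88</td></tr>
  </tbody>
</table>

<script>
  // Sort the rows of the first <tbody> by the text of the clicked column.
  const table = document.getElementById("scores");
  table.querySelectorAll("thead th").forEach((th, index) => {
    th.addEventListener("click", () => {
      const tbody = table.tBodies[0];
      const rows = Array.from(tbody.rows);
      // numeric: true makes "72" sort before "95" and "100" after both
      rows.sort((a, b) =>
        a.cells[index].textContent.localeCompare(
          b.cells[index].textContent, undefined, { numeric: true }));
      // Re-appending an existing row moves it, so this reorders in place
      rows.forEach((row) => tbody.appendChild(row));
    });
  });
</script>
```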

<caption>
This tag defines a caption for the entire table. This is usually some sort of introductory text that gives the meaning of the table data. This could be a short title (like “Quarterly Earnings”), or something more complex, but it is rarely more than a short paragraph.

The <caption> tag cannot have any other table-related tags as children. Also, if it is present, it must be inserted immediately after the <table> tag. In other words, it must be the first child element in the entire table.

The HTML 4.01 specification disallowed the use of any block-level tags inside the <caption> tag. This includes the heading tags (<h1> to <h6>). HTML5 removed this requirement.

<colgroup>
This tag is used to define column groupings. It is mainly used to target columns for CSS styling (for example, a “total” column, or a column of row-specific notes). The actual column groups are specified using <col> elements, and this is the only kind of element that can be a child of a <colgroup> element.

The <colgroup> element must be inserted immediately after the <caption> tag, if it is present; and if there is no <caption> tag, it must be inserted immediately after the <table> tag. In other words, it must come before any <tr> tags, or any table section that could possibly contain <tr> tags.

This tag also accepts the span attribute, but it only applies when the <colgroup> element contains no <col> tags. Otherwise, the number of columns is determined by the number of <col> tags, and their associated span attributes.

<col />
This tag identifies a column, or group of columns, in a table’s column grouping. It must be a child of the <colgroup> element.

Attributes:

span
The number of columns that this column group will span. If this attribute is not specified, it will span one column.

Additionally, the <col> tag is most often used with a class or id attribute, so it can be targeted using CSS.
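
As a sketch of how column groups are targeted with CSS, the table below leaves the first two columns alone and styles the third. The class name total and the data are my own placeholders. Note that only a handful of CSS properties (background, border, width, and visibility) apply to columns, which is why the example sticks to background and width.

```html
<style>
  /* Only a few properties work on <col>; background and width are two */
  col.total { background-color: #eeeeee; width: 8em; }
</style>

<table>
  <colgroup>
    <col span="2" />       <!-- the first two columns, left unstyled -->
    <col class="total" />  <!-- the third column -->
  </colgroup>
  <tr><th>Item</th><th>Quantity</th><th>Total</th></tr>
  <tr><td>Widget</td><td>3</td><td>$15.00</td></tr>
  <tr><td>Gadget</td><td>1</td><td>$9.50</td></tr>
</table>
```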

<thead>
Table header. The header section is used to display column labels (headers). It can contain one or more <tr> elements.

Each table should only have one <thead> element. It must follow any <caption> or <colgroup> tags, and go before any <tbody>, <tfoot>, or <tr> tags.

<tfoot>
Table footer. The footer section is used to display summary information about each column. It can contain one or more <tr> elements.

The HTML 4.01 specification required that the <tfoot> element come before the <tbody> element, so that the browser could start rendering it before all the table’s data had been received. The HTML5 specification removed this requirement, but you can still do it if you want to, and you probably should if the table has hundreds of rows.

<tbody>
Table body. This is where the bulk of the table’s data should be. It is possible to put a row of headers in the table body, just as it is in either the header or the footer.

If the table’s content consists of nothing but the body, then using this tag is redundant. This is actually very common. On the other hand, if you use either the <thead> or <tfoot> tags, then you should use the <tbody> tag as well.
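
To show how the sections described above fit together, here is a small sketch of a table with a caption, header, body, and footer. The figures are invented. The footer is placed after the body, which HTML5 allows.

```html
<table>
  <caption>Quarterly Earnings</caption>
  <thead>
    <tr><th>Quarter</th><th>Revenue</th></tr>
  </thead>
  <tbody>
    <tr><td>Q1</td><td>$10,000</td></tr>
    <tr><td>Q2</td><td>$12,500</td></tr>
  </tbody>
  <tfoot>
    <!-- Summary information for the columns above -->
    <tr><th>Total</th><td>$22,500</td></tr>
  </tfoot>
</table>
```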

<tr>
Table row. This tag defines one row in the table, containing cells. The row may represent an entry in the table, or it may be a row of column headers. If it is an entry in the table, use <td> tags; if it is a row of column headers, use the <th> tag. A <tr> tag may not have any other child elements. You can mix <th> and <td> tags in the same row (to create marginal distribution tables, for example).

A <tr> tag can be the child of any table section element, if they are present (<thead>, <tfoot>, or <tbody>). If none of these elements are present, it can be a child of the <table> element itself, and this is actually pretty common. However, all <tr> tags must go after the <caption> and <colgroup> elements, if those are present.

<th>
Table header cell. This tag defines a cell in the table that contains a header (and not content). The tag is usually used to display the type of data for a column, at the top of the column list. It must be a child of a <tr> element. You can put a row of headers in any or all sections of the table, as appropriate.

Attributes:

abbr (uncommon)
An abbreviation, or shortened version, of the header text. It does not affect the table’s visual appearance, but it may be used by screen readers.
colspan
The number of columns that this header cell should span. This attribute is usually used to create “two-tier” headers – two rows of headers, where the top row is general (like “Name”), and the lower row is more specific (like “First Name” and “Last Name”).
rowspan
The number of rows that this header cell should span. This can be used to create “two-tier” headers, as with colspan, except the “tiers” would be columns, and the headers would be row headers. Using “two-tier” row headers is less common (since the header text usually needs to be rotated), but it is still done.
scope (uncommon)
This defines the scope of the header cell – that is, whether it is a row or column header. The value may be one of col, colgroup, row, or rowgroup. If the attribute is omitted, the scope is worked out automatically from the cell’s position in the table. It does not affect appearance, and is chiefly used by screen readers.
sorted (uncommon, HTML5 only)
This attribute specifies how the column is to be sorted when its header is clicked on. If specified, the value is usually reversed. Like sortable, it was never implemented in any browser and was later dropped from the specification, so you will need to use JavaScript.
headers (uncommon, HTML5 only)
This attribute specifies which headers this cell should be associated with. Its value is a space-separated list containing the IDs of one or more <th> cells. It is not used by web browsers, and exists mainly for screen readers.

The HTML 4.01 specification only allowed this attribute on the <td> tag. The HTML5 specification allows it on <th> tags as well. In this context, the headers attribute is usually used to associate “two-tier” headers. For example, you could associate the <th> tags for “First Name” and “Last Name” (in the second row), with the <th> tag for “Name” (in the first row, spanning both columns).
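
The “Name” example can be sketched like this. The id values and the sample row are placeholders of mine. The top-tier header spans two columns with colspan, the “Email” header spans both header rows with rowspan, and the headers attribute ties each second-tier header (and each data cell) back to the header it belongs to.

```html
<table>
  <thead>
    <tr>
      <!-- Top tier: "Name" covers the two columns beneath it -->
      <th id="name" colspan="2">Name</th>
      <!-- "Email" has no second tier, so it spans both header rows -->
      <th id="email" rowspan="2">Email</th>
    </tr>
    <tr>
      <!-- Second tier: each header points back at "Name" -->
      <th id="first" headers="name">First Name</th>
      <th id="last" headers="name">Last Name</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td headers="first">Ada</td>
      <td headers="last">Lovelace</td>
      <td headers="email">ada@example.com</td>
    </tr>
  </tbody>
</table>
```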

<td>
Table data cell. This tag defines a cell in the table that contains content (and not a header). It must be a child of a <tr> element.

Attributes:

colspan
The number of columns that this data cell should span. If your HTML table is actually displaying tabular data, you shouldn’t need this attribute, but sometimes it’s necessary.
rowspan
The number of rows that this data cell should span. Like colspan, you shouldn’t need to use this attribute unless your data is wonky.
headers (uncommon)
This attribute specifies which headers this data should be associated with. Its value is a space-separated list containing the IDs of one or more <th> cells. Once again, it does not affect the table’s visual appearance, and is used mainly by screen readers.
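
For simple tables, the scope attribute is usually enough to tell a screen reader which cells a header applies to. Here is a small sketch, with invented data, of a table that has both column headers and row headers, so each body row mixes a header cell with data cells.

```html
<table>
  <tr>
    <td></td>  <!-- empty corner cell -->
    <th scope="col">Morning</th>
    <th scope="col">Evening</th>
  </tr>
  <tr>
    <!-- scope="row" marks this as the header for the rest of the row -->
    <th scope="row">Monday</th>
    <td>Open</td>
    <td>Closed</td>
  </tr>
  <tr>
    <th scope="row">Tuesday</th>
    <td>Closed</td>
    <td>Open</td>
  </tr>
</table>
```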

Deprecated Table Attributes

Most table-related tags used to have a common set of attributes. These attributes are deprecated because they all deal with presentation, and you should use CSS for that. I’m including them here because you’ll probably still encounter them in old, or bad, HTML code.

Of course, using global attributes (such as class or id) is perfectly fine.

align
Alignment. Possible values were left, right, or center. Tags that held cell data could also accept justify (which Internet Explorer rendered as center), or a character to align upon (like the decimal point in currency).
border
This is used to specify the thickness of the border around each cell, or around the table as a whole. It was often set to zero (which is non-standard).
cellpadding (<table> only)
This attribute specified the space between the border of the cells and their contents.
cellspacing (<table> only)
This attribute specified the space between the cells themselves, and between the cells and the table borders.
valign
Vertical alignment. Possible values were top, bottom, middle, or baseline.
width
The width of the element. This could be expressed in units (e.g. pixels), or as a percentage.
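
If you are cleaning up old markup, each of these attributes has a CSS replacement. The sketch below shows the rough equivalents; the class name report and the specific values are arbitrary choices of mine, not anything prescribed.

```html
<style>
  table.report {
    width: 100%;                /* replaces width="100%" */
    border: 1px solid #999999;  /* replaces border="1" on the table */
    border-collapse: separate;
    border-spacing: 2px;        /* replaces cellspacing="2" */
  }
  table.report th,
  table.report td {
    border: 1px solid #999999;  /* replaces the per-cell border */
    padding: 4px;               /* replaces cellpadding="4" */
    text-align: left;           /* replaces align="left" */
    vertical-align: top;        /* replaces valign="top" */
  }
</style>

<table class="report">
  <tr><th>Item</th><th>Price</th></tr>
  <tr><td>Widget</td><td>$5.00</td></tr>
</table>
```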

About Karl

I live in the Boston area, and am currently studying for a BS in Computer Science at UMass Boston. I graduated with honors with an AS in Computer Science (Transfer Option) from BHCC, acquiring a certificate in OOP along the way. I also perform experimental electronic music as Karlheinz.

4 Responses to A Guide to HTML

  1. Ben says:

    You probably know this, but the self-closing-ness of your tags are likely to be totally ignored by the browser unless you set the mime type to xhtml! For example, in most browsers this
    This is a paragraph
    will render exactly the same as this:
    This is a paragraph

    A common gotcha is thinking that you can use a self-closing script tag, e.g.

    instead of

    More on this on stackoverflow:
    http://stackoverflow.com/questions/69913/why-dont-self-closing-script-tags-work

    So there can be arguments for doing this for style or tool support, but the browser really doesn’t care.

    • Ben says:

      wow, looks like wordpress doesn’t escape tags. Sorry, but hopefully the stackoverflow article explains well enough

      • Karl says:

        Also, about the tags – nope, WordPress does not escape HTML; you can use it to mark up comments (as I did just now), so it really can’t do that. For HTML tags, you have to use the &lt; and &gt; escape sequences. I also had to do this when I wrote the article, so I know how much of a PITA it is.

    • Karl says:

      Ben: First of all, thanks for taking the time to read the article. I need all the help I can get…

      The “self-closingness” issue applies only to tags that do not represent empty elements, and that includes the <script> tag. It’s supposed to contain text data (the actual JavaScript code). Trying to make these tags self-closing is not valid XHTML, and browsers will consider it “tag soup.”

      If the tag is actually self-closing – like the <br /> or <input /> tags – then the XHTML standard demands that they be properly terminated, or they won’t validate. The HTML standard (even HTML5) does not, but since XHTML is the one that has been used for years, I think it’s better to include the terminating slash.

      However, the Stack Overflow post did show something else that I wasn’t aware of: the <p> tag cannot contain other block-level tags, like <div>. (Inline tags are fine.) If you try to do this, the browser will consider it “tag soup” and automatically treat it as a tag with empty content. In other words, <p><div>Hello, world!</div></p> will turn into <p></p><div>Hello, world!</div><p></p>.

      I’ll update the article with this info. So, thank you for pointing this out. Please let me know if you find anything else in the article that needs work.
