A Guide to HTML

HTML Forms

The HTML elements we’ve talked about so far are fairly passive. The only way the user can request a page from the server is through the <a> tag, which can only go to the single URL specified in its href attribute. This is practically useless for dynamic, data-driven websites. It should be possible to request specific information from the server (like search results), and to create or alter information on the server (like a user’s shipping address).

This is done through HTML forms. An HTML form is a group of fields and their associated labels, including a way for the user to submit the form (usually a button). The fields are almost always interactive, and can be presented in a number of different ways (text fields, drop-down lists, check boxes, and so forth). The HTML specification calls form fields and submit buttons form controls. I will go over the tags for specific form controls in a moment. There can be more than one form on a page, but this is usually a bad idea, because it is not user-friendly. It is not possible to nest forms.

When the user submits a form, the data from the form must be sent to the web server. This is done using the HTTP protocol (or its secure counterpart, HTTPS). The transmission itself is an HTTP request, because the user is requesting that the server do something with the data that is sent to it. (The server, of course, is not obligated to do anything with this data.)

HTTP Methods

HTTP specifies a number of different request methods, but the only two that can be used with HTML forms are the GET and POST methods. (If you’re curious, the other methods are CONNECT, DELETE, HEAD, OPTIONS, PATCH, PUT, and TRACE.)

GET
This method is used purely to retrieve data. It should be used if the request is more like a “question” for the server (like a query or lookup).

When the GET method is used, the information in the form is converted to a query string. To recap, this is information that is appended to the URL of the form processor. The query string starts with a question mark, followed by a set of name/value pairs. Each name is joined to its value with an equals sign, and the pairs are separated by ampersand (&) characters. The name attribute of each form control becomes the name in the pair, and the value of that form control becomes the value. For example, let’s say your form has text fields named firstname and lastname, representing the first and last names. John Doe fills out the form, and submits it. That would result in the query string ?firstname=John&lastname=Doe.
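As a rough sketch of the example above, a form like the following would produce that query string. The action URL (/lookup) is a made-up placeholder, and it is the name attribute of each field that supplies the names in the query string.

```html
<!-- Hypothetical lookup form; "/lookup" is a placeholder URL, not a real handler. -->
<form action="/lookup" method="get">
  <label>First name: <input type="text" name="firstname" /></label>
  <label>Last name: <input type="text" name="lastname" /></label>
  <!-- The submit button has no name, so it adds nothing to the query string. -->
  <input type="submit" value="Look up" />
</form>
<!-- If John Doe submits this, the browser requests:
     /lookup?firstname=John&lastname=Doe -->
```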

Only ASCII characters can be used in a URL, so the form’s data is URL encoded. Space characters are converted to plus signs (+), and each special character is replaced by a percent sign (%) followed by two hexadecimal digits representing the character’s byte value. For this reason, this type of encoding is also called percent encoding. For the purpose of a URL, “special characters” include all HTML special characters, and the plus sign and percent sign themselves. Non-ASCII letters (like “á”) are first converted to bytes using the form’s character encoding, and then each byte is percent-encoded; older browsers were inconsistent about which character encoding they used.
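To illustrate, here is a hand-written link whose query string carries the phrase “fish & chips”. The page name (search) and the parameter names (q and lang) are invented for this sketch.

```html
<!-- Hypothetical link: "search", "q" and "lang" are placeholders.
     Each space becomes "+", and the "&" inside the value becomes "%26"
     (26 is the hexadecimal code for an ampersand). -->
<a href="search?q=fish+%26+chips&amp;lang=en">Search for fish &amp; chips</a>
<!-- The "&" that separates the two pairs is written as "&amp;" only because
     the URL sits inside an HTML attribute; the browser sends a plain "&". -->
```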

Since the request is simply another URL, it can be bookmarked in browsers. (But if this is your intent, it is better to rewrite the URL if you can.) It may also be cached for better performance, either in the browser or on the server.

However, there are a number of disadvantages to putting form data in the query string:

  • It is not secure. HTTPS encrypts the query string in transit, but it still shows up in the address bar, the browser’s history, bookmarks, and the server’s logs.
  • You can only use ASCII text in a URL, so you can’t send binary files, like a picture or PDF.
  • The length of a URL is limited, so it cannot be used to send large amounts of text (like an entire blog post).

For these reasons, the GET request method should only be used with short, non-private text data. A good example would be a search form.

POST
This method is used for any form data that is intended to be stored on the server (even temporarily). It should be used if the user’s data could change anything on the server (so is more like a “command” than a request).

When the POST method is used, the form data is placed in the body of the HTTP request. No query string is used, and the URL is not affected. This makes it more secure: the data can’t be read from the URL, and if you’re using HTTPS, the data will be encrypted along with the rest of the HTTP body. Additionally, the HTTP body can accept binary data, and the protocol itself places no limit on its size (though servers usually set a limit of their own). However, since the data is not part of a URL, the form results cannot be bookmarked. They are also not cached like a normal page, so using the browser’s “back” button to return to previous form results usually means being asked to submit the form again.

The POST method should be used for any forms that include private information, binary data, or large amounts of text. Most of the forms you will create will likely use the POST method.
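Here is a minimal sketch of a form that uses POST, assuming a hypothetical form processor at /account/address. The field names and the sample values in the closing comment are invented for illustration.

```html
<!-- Hypothetical address form; the action URL and field names are placeholders. -->
<form action="/account/address" method="post">
  <label>Street: <input type="text" name="street" /></label>
  <label>City: <input type="text" name="city" /></label>
  <input type="submit" value="Save address" />
</form>
<!-- With the default encoding, the request body would look something like:
     street=12+Main+St&city=Springfield
     The URL stays /account/address, with no query string. -->
```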

Not all forms are used to send data to a web server. For example, if you’re writing a client-side JavaScript application, the form could be used only to gather data for that application. This is much less common, but not unheard of, and it is a use case that you should be aware of.

Common Form Control Attributes

An HTML form control can be presented to users in many different ways. However, they are all form controls, and have quite a few attributes in common. Many of these have to do with accessibility.

disabled (uncommon)
Disable the form control. Form fields that are disabled cannot be interacted with, and are usually “grayed out” in some way. Their values are also left out of the data when the form is submitted. This is a Boolean attribute.

Usually, JavaScript is used to enable the form field when a user enters valid data in some other form field. For example, a ZIP code field may be disabled unless the user selects “United States” in a country field. You can’t do this sort of thing without JavaScript, and some users have JavaScript disabled. So, it is more common to leave the field enabled in HTML, then disable it using JavaScript when the page first loads, and enable it again when the other field is validated.
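In markup, the attribute looks like this. The field names are made up, and the script that would enable the ZIP code field is left out of this sketch.

```html
<!-- Hypothetical fields; "country" and "zip" are placeholder names. -->
<select name="country">
  <option value="us">United States</option>
  <option value="ca">Canada</option>
</select>
<!-- Boolean attribute: its presence alone disables the field.
     XHTML requires the long form, disabled="disabled". -->
<input type="text" name="zip" disabled="disabled" />
```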

name
At the beginning of the article, I included the name attribute in the list of deprecated attributes. And for the most part, this is true. Form controls are the exception. When used with form controls, it allows the server software to reference the data after the form is submitted. For example, if you’re using the GET method, it will specify the name in a name/value pair for a URL query string.

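The id attribute is not a substitute for name. A form control without a name attribute is left out of the submitted data entirely, no matter what its ID is. It is common to give a control the same value for both attributes, but ID’s must be unique, so this won’t work on form controls that share the same name (like radio buttons).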

JavaScript can also target form elements by name. It may still be better to use the id attribute for this, since names are not unique.

readonly (uncommon)
Specifies that the form control is read-only. It is a Boolean attribute.

Usually, the field will be used to display some kind of text to the user, which the user is not allowed to change. However, the content is not “greyed out” like it would be in a disabled form control, and the user can highlight the text (for copying and pasting). JavaScript is sometimes used to make the field editable once another field is validated. It is better to use something other than a form control for displaying text, and it is better to use disabled for controls that JavaScript will enable later. For these reasons, read-only form fields are not very common.

tabindex (uncommon)
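Specifies the tab index for the form control. The value must be an integer. Controls with a positive value are visited first, in ascending order starting from 1 (not 0, unlike most programming languages, which are zero-indexed). A value of 0 is allowed, but it simply leaves the control in its normal place in the document order, after any controls with positive values.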

People who fill out forms can use the tab key to skip to different form fields, without using the mouse. (The enter or return key is not suitable for this, since it usually submits the form.) The tabindex attribute can be used to specify an order to the fields, so that you can control which field gains focus when the user hits the tab key. If not specified, the order will be determined by where the tag appears in the HTML file: the first tag gains focus first, the second tag next, and so on. This is usually what users expect, so the attribute is not used very often.

The HTML5 specification allows you to use this attribute on all elements. The HTML 4.01 specification (thus XHTML) only allows it on elements which the user can interact with. This includes all form controls, but also the <a>, <area>, and <object> elements. I’ve never encountered anyone who used the tabindex attribute on these elements, but you never know.
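A small sketch of a custom tab order follows. The field names are hypothetical, and the order is deliberately unusual just to make the effect visible.

```html
<!-- Hypothetical fields; tabbing visits lastname, then firstname, then notes. -->
<input type="text" name="firstname" tabindex="2" />
<input type="text" name="lastname" tabindex="1" />
<!-- No tabindex: visited after the controls that have positive values. -->
<input type="text" name="notes" />
```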

Form Event Attributes (Discouraged)

Like form-specific attributes, there are a number of JavaScript events which are common to form controls, but not other HTML elements. These events are commonly used to make sure the form’s data is valid before it is sent to the server. But, keep in mind that the user may have JavaScript turned off. As an aside: any user agent can send an HTTP request to the server, so servers shouldn’t assume any form data is valid. But this is an issue for server-side processing, so it’s outside the scope of this article.

Remember, however, that using event attributes in HTML is discouraged. They should be used by JavaScript only. Still, it is a good idea to at least know what they are, so I’m including them here for information.

onblur
This event is fired when a form control loses focus; that is, when the user has tabbed away from the form field, or clicked on a different field. Presumably, this happens after the user has entered or selected data.
onchange
This event is fired when the value of a form control is changed. Note that if the user does not enter anything, or keeps the control at its default value, then this event is not fired.
onfocus
This event is fired when the form control gains focus. That is, the user has just tabbed to the form control, or selected it using the mouse.
onselect
This event is fired if the user has selected some text in the form control. Obviously, this event is only applicable to form controls that actually have text input (and not, say, buttons).
onsubmit
This event is fired when the form is actually submitted. This event is often used for last-minute validation. Or, if the form is used by a client-side JavaScript application, this event could be used to store the data somehow.

The HTML5 standard created other event attributes relating to form controls. But using them (and dealing with browser compatibility) is the job of a JavaScript programmer, so I won’t talk about them here.

Form Tags

<form>
Defines an HTML form. This is the root element of a form; the form control tags should be nested inside it. Any form control tags that are not in the content of a <form> element are not associated with any form at all. This is valid HTML, but it is usually not very useful, unless the controls are being used solely for JavaScript applications.

The contents of the <form> tag do not have to be form control tags. You can (and probably should) include other text marked up with HTML, and this includes HTML5 sectioning tags (like <nav> or <section>). However, you can’t nest HTML forms, so you can’t have a <form> tag in the content of another <form> tag.

Attributes:

accept-charset (uncommon)
Specifies the character encodings that are used in the form. This is a space-separated list of character encodings, such as ISO-8859-1 or UTF-8. It can also be the special value UNKNOWN, which means that the character encoding is the same as the one used in the HTML document. This is the default. It is also what you usually want, so there are not many cases where you need to use this attribute.
action
This attribute specifies what will happen when the form is submitted. Its value is the URL of a script that takes the form data, and processes it in some way; such a script is called a form processor or a form handler. It is almost always a dynamic resource on the server, and the URL can be fully-qualified or a relative path.

A form processor does not always have to be a script on the server. If you’re writing a client-side application, then you will want the form to be processed by JavaScript. In this case, a common tactic is to use an empty fragment (just the pound sign) for the action attribute. That way, the form’s data won’t “go” anywhere. On the other hand, the action attribute can accept any URL, so you could possibly use a mailto: URL (though this is a bad idea).

A single form may have more than one submit button. In this case, the form will go to the same form processor, but only the submit button that is pressed will have its value sent to the form processor. The HTML5 standard created attributes that let you send the data to different form processors. These are attributes of the <input> tag, so I’ll go over these attributes when I talk about that tag.

enctype
Encoding type. This attribute is only recognized when using the HTTP POST method. It tells the browser how to encode the form data before it is sent to the server. Common values are:

application/x-www-form-urlencoded
This is the default encoding type. When this encoding type is used, all data is URL-encoded before being sent, just as it would be in a query string. It is the best option when any of the form’s fields could contain HTML, and none are sending binary data.
multipart/form-data
This value specifies that the HTTP body consists of data in multiple parts. When this encoding type is used, the form’s data is not encoded in any way. If you are using the form to send binary data (like an image), then you need to use this encoding type.
text/plain
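When this encoding type is used, the data is sent as plain name=value lines, and nothing is encoded: neither spaces nor special characters are converted. Because a value could itself contain a line break, a form processor can’t parse the result reliably, so this type is mostly useful for debugging, or for data that is meant to be read by a person.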
method
The HTTP method used to submit the form. The only values that are accepted are get or post. These values are not case-sensitive, but it is common practice to use lower case.
target (uncommon)
The target of the form’s response page. After a form is submitted, the form processor should return some sort of HTML page, so that the user knows the form was submitted successfully. This attribute says where that page should be displayed. It can accept any of the usual link targets: _blank, _parent, _self, _top, or the name of an <iframe>. It is not very user-friendly to display the response page in a different target, so this attribute is rarely used.
<input />
A form input. Most types of form inputs are defined using this tag, and it is the one you will use the most when creating forms. It is an empty tag, and should be properly terminated.

Attributes:
Note: several attributes of the <input> tag are only recognized on certain types of inputs. I will list those attributes when I cover the relevant input types.

value
This is the value of the form field that is sent to the server. If the input is a text type, this is the default value, and may be changed by the user. If it is not a text input, then it cannot be changed by the user. In these cases, you should consider the attribute required; otherwise, the input will not have any value associated with it.
type
The type of form input. This determines the input’s appearance, and how the user interacts with it.

Input Types:

text
A text input. This is a single line of text input; if you want to accept multiple lines of text, use the <textarea> tag instead.

This is the default type if none is specified (which is why I’m listing it first). It is also the default type that is rendered if the type attribute is specified, but the browser doesn’t recognize its value. This is particularly useful for the new HTML5 input types; I’ll cover those in the section on HTML5 forms.

Related <input> Attributes:

maxlength
The maximum length of the input. This is the maximum number of characters that the field will accept, which is usually larger than its visible length.
size
The visible length of the field. If the user reaches the end of the field, they can still enter data (up to maxlength). The text in the field will scroll so that the cursor is always visible.
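For instance, a text field with both attributes might look like this. The field name is a placeholder, and the exact visible width depends on the browser and font.

```html
<!-- Hypothetical field: shows roughly 10 characters, accepts up to 40. -->
<input type="text" name="username" size="10" maxlength="40" />
```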
button
A clickable button. This is usually used to trigger some JavaScript action. If your form control is to be used for submitting the form, you should use the submit or image types instead.

There is also a <button> element, but it has a lot of issues in older versions of Internet Explorer. It is usually better to use <input type="button" /> in an HTML form. I will go over the issues when I talk about the <button> element, below.

checkbox
A checkbox. A checkbox can be either checked or unchecked; if it is not checked, its value is not sent to the server. Multiple checkboxes can be checked at once (unlike a radio input). By default, the browser renders checkboxes in a square shape, and radio buttons in a round shape.

The value attribute is the value sent to the form processor when the checkbox is checked. It is not displayed to the user. To let the user know what they’re checking off, you should use the <label> tag, or some other text.

Related <input> Attributes:
Note: You should also specify the value attribute.

checked
Specifies that the checkbox should be checked by default (when the form first loads). This is a Boolean attribute. If you do not use this attribute, then the checkbox will not be checked by default.
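A short sketch of two checkboxes, one of them checked by default. The names, values, and label text are all made up for illustration.

```html
<!-- Hypothetical newsletter options; names and values are placeholders. -->
<input type="checkbox" id="news" name="news" value="yes" checked="checked" />
<label for="news">Send me the newsletter</label>
<input type="checkbox" id="offers" name="offers" value="yes" />
<label for="offers">Send me special offers</label>
<!-- Submitted as loaded, only news=yes is sent; the unchecked box sends nothing. -->
```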
file
A file upload field. By default, this is rendered as a short text field, with a “Browse” button next to it. Selecting the file doesn’t actually upload it to the server; this is done when the form is submitted. If you are using this input type, then you should submit the form using the HTTP POST method and the multipart/form-data encoding type.

Related <input> Attributes:

accept
Specifies the file type(s) that the input should accept. If you specify this value, then browsers will usually hide other file types when the user browses for a file to upload. The value is a comma-separated list of one or more of these:

audio/*
All audio file types: .mp3, .ogg, .wav, etc.
image/*
All image file types: .jpeg, .gif, .png, etc.
video/*
All video file types: .webm, .mp4, .ogv, etc.
A file extension
If you only want to accept files with a specific extension, you can use that extension as the value (include the leading period).
A MIME type
If you only want to accept files with a specific MIME type (e.g. video/ogg), you can use that MIME type as the value. Note that you cannot specify the codec, like you can with a <video> tag.
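Putting the file input together with the form attributes it needs gives something like the following sketch. The action URL (/upload) and the field name (photo) are hypothetical.

```html
<!-- Hypothetical upload form; "/upload" and "photo" are placeholders. -->
<form action="/upload" method="post" enctype="multipart/form-data">
  <!-- Ask the browser to offer only image files in the file chooser. -->
  <input type="file" name="photo" accept="image/*" />
  <input type="submit" value="Upload" />
</form>
```

The accept attribute only affects what the file chooser shows, so the form processor would still need to check what it actually receives.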
hidden
A hidden form field. This can be used to hold data that is sent to the server, but is not visible to the user. When the form is submitted, the data is sent exactly as if it were a visible text field.

Hidden fields are almost always placed in the form by server-side software, and not written into a static HTML form. When used, they usually hold persistent data (like a product ID) when filling out forms with multiple steps. They have the advantage of working even when the user has cookies or JavaScript disabled. Of course, the field is still visible in the page source, and it will still be visible in the URL if you use the GET method. In other words, this field should never hold any kind of personal information.

You should always specify the value attribute. Otherwise, the hidden field will have no value, and it’s pointless to use it in the first place.
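As a sketch, the second step of a multi-step order form might carry a product ID along like this. The URL, names, and ID are invented, and in practice the hidden field would be written into the page by server-side software.

```html
<form action="/checkout/step2" method="post">
  <!-- Hypothetical product ID, normally filled in by server-side software. -->
  <input type="hidden" name="product_id" value="12345" />
  <label>Quantity: <input type="text" name="quantity" value="1" size="3" /></label>
  <input type="submit" value="Continue" />
</form>
<!-- Submitted unchanged, the body would be: product_id=12345&quantity=1 -->
```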

image
An image submit input. This input type is used when you want to display an image for the submit button. Clicking on the image will submit the form. (It is not possible to use an image for any other type of form control, but you can use CSS to achieve the same effect.)

Like the submit input type, there can be more than one image input in the form. You distinguish between them by specifying a different name attribute for each <input> tag. The browser sends the coordinates of the click as two values, name.x and name.y; the value attribute is not reliably sent, and unlike the submit type, its text is not displayed to the user.

Related <input> Attributes:
Note: Because the image type is used to submit the form, it shares all of the HTML5 attributes of the submit input type. I’ll go over those when I cover HTML5 attributes.

src
A URL to the image resource. This may be a fully-qualified URL, or a relative path.
alt
Alternate text for the image. This should be a short description, used when the user agent does not display images.
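An image submit input might be written like this; the image path and the name are placeholders.

```html
<!-- Hypothetical image button; "images/buy-now.png" and "buy" are placeholders. -->
<input type="image" name="buy" src="images/buy-now.png" alt="Buy now" />
```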
password
A password field. The characters in this field are masked by default; the browser will show some non-descript character (like a “bullet” character) instead of what the user actually types. This is done for security reasons.

If your form is asking for a password, then you should use the HTTP POST method, so the password won’t show up in the URL.

A password field accepts text input, so it can also use the maxlength and size attributes.

radio
A radio button. This is similar to a checkbox, but only one radio button in a group may be active at a time. Radio buttons are usually rendered by the browser as circles, while checkboxes are rendered as squares.

To make radio buttons part of the same group, you should give them the same name attribute. (The id attribute can’t be used for this, since ID’s must be unique.)

The value that is sent to the form processor will be the string specified in this tag’s value attribute. It is not displayed to the user. You should use the <label> tag, or some other text, to let them know what they’re choosing.

Related <input> Attributes:
Note: Remember to specify the value attribute as well.

checked
Specifies that the radio button should be checked by default. It is a Boolean attribute. If you do not use this attribute on any radio button, none will be checked by default. Only one radio button per group should have this attribute; if you use it on more than one, the behavior is undefined (and invalid HTML). Most browsers will default-check the last radio button in the group that has this attribute.
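A sketch of a two-button radio group, with the first option checked by default. The group name (shipping), the values, and the label text are hypothetical.

```html
<!-- Hypothetical shipping options; both buttons share the name "shipping". -->
<input type="radio" id="ship-std" name="shipping" value="standard" checked="checked" />
<label for="ship-std">Standard</label>
<input type="radio" id="ship-exp" name="shipping" value="express" />
<label for="ship-exp">Express</label>
<!-- Only the selected button is submitted, e.g. shipping=standard. -->
```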
reset (discouraged)
A field that will reset the form when clicked. This is rendered as a button by the browser. When that button is pressed, all of the fields in the form will be reset to their default values; text fields will usually be empty.

This is not a good idea. If the user hits this button by mistake, everything they’ve entered will be wiped out. This will almost certainly piss them off. In fact, most users expect the data to be persistent even if the form fails validation. I would avoid reset buttons like the plague.

submit
A field that will submit the form when clicked. This is rendered as a button by the browser, and the value attribute will be rendered as the text inside the button. If no value is specified, the browser will supply a default (e.g. “Submit” in Chrome, “Submit Query” in Internet Explorer).

There may be more than one submit button in a form. To distinguish between them, you specify different values for the name and value attributes. Whichever submit button is hit will have its name/value pair sent to the server; the others will not. The name attribute must be specified on every submit button, or this won’t work; obviously, it would be better for the user if you supply different value attributes as well.

The HTML5 specification allows you to use different submit buttons to send the form to different form processors. I’ll go over that in the section on HTML5 form elements and attributes.
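Here is a sketch of a form with two submit buttons that go to the same form processor. The action URL and all of the names are invented for illustration.

```html
<form action="/post/edit" method="post">
  <label>Title: <input type="text" name="title" /></label>
  <!-- Hypothetical buttons: only the one that is pressed sends its pair. -->
  <input type="submit" name="save" value="Save draft" />
  <input type="submit" name="publish" value="Publish" />
</form>
<!-- Pressing "Publish" sends publish=Publish along with the title;
     nothing is sent for the "save" button. -->
```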

<label>
A label, or caption, for a form control. The contents of this tag are displayed to the user, and associated with a form control (usually an <input> element). If you click on a label, most browsers will behave as if you clicked on its associated form control.

The form control may itself be nested in the contents of the <label> element, but it does not have to be. If it is, then the form control is automatically associated with its parent <label> element. If not, then you should use the for attribute to explicitly associate the two. There are some advantages to not nesting the form controls – for example, you can give all of the <label> elements the same width using CSS, to line up the text fields to their right.

The label does not need to be next to its associated form control; all the labels could be at the top of the page, for all the browser cares. But users expect the label to be next to the form control, and you should abide by their expectations. By consensus, labels should be to the right of checkboxes or radio buttons; to the left of short text-input fields; and above everything else.

Note that you cannot associate a label with a hidden input type (which would make no sense anyway). You also cannot associate a label with a <fieldset> element; use the <legend> tag instead.

Attributes:

for
The ID of the associated form control.
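The two ways of associating a label with its form control can be sketched as follows. The field names and IDs are placeholders.

```html
<!-- Explicit association: "for" matches the control's id. -->
<label for="email">Email address</label>
<input type="text" id="email" name="email" />

<!-- Implicit association: the control is nested inside the label. -->
<label>Phone number <input type="text" name="phone" /></label>
```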
<button> (uncommon)
A clickable button. You probably recall that the <input> tag also has a button type, but there are differences between the two. The main difference is that <button> is not an empty element, so it can have other HTML elements inside it, including <img> elements. The HTML between the start and end tags will be rendered inside the button itself. It may also have a value attribute, and in theory, this is what will be sent to the server when the form is submitted.

I say “in theory,” because the <button> element has some issues. In older versions of Internet Explorer, the contents of the <button> tag will be used as the value (the value attribute is ignored), and those contents will always be sent when the form is submitted. For example, if you use an <img> element in the contents, that image will be sent as part of the value. If you’re not using multipart/form-data encoding, this will likely cause your form processor to crash. Additionally, the fact that the button’s value is always sent will make it impossible for the server to know which button was actually pressed. This means you can’t use multiple <button> elements for submit buttons. Thankfully, this bug was fixed in IE 8, but lots of people still use IE 7 or below.

For this reason, the <button> element is not widely used. You should be able to handle any markup inside the <button> tag by using CSS. If not, it’s probably too complex for a button anyway.

Attributes:

type
The type of button. This is an enumerated attribute, which can accept these values:

button
A button that, by itself, does nothing. (The Internet Explorer documentation calls this a “command” button.) It should only be used to trigger a JavaScript command.
reset (discouraged)
The button should be used to reset the form. For reasons I explained earlier, resetting the form is a bad idea.
submit
The button should be used to submit the form.
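For comparison with the <input> button types, a <button> element with markup inside it might look like this. The image path, name, and value are hypothetical.

```html
<!-- Hypothetical submit button; "images/disk.png" is a placeholder path. -->
<button type="submit" name="action" value="save">
  <img src="images/disk.png" alt="" /> <strong>Save</strong>
</button>
```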
<textarea>
Defines a multi-line text input field. By default, the text inside is rendered in a fixed-width font.

Attributes:

cols
The number of columns in the text area. In other words, the width of the text area, in characters. The default value is 20.
rows
The number of rows in the text area. So, the height of the text area, in lines of text. The default value is 2.
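A sketch of a text area that is 40 characters wide and 5 lines tall. The name and ID are placeholders; any text placed between the tags is shown as the field’s default content.

```html
<label for="comments">Comments</label>
<!-- Hypothetical field; the text between the tags is the default content. -->
<textarea id="comments" name="comments" cols="40" rows="5">Type your comments here.</textarea>
```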
<select>
This element creates a selection list. If only one option is visible, this will be displayed as a drop-down list. If more than one option is visible, the browser will display a scrollable selection list.

This tag is the root element (the “wrapper” tag) of the list itself; it does not specify any options. The list should be populated with <option> tags, optionally grouped together with the <optgroup> tag. These should be nested inside the <select> tag. No other elements can be children of the <select> element.

The user can interact with the selection list using the keyboard. If the user types a key, then they will jump to the first option in the list whose text starts with that key. This makes it easy to navigate long lists; you’ve probably used it yourself when entering a country using a list.

The name attribute of this element will be the name in the name/value pair sent to the form processor. As with other form controls, the id attribute cannot be used instead of name. Any value attribute on this tag is ignored. Instead, the value that is sent will be the value attribute of the selected <option> element. If multiple options are selected, then multiple name/value pairs will be submitted, all with the same name.

Attributes:

multiple
Whether or not multiple options can be selected. It is a Boolean attribute. To select multiple contiguous options, users can use the shift key. To select multiple non-contiguous options, users can use the ctrl key on Windows and Linux computers, or the command key (⌘) on a Mac.
size
The number of options to show in the list. If the multiple attribute is present, the default is 4; otherwise, the default is 1.
<option>
Specifies an option for an input list. This can be either a <select> element or an HTML5 <datalist> element (see below), and options may be grouped together using the <optgroup> element. The <option> tag cannot be the child of any other element.

Unfortunately, this tag behaves differently in a <datalist>. For now, I will assume that it is a child of the <select> element, since that’s how it has been used for decades. I will discuss its other, wonkier behavior when I talk about the <datalist> element.

The text that is displayed to the user can come from two places. If the label attribute is specified, then its value is displayed in the list. If it’s not specified, then the content of the <option> tag will be displayed. Any HTML markup in its content is completely ignored. Whatever text is displayed is the text that the user interacts with using the keyboard.

When submitting a name/value pair to the form processor, the name attribute of this element is ignored. (Of course, it may still be used with CSS or JavaScript.) Instead, it comes from the name attribute of the root <select> or <datalist> element.

The submitted value can also come from two places. If the value attribute is specified, then that will be the text that is submitted to the form processor. If not, it will be the text between the opening and closing tags. (Neither of these things is necessarily what the user sees.)

Attributes:
Note: You may want to also specify the value attribute.

label
Specifies the visible label for the option.
selected
Specifies that the option should be selected by default (when the form first loads). It is a Boolean attribute. If multiple items may be selected in the list, then multiple options may be selected. If not, and multiple options have this attribute, then most browsers will default-select the last option with this attribute.
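A drop-down list with a default selection can be sketched like this. The name (size) and the option values are made up for illustration.

```html
<label for="size">Size</label>
<select id="size" name="size">
  <!-- No value attribute: the text "Small" is what gets submitted. -->
  <option>Small</option>
  <!-- The user sees "Medium", but size=m is submitted. -->
  <option value="m" selected="selected">Medium</option>
  <option value="l">Large</option>
</select>
```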
<optgroup>
An option group. This is used to group <option> elements together in a <select> list. The <option> tags are nested in the <optgroup> tag. The <optgroup> element can contain no other elements, and its parent must be a <select> element.

Attributes:
Note: Like other form controls, this takes the disabled attribute, and you can use it to disable an entire group of options.

label
Specifies the visible label for the group of options. The label is usually rendered in bold text, and left-indented from the options. If no label is specified, there will be a blank line instead. The label cannot be interacted with in any way; for example, you can’t jump to a label by using the keyboard, like you can with an option.
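Grouped options might look like the following sketch, with the second group disabled. The names and values are hypothetical.

```html
<select name="city">
  <optgroup label="France">
    <option value="par">Paris</option>
    <option value="lyo">Lyon</option>
  </optgroup>
  <!-- disabled on the group disables every option inside it. -->
  <optgroup label="Japan" disabled="disabled">
    <option value="tyo">Tokyo</option>
    <option value="osa">Osaka</option>
  </optgroup>
</select>
```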
<fieldset>
This element allows you to group form fields together as a set. The <fieldset> element can contain any HTML markup in its contents, but usually these will be form controls and their associated labels. You can also define a legend for the fieldset using the <legend> tag. By default, the browser will draw a border around all the elements inside the fieldset, with the legend as a title or label.

The fieldset is only used to group elements; it does not provide any value when the form is submitted. Disabling the fieldset will disable all of the form controls inside it.

It is very common to use the <fieldset> tag to make a section of the form collapsible. The user could click on the legend, and the fieldset would hide its content (or show it again if hidden). This can only be done using JavaScript. If you’re creating collapsible sections in HTML5, then you might consider using the <details> tag instead. That tag is experimental, and is not supported in Firefox or Internet Explorer. But you can add support using a JavaScript library, and you would need to use JavaScript anyway.

<legend>
The legend for a fieldset. The content of the <legend> element will be displayed as the legend. You can use any HTML markup tags in the content, but since the legend is displayed as a label, block-level tags are probably a bad idea.

A fieldset only has one legend. If more than one <legend> tag is present, then the first one encountered is used.
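
As a rough sketch, this is what a form with two fieldsets might look like. The form processor URL and all field names are hypothetical.

```html
<form action="/shipping" method="post">
  <fieldset>
    <legend>Shipping address</legend>
    <label for="street">Street</label>
    <input type="text" id="street" name="street" />
    <label for="city">City</label>
    <input type="text" id="city" name="city" />
  </fieldset>

  <!-- Disabling the fieldset disables the checkbox inside it -->
  <fieldset disabled="disabled">
    <legend>Gift options</legend>
    <input type="checkbox" id="wrap" name="wrap" value="yes" />
    <label for="wrap">Gift wrap this order</label>
  </fieldset>

  <input type="submit" value="Save" />
</form>
```

Only the form controls are submitted; the fieldsets and legends themselves contribute nothing to the form data.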

HTML5 Forms

When HTML 4.01 was standardized, the idea of a “form” was limited to our experience with paper forms. By today’s standards, these are very basic affairs: tax forms, DMV records, multiple-choice tests, and so forth, all filled in with a trusty No. 2 pencil. Modern computers, on the other hand, are GUI powerhouses, with high-resolution screens and more advanced input methods (including touch screens on mobile devices). As computer software matured, there arose a huge number of input types: color pickers, date pickers, sliders, and so forth. These things are now second nature to most users.

In the world of graphical software programming, these components are called widgets. Many graphical software packages (like Swing, Qt, or Cocoa/GNUstep) allow programmers to create widgets and define their behavior. However, in the world of HTML, we are “stuck” using form controls. Many, many websites have used widgets as form controls, but they had to be created using JavaScript, or possibly even Adobe Flash.

The HTML5 standard vastly improves this situation. It defines a number of new form controls, which are natively rendered by the browser as GUI widgets, without the need for JavaScript (or any other scripting language). They can also be styled using CSS, and can still be targeted by JavaScript when needed. But since the widgets are so widely used, their appearance is fairly standardized, so this often isn’t necessary. HTML5 also defines a number of new attributes, which can make the old form controls more user-friendly.

Around 2006, when HTML5 was still fresh in the minds of the people, the jargon-happy among us started calling HTML5 form features “web forms 2.0.” After a year or two, that term dropped off the face of the Earth. Feel free to quietly laugh at anyone still using it.

Form Control Tags

These are the new tags that the HTML5 standard created. It also created a number of different <input> types; I’ll cover those separately.

It should be noted that most of these elements are not supported by Safari or iOS devices, nor are they supported in Opera Mini. (They are supported in the desktop version of Opera.) You should assume they’re not supported by these user agents; I’ll tell you if they are.

<datalist>
This tag is used to declare a set of pre-defined options for a suggestion list. If a text-based <input> element is using the autocomplete attribute, then a datalist can be used to suggest autocomplete options. In order to use a datalist, the <datalist> tag must have an ID, and the <input> tag must be associated with it, by using the ID as the value of its list attribute.

Like the <select> element, a datalist should contain <option> tags. When the user types characters into the text input field, they are matched against text in the <option> tags, and matches are displayed in a small window below the text input field. (This part of the process is often called autotype.) The user can select one of these options, and the field will be automatically filled with the option’s text. On the other hand, the user may choose none of the options, and still enter whatever they like in the text field. You’ve probably seen this used by search engines.

Browsers that support the <datalist> element ignore anything in its content that is not an <option> element. So, the datalist may also include a <select> element inside it, with the same name as the <input> element. The nested <option> elements will then fill out the selection list. Users with older browsers will see both the text input and selection list, and can choose which one to use. Since both elements have the same name, the form processor only needs to check one field name. Supporting browsers submit just the text input, while older browsers submit a value from each control.
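
The fallback technique could be sketched like this. The field name, IDs, and options are invented for the example; the empty first option is my own addition, so that an untouched selection list in an older browser submits an empty value.

```html
<label for="browser">Favorite browser</label>
<input type="text" id="browser" name="browser" list="browsers" />

<datalist id="browsers">
  <!-- Supporting browsers ignore everything here except the options -->
  <label for="browser-fallback">or pick one from the list:</label>
  <select id="browser-fallback" name="browser">
    <option value=""></option>
    <option>Chrome</option>
    <option>Firefox</option>
    <option>Opera</option>
  </select>
</datalist>
```

The list attribute on the <input> tag matches the ID of the <datalist> tag, which is what ties the two together.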

Unfortunately, the <option> tag behaves differently in a datalist than it would in a <select> element.

How the <option> element behaves in a datalist
If the <option> element behaved as if it were in a selection list, this is what would happen. The label attribute’s text is displayed to the user, so it would be the text that is presented in the suggestion list (before the user selects anything). It is also the text that the keyboard interacts with, so it would match what the user types. The value attribute’s text is sent to the form processor, so it would be the text that is automatically filled when the suggestion is chosen. If one or the other attribute is not present, then the content of the <option> tag would be used instead.

The only browser that actually behaves this way is Firefox. The other browsers behave very differently.

If the value attribute is specified, all browsers use its text for auto-completion, so it will always determine the value sent to the form processor. If this attribute is not present, all browsers use the content of the <option> element instead. So far, so good.

The label attribute is where we get into trouble. Only Firefox will ever match the user’s typing against its value. Chrome, Opera, and Internet Explorer match the user’s typing against the value text, or the content of the <option> element if value is not present. That is, they match the typing against the value that will actually be sent to the form processor.

For the text displayed in the suggestion list, both Firefox and Internet Explorer use the text of the label attribute, or the contents of the <option> element if label is not present. Chrome and Opera display the text of the value attribute (if present). This means Internet Explorer may display text in the suggestion list that has no relation to the characters that the user actually typed. Chrome and Opera may be wrong, but at least they’re internally consistent.

If neither attribute is used, all browsers use the contents of the <option> element for everything. So, here’s my advice: when providing options for a suggestion list, don’t use any attributes on the <option> tag. You’ll save money on headache medicine.
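
To make the difference concrete, here is a small sketch of both styles. The field names and options are hypothetical.

```html
<!-- Consistent everywhere: the content is displayed, matched, and submitted -->
<label for="city">City</label>
<input type="text" id="city" name="city" list="cities" />
<datalist id="cities">
  <option>Boston</option>
  <option>Chicago</option>
</datalist>

<!-- Inconsistent: which text is displayed and matched depends on the browser -->
<label for="airport">Airport</label>
<input type="text" id="airport" name="airport" list="airports" />
<datalist id="airports">
  <option value="BOS" label="Boston Logan"></option>
</datalist>
```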

<keygen>
This tag generates a cryptographic key for the form. It is an empty tag, and must be properly terminated. Supporting browsers will render it as a drop-down list, where each entry is a different encryption grade.

This is basically how it works. When the form is submitted, a public/private key pair is created, using the specified encryption algorithm and grade (encryption strength). The private key is stored on the user’s machine, and only the public key is sent to the form processor. The public key is also digitally signed and encoded before it is sent, and may optionally include a challenge string. This information is used by the server to generate a security certificate, which is then sent back to the user agent, so that it knows the response is genuine.

If you’re using this element, then you should also use the HTTPS protocol. The form should definitely be submitted using the POST method; adding the <keygen> element is useless if the form data is visible in the URL.

This tag was created long ago by Netscape, but it was never part of any previous HTML standard. Perhaps because of this, it is supported by most browsers, including Safari. However, it is not supported by Internet Explorer, and it never will be.

Attributes:
Note: you can also use the hidden attribute to hide the drop-down list, but there is no way to specify the grade.

challenge
The optional challenge string. If none is specified, an empty string is used.
keytype
The key type, or encryption algorithm, that is used to generate the key pair. Common values are RSA, DSA, or EC, but they’re not guaranteed to work in all browsers. However, the default is RSA, which is implemented in all supporting browsers. This will be used if the browser doesn’t recognize the value.

If you’re using an algorithm other than RSA, Firefox also requires a keyparams attribute. This attribute is not recognized by any other browser, and its value depends on the algorithm. For details, read the <keygen> documentation at the Mozilla Developer Network.

<meter>
This element represents a meter or gauge. It is an inline element. By default, most browsers will render it as a horizontal bar, without any kind of animation. The contents of the <meter> element can hold any HTML markup, except for another <meter> element. This content will not be displayed by browsers that support the <meter> tag, so it can be used to display fallback content for non-supporting browsers.

As the W3C puts it, a meter displays a scalar measurement within a known range, or a fractional value; for example disk usage, the relevance of a query result, or the fraction of a voting population to have selected a particular candidate. You should not use it to indicate a value that doesn’t have a pre-determined range (like the global population).

Also, you should not use it as a progress bar. The HTML5 standard provides the <progress> tag for that. On the other hand, the <progress> tag is more widely supported than the <meter> tag, so you could use it as a fallback, even though that’s cheating.

This tag is not supported by Internet Explorer. Safari added support for it in version 5, but as of this writing, it is still not supported on iOS.

Attributes:

high
A numeric value that is considered high (but not the maximum value). If value is higher than this number, then the bar will usually be rendered in a different color (like yellow or red).
low
A numeric value that is considered low (but not the minimum value). If value is lower than this number, then the bar will usually be a different color.
max
The maximum value. If value equals this number, the meter is full.
min
The minimum value. If value equals this number, the meter is empty.
optimum
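The optimum (ideal) value for this meter. It tells the browser which part of the range (low, middle, or high) is preferred, and some browsers take this into account when choosing the color of the bar. It can also be useful if you’re targeting the meter with CSS or JavaScript.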
value
The current value of the meter. This attribute is required. If you leave it off, the browser will probably display an empty meter.
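
Here is a minimal sketch of a disk usage meter, assuming a hypothetical 500 GB disk. All of the numbers are invented.

```html
<p>
  Disk usage:
  <meter min="0" max="500" low="100" high="400" optimum="50" value="425">
    425 GB of 500 GB used
  </meter>
</p>
```

The value is above high, so supporting browsers will usually render the bar in a different color. Browsers that don’t support the <meter> tag show the fallback text instead.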
<progress>
A progress bar. It is an inline element. By default, browsers will render this as a horizontal bar, and it may or may not be animated in some way. The exact style of the progress bar usually matches the underlying operating system. The contents of the <progress> element can contain any HTML markup (except another <progress> element). This will be hidden by browsers that support the element, and shown by those that do not, so it can be used for fallback text.

A progress bar can also be indeterminate. This means that the task is being performed, but there is no way to know how much progress is being made. In this case, the progress bar will be rendered as a repeating animation of some kind. At the moment, only Internet Explorer and Chrome support indeterminate progress bars; the others render a bar that is either empty or full.

This should only be used to represent the completion of some task. It should not be used as a meter or gauge; use the <meter> tag for that.

Attributes:

max
The maximum progress level. If the value attribute reaches this level, then the task should be finished.
value
The progress made thus far. If a max attribute is present, this should be between zero and that maximum. If not, then this should be a floating-point number representing the percent finished (e.g. 0.5 is 50% done). If the value attribute is not present, then this is an indeterminate progress bar.
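
The three cases described above might look like this. The numbers and fallback text are made up for illustration.

```html
<!-- Determinate, with a maximum: 30 of 120 files are done -->
<progress max="120" value="30">30 of 120 files copied</progress>

<!-- Determinate, no maximum: the value is a fraction, so this is 50% done -->
<progress value="0.5">50% done</progress>

<!-- Indeterminate: no value attribute at all -->
<progress>Working...</progress>
```

In practice, you would update the value attribute from JavaScript as the task moves along; the markup alone only shows a snapshot.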

Input types

Most widgets exist to provide the user with a convenient way to provide data to an application. So rather than create a slew of new tags, the HTML5 authors turned to the <input> tag. These form controls can be used simply by specifying a new value for the type attribute.

This is a good idea, because it means that the new form controls can be used on browsers that don’t support them. If the value of the type attribute is not recognized, the browser will default to a single-line text input field. This way, users can still enter valid data in the field (though it’s less convenient). It turns out this is necessary, because support for some types is spotty, even in modern browsers.

There are a couple of input types that were considered in various draft standards, but didn’t make it into the HTML5 specification (and should be considered deprecated). They were around for such a short time that no browser supported them. I won’t even list those.

color
A color picker. By default, this is displayed as a button with the current foreground color displayed in a square. Clicking on the button will bring up the color picker. The value that is sent to the form processor will be an RGB value (Red, Green, Blue), in lower-case hexadecimal format, preceded by a pound sign. If you’re using the GET method, the pound sign will be URL-encoded as %23. There will always be a color chosen; there is no way to set the value to an empty string.

This type is not supported by Internet Explorer.
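
A quick sketch of a color field submitted with the GET method. The form processor URL and field name are hypothetical.

```html
<form action="/theme" method="get">
  <label for="accent">Accent color</label>
  <input type="color" id="accent" name="accent" value="#ff8800" />
  <input type="submit" value="Apply" />
</form>
<!-- Submitting without changing the color requests:
     /theme?accent=%23ff8800 -->
```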

date
A date picker. By default, this is displayed as a text input field that accepts values in mm/dd/yy format. Most browsers will also include small up/down arrows at the side of the field, which the user can use to increment or decrement the date. Next to these is a button that lets the user open up a small calendar. Mobile devices usually render date fields using the OS’s date picker widget.

This type is not supported by Internet Explorer, Firefox, or Safari (but it is by iOS).

email
An email address. This is rendered as a text input field like any other. When the user submits the form, the browser will parse it to make sure that it has the form of a valid email address. (Of course, it cannot guarantee that it is a working email address.) Some mobile browsers will also show an “at” character (@) in the keyboard.
number
A number. Supporting browsers will render it like a text input, but also show small up/down arrow keys to increment or decrement the number. Some mobile browsers will also show a number pad instead of a keyboard. The browser will validate the input when the form is submitted.

This must be a decimal number; numbers in other bases (e.g. hexadecimal) will not work. If the values of all the attributes are integers, then the input type will not accept floating-point numbers. However, even a single floating-point number will change that.

Related <input> Attributes:

min
The minimum value for the input. This attribute is optional; the number does not have to be in any particular range.
max
The maximum value for the input. This attribute is optional.
step
The step size. Valid numbers start at the minimum value (or 0 if it is not present), and increment by this amount. Other numbers will not validate. For example, you could use a step size of 2 to make sure the user enters an even number.
value
The default value. It must be a number no lower than min and no higher than max, if they are present. If not, the form will not validate with the default value.
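
Here is a small sketch of how these attributes combine. The field names are invented.

```html
<!-- Integers only: valid values are 2, 4, 6, ... up to 20 -->
<label for="qty">Quantity (even numbers from 2 to 20)</label>
<input type="number" id="qty" name="qty" min="2" max="20" step="2" value="4" />

<!-- A fractional step allows decimal values such as 9.99 -->
<label for="price">Price</label>
<input type="number" id="price" name="price" min="0" step="0.01" />
```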
range
A number in a general range. In the words of the W3C, this form control is “a control for setting the element’s value to a string representing a number, but with the caveat that the exact value is not important, letting UAs provide a simpler interface.” In all supporting browsers, this is rendered as a horizontal slider. If you move the slider’s thumb, Internet Explorer displays the value in a tooltip; other browsers do not display it at all.

Like the number type, if all the attribute values are integers, the range type will only accept integers. Most browsers will “jump” the slider’s thumb to the nearest integer value; Internet Explorer will re-position the thumb when you let go of it.

Related <input> Attributes:

min
The minimum value for the range. This will be the leftmost position of the slider. The default value is 0.
max
The maximum value for the range. This will be the rightmost position of the slider. The default value is 100.
step
The step size. The default is 1.
value
The default value. If specified, the slider thumb will be positioned over this value when the form first loads. It must be a number within the range specified by min and max; if not, the browser will use the nearest valid number. If value is omitted, the default value will be the middle of the valid range.
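
A minimal sketch of two sliders, one with every attribute spelled out and one relying on the defaults. The field names are hypothetical.

```html
<!-- Eleven positions, 0 through 10; the thumb starts at 7 -->
<label for="volume">Volume</label>
<input type="range" id="volume" name="volume" min="0" max="10" step="1" value="7" />

<!-- All defaults: min 0, max 100, step 1, and the thumb starts at 50 -->
<label for="brightness">Brightness</label>
<input type="range" id="brightness" name="brightness" />
```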
search
A search field. Browsers will render this like a text input field. However, when the user types something, some browsers will add a small button to the end of the field, marked with an “x.” Clicking on this button will clear the field. Also, on some mobile devices, the keyboard will display “Search” or a magnifying glass icon, rather than the traditional “Go.”

WebKit-based browsers can also use the results attribute (a Boolean attribute). If this is used, a magnifying glass icon will be displayed to the left of the field. They may also use the autosave attribute to save search entries across page loads. Neither of these attributes is part of any HTML5 standard, and neither is supported by any browser that is not WebKit-based (like Internet Explorer or Firefox).

tel
A telephone number. This is displayed as a normal text input. Mobile browsers may also display a numeric keypad instead of the keyboard. Because of the wide variety of telephone numbers around the world, no browser even attempts to validate this field. Your only options are the pattern attribute (covered later in this article) or JavaScript validation. (Good luck.)
time
A time field. By default, this is displayed as a text input field that accepts values in hh:mm am/pm format. There will also be small up/down arrows at the side of the field, which the user can use to increment or decrement the times. Mobile devices usually render time fields using the OS’s “spinner” widget.

When the user submits the form, the time is converted to 24-hour time in the hh:mm format. If using the GET method, the colon will be URL-encoded as %3A.

This type is not supported by Internet Explorer, Firefox, or Safari (but it is by iOS).
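
A quick sketch of a time field submitted with the GET method. The form processor URL and field name are hypothetical.

```html
<form action="/reminders" method="get">
  <label for="alarm">Remind me at</label>
  <input type="time" id="alarm" name="alarm" />
  <input type="submit" value="Set" />
</form>
<!-- If the user picks 2:30 PM, the browser requests:
     /reminders?alarm=14%3A30 -->
```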

url
A URL. All browsers render this as a garden-variety text input field. Mobile devices may change the keyboard to one that is more URL-friendly (e.g. it has a “.com” button). When submitted, the browser will do some very basic validation to make sure the field holds a valid URL. In most browsers, all that’s needed to be a “valid URL” is the scheme (e.g. “http:”).

HTML5 Form Attributes

Common Form Control Attributes
autofocus
Specifies that the form control should automatically get focus when the page loads. A focused element is the element that the user is currently interacting with, and is often highlighted in some way. So, by specifying the autofocus attribute, you’re telling the browser that this is the form field that the user should start interacting with when the page first loads. This is a Boolean attribute.
form (uncommon)
Associates a form control with its form. The value should be the ID of the form. Form controls are usually child elements of a <form> element. In that case, they are automatically associated with that element, and it’s not necessary to use this attribute. The W3C created this attribute as a way to get around the fact that multiple forms can’t be nested. Few people are interested in nesting forms, so this attribute is rarely used.
placeholder
Placeholder text is a hint to the user about what can be entered into the field. The text disappears when the user enters anything into the field. Unsurprisingly, this attribute is only recognized on text controls: the text, password, email, search, tel, and url input types, plus the <textarea> tag.
required
Specifies that the field is required. If the user does not enter information into the form field, then the browser will not allow the form to be submitted. This is done without using JavaScript, and happens even if JavaScript is turned off in the browser. This is a Boolean attribute.
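
Here is a rough sketch that uses all four attributes. The form processor URL, IDs, and field names are invented for the example.

```html
<form id="signup" action="/signup" method="post">
  <label for="email">Email</label>
  <input type="email" id="email" name="email"
         placeholder="you@example.com"
         required="required" autofocus="autofocus" />
  <input type="submit" value="Sign up" />
</form>

<!-- Outside the <form> element, but still submitted along with it -->
<label for="referrer">Referred by</label>
<input type="text" id="referrer" name="referrer" form="signup" />
```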
<form> Attributes
autocomplete
Specifies whether autocomplete should be used on this form. If autocomplete is on, the browser fills in the form fields using values that the user has previously entered. Users can always overwrite these values, and they can tell the browser not to use autocomplete at all. Individual form controls may also override this behavior.

It is not a Boolean attribute; the values must be either on, off, or default (which is usually the same as on).

novalidate
This attribute specifies that the form should not be validated before being sent to the server. It is a Boolean attribute. It turns off the browser’s built-in validation (such as required fields and input type checks). Many forms are also validated using JavaScript, and it is up to the JavaScript validation code to honor this attribute.
<input> Attributes
autocomplete
Specifies whether autocomplete should be used on this input. This overrides the autocomplete attribute on the <form> element, but the user can still choose not to use autocomplete.

Values must be on or off. If the form itself does not have the autocomplete attribute set, then the default is on.

The WHATWG standard also allows a space-separated list of autofill detail tokens. These are short strings that help tell the browser what values to use when filling in the form. They are not part of the W3C specification, and I don’t know how widely used these tokens are, or if they are recognized by any browser. So, I won’t cover them here; for more information, read the Autofill Detail Tokens section of the WHATWG living standard.

You normally use the value of off with this attribute. This could be used for fields that should never be re-used (like a one-time activation key), or for sensitive information that should never be cached (like the activation codes for a nuclear missile). It is probably not a good idea to use this on password fields, since users generally want passwords to be remembered. You could also use JavaScript to turn autocomplete on or off, e.g. by using a “remember me” checkbox.

If you want to supply your own list of options for autocomplete to use, then specify those options in a <datalist> element, and set the list attribute to the ID of that <datalist> tag.

list
The ID of a <datalist> element. That element should hold options to use when automatically completing the form input.
multiple
This attribute allows users to enter multiple values into the field. It is a Boolean attribute. It can only be used on input types that accept distinct values: email and file. For the email type, the addresses must be comma-separated.
pattern
A validation pattern. This is a regular expression (regex) that is matched against the field’s value when the form is submitted. The entire value must match the pattern, not just part of it. This can be very useful if the browser’s default validation is not very good (e.g. the tel type), or if the browser supports this attribute but not an HTML5 input type. The regex should follow the JavaScript syntax, which is basically the Perl syntax. (JavaScript programmers: you do not include the slashes used to create a regex literal.) If you are lazy, HTML5pattern.com/ is a good repository of validation patterns.
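
As a sketch, here is a telephone field that only accepts one particular North American format. The pattern is my own simplification, not a recommended way to validate real phone numbers. The title attribute is an ordinary HTML attribute that many browsers show as a hint when the value doesn’t match.

```html
<label for="phone">Phone (e.g. 617-555-0123)</label>
<input type="tel" id="phone" name="phone"
       pattern="[0-9]{3}-[0-9]{3}-[0-9]{4}"
       title="Three digits, a dash, three digits, a dash, four digits" />
```

Note that the pattern has no surrounding slashes.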
Form Submission Attributes
In certain cases, you will want each submit button to send the form information to a different resource on the server (i.e. different form processors). This was not possible in HTML 4.01; the form may have different submit buttons, but they would all go to the same resource. These HTML5 attributes change all of that.

Because they deal with form submission, these attributes can only be used on the image and submit input types, and on the <button> tag when its type is submit.

formaction
This attribute specifies the URL of an alternate form processor. It overrides the action attribute of the <form> tag, and can be used exactly like that attribute.
formenctype (uncommon)
The encoding type for the form. It overrides the enctype attribute of the <form> tag, and can accept the same values. There are very few cases where this attribute is needed.
formmethod
The HTTP method used to submit the form. It overrides the method attribute of the <form> tag, and must be either get or post.
formnovalidate
This attribute specifies that the form should not be validated before submission. It overrides the novalidate attribute of the <form> tag, and is also a Boolean attribute.
formtarget (uncommon)
The target for the form’s HTML reply. It overrides the target attribute of the <form> tag. Like that attribute, there is rarely any reason to specify the target.
height (discouraged)
The height of the image used in an image submit field. Unsurprisingly, it will not be recognized unless the input type is image. If omitted, it will use the height of the image file itself. It is a bad idea to have the browser resize the image, for the reasons I covered when I talked about the <img> tag.
width (discouraged)
The width of the image. Again, it can only be used if the input type is image, and using it is not a good idea.
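
To show how these attributes work together, here is a sketch of one form with three submit buttons. All of the URLs and field names are hypothetical.

```html
<form action="/messages/send" method="post">
  <label for="subject">Subject</label>
  <input type="text" id="subject" name="subject" required="required" />

  <!-- Uses the form's own action and method -->
  <input type="submit" value="Send" />

  <!-- Different processor; skips validation so an unfinished draft can be saved -->
  <input type="submit" value="Save draft"
         formaction="/messages/draft" formnovalidate="formnovalidate" />

  <!-- Different processor and method; the reply opens in a new window or tab -->
  <button type="submit" formaction="/messages/preview"
          formmethod="get" formtarget="_blank">Preview</button>
</form>
```

Only the button that the user actually clicks has any effect; the overrides on the other buttons are ignored for that submission.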
<textarea> Attributes
maxlength (HTML5 only)
The maximum length of the input field. This is the maximum number of characters that the field is allowed to have; it is typically more than the number of characters that the visible area can hold. It serves the same purpose as the maxlength attribute on a single-line text input field, but was not allowed on this tag prior to HTML5.
wrap
Specifies how the text should be wrapped when the form is submitted. It is an enumerated attribute, and can accept these values:

hard
Hard line wrapping means that newlines are inserted into the submitted text wherever the column width is reached. (These are newlines in the text, not HTML <br /> or <p> tags.) If this value is specified, then the cols attribute is required.
soft
The text is not wrapped; newlines are not inserted, and the text is submitted as-is. This is the default.
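
A minimal sketch of a text area that uses both attributes. The field name and sizes are made up.

```html
<label for="comment">Comment (200 characters at most)</label>
<textarea id="comment" name="comment" rows="4" cols="40"
          maxlength="200" wrap="hard"></textarea>
```

Because wrap is set to hard, the cols attribute is included, and the submitted text should contain a newline wherever a line reached the 40-column width.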

About Karl

I live in the Boston area, and am currently studying for a BS in Computer Science at UMass Boston. I graduated with honors with an AS in Computer Science (Transfer Option) from BHCC, acquiring a certificate in OOP along the way. I also perform experimental electronic music as Karlheinz.

4 Responses to A Guide to HTML

  1. Ben says:

    You probably know this, but the self-closing-ness of your tags are likely to be totally ignored by the browser unless you set the mime type to xhtml! For example, in most browsers this
    This is a paragraph
    will render exactly the same as this:
    This is a paragraph

    A common gotcha is thinking that you can use a self-closing script tag, e.g.

    instead of

    More on this on stackoverflow:
    http://stackoverflow.com/questions/69913/why-dont-self-closing-script-tags-work

    So there can be arguments for doing this for style or tool support, but the browser really doesn’t care.

    • Ben says:

      wow, looks like wordpress doesn’t escape tags. Sorry, but hopefully the stackoverflow article explains well enough

      • Karl says:

        Also, about the tags – nope, WordPress does not escape HTML; you can use it to mark up comments (as I did just now), so it really can’t do that. For HTML tags, you have to use the &lt; and &gt; escape sequences. I also had to do this when I wrote the article, so I know how much of a PITA it is.

    • Karl says:

      Ben: First of all, thanks for taking the time to read the article. I need all the help I can get…

      The “self-closingness” issue applies only to tags that do not represent empty elements, and that includes the <script> tag. It’s supposed to contain text data (the actual JavaScript code). Trying to make these tags self-closing is not valid XHTML, and browsers will consider it “tag soup.”

      If the tag is actually self-closing – like the <br /> or <input /> tags – then the XHTML standard demands that they be properly terminated, or they won’t validate. The HTML standard (even HTML5) does not, but since XHTML is the one that has been used for years, I think it’s better to include the terminating slash.

      However, the Stack Overflow post did show something else that I wasn’t aware of: the <p> tag cannot contain other block-level tags, like <div>. (Inline tags are fine.) If you try to do this, the browser will consider it “tag soup” and automatically treat it as a tag with empty content. In other words, <p><div>Hello, world!</div></p> will turn into <p></p><div>Hello, world!</div><p></p>.

      I’ll update the article with this info. So, thank you for pointing this out. Please let me know if you find anything else in the article that needs work.
