What Exactly is HTML?

1. What is HTML?

  • First of all, HTML stands for "HyperText Markup Language."
  • But what does that really mean? For one thing, it means that HTML is not a computer language in the same sense that C, Perl, Fortran or Java is. HTML is a set of "Tags" that are placed around text or images in a page to make the text or image do specific things. It's more like a publishing language, or a set of rules for page layout.
  • Another thing to note about HTML is that even though it has developed into quite a rich and full-featured markup language (at least compared to what it was in 1995 when I started learning it), you still don't need to know very many "Tags" at all to create useful and pretty websites.
  • Finally, and this is extremely important, an HTML file must be a plain text file only. This means that you can write an HTML file in virtually any text editor, or even in a word processor such as Microsoft Word. But it also means that if you save it as a Word document instead of in plain text format, it's not going to work. Being plain text is what allows HTML to be portable and readable by any modern web browser on any computer. (Note the use of the word "modern": very old browsers won't be able to understand a lot of the newer webpages.)

2. Okay, so what is a "Tag"?

  • A Tag simply refers to the actual HTML code that you write. You "Markup" the text with "Tags" to make it appear how you want.
  • In operational terms, a Tag is anything that appears between < and >. For example, to mark a piece of text as a paragraph, you would enclose it within the paragraph tag:
    • <p>Text goes here!</p>
  • As you can see, the paragraph tag actually consists of two tags, one to start the markup section, and one to end it. As a general rule, most tags require both a starting and an ending tag to function properly (there are of course exceptions, and we will cover some of those). The ending tag is typically the same as the starting tag, but with the tag name preceded by a forward slash (/).
  • Another thing to know about tags is that many of them have additional, optional attributes that allow you to specify things about style or formatting. For example, if you wanted your paragraph of text centered on the screen, you would use the "align" attribute for your paragraph tag. Note that these attributes are only present in the starting tag; they are not mirrored in the ending tag of the markup.
    • <p align=center>Text goes here!</p>

3. How is the actual HTML code structured on a page?

  • The easiest way to see this (and actually one of the best ways to learn HTML) is to use the "View Page Source" function that is built into most modern browsers. This shows you the raw HTML code for the page.
  • First of all, the entire HTML page is enclosed in <html></html>. Browsers will usually display a page without it, but including it is proper.
  • Within that, there are two main sections. <head></head> and <body></body>
  • <head> is usually reserved for indexing information about the page, or things that are not meant to be seen by the casual visitor. It also includes the title of your page, which goes at the top of the browser window, and things like JavaScript code.
  • <body> contains the main portion of the HTML, and everything you want the surfer to see.
    • Use <body bgcolor="#FFFFFF"> for a white background. #FFFFFF is the hex code for white, and can be changed to yield any background colour that you want.
    • Use <body background=coolpicture.jpg> to specify a picture as a background for your page. coolpicture.jpg should be the path name of the actual image file on the server, relative to that HTML file (though be careful, text is hard to read over a lot of background pictures).
  • One more thing: all HTML files should end with the extension ".html". PC users might have to use ".htm", and that's okay too.
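
As a minimal sketch of the structure described above, a complete page might look like the following. The title and paragraph text are placeholders chosen for illustration, and the file could be saved under any name ending in ".html".

```html
<html>
<head>
  <!-- Shown as the browser window title, not in the page itself -->
  <title>My First Page</title>
</head>
<body bgcolor="#FFFFFF">
  <!-- Everything the visitor should see goes inside body -->
  <p>Text goes here!</p>
</body>
</html>
```

Opening the saved file in a browser and then choosing "View Page Source" should show exactly this text again.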

4. How about some more tag examples?

  • Paragraph
    • <p>Text goes here!</p>
    • Attributes: align=center|left|right
  • Heading
    • There are six levels of heading, with 1 being the most important and 6 being the least. Use these as section headings; the text will be larger and bolder than a standard paragraph.
    • <h1>Text</h1>, <h2>Text</h2> ... <h6>Text</h6>
    • Attributes: align=center|left|right
  • Lists
    • There are two kinds of lists:
      1. Ordered Lists <ol></ol>
      2. Unordered Lists <ul></ul>
    • The difference is that an ordered list uses numbers, while an unordered list uses bullets.
    • The list is the first tag here that requires another tag nested within it: the list item, <li>.
    • Use <li> to start off each item in the list. This is a tag that typically does not require a closing tag, as closing is implied by either a new <li> element or a list closing tag, </ol> or </ul>.
    • It is also possible to nest one list within another. If you nest an unordered list within an ordered one, you would have a numbered heading, followed by a series of bullets. Nested lists have increased indentation, and often a different bullet style (browser dependent).
  • Links
    • The link is one of the more important tags to know, but also one of the stranger ones. Unlike the paragraph tag, the link tag REQUIRES an attribute to work properly. The attribute, in this case, is the web address that you want to link to.
    • <a></a>
    • Attribute: href (stands for Hypertext REFerence)
    • So a typical link looks like this: <a href="http://www.purdue.edu/">Purdue University</a>
    • There are two kinds of links: absolute and relative. Absolute links are like the one above, and include the full URL (http etc). Relative links specify only the path of the new page relative to the current page. Thus, if you wanted to link to a page in the same directory, you would use <a href=otherpage.html>Nice Link!</a>. This makes for a lot less typing for you, and also makes it easy to transfer the entire site to another server without changing any of the code.
    • Finally, you can also use a link to send an email to someone via your browser. Use this format: <a href="mailto:joe@domain.com">Email Joe</a>, where "joe@domain.com" is the email address of the intended recipient.
  • Images
    • What webpage would be complete without some pretty pictures? (besides this one of course:-)
    • The image tag is strange because it both requires an attribute to function (you specify the image file), and it has no ending tag. It has no ending tag because it is not actually used to markup text, so it does not surround anything.
    • <img>
    • Attributes: src=prettypic.jpg specifies the location of the image file on the server. Note that only GIF, JPG and PNG files can be viewed by most browsers. Things like TIFF will sometimes work, but the files are typically so large that they should be avoided anyway.
    • alt="This is a pretty picture" specifies text that will be displayed instead of the image, in case the image fails to load, or in case someone chooses not to load images (often the case when surfing on a really slow connection). This information is also used when visually impaired people are surfing the web, so it's nice to include. Notice that it needs to be enclosed in quotes, since the alternate text is usually a string with spaces.
    • border=0 removes the border. By default, if you place a link around an image, it will have a blue border. Use this to eliminate that if you want.
    • A full image tag might look like this: <img src=picture.jpg alt="Nice Picture!">
  • Horizontal Rule
    • <hr> Another tag without any closing portion; it draws a nice horizontal line across the page.
  • Other
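
The tags from this section can be combined in one page body. The following is only a sketch: otherpage.html and picture.jpg are hypothetical files assumed to sit in the same directory as the page, and the closing </li> tags are written out even though they are optional.

```html
<h1>My Page</h1>
<p align="center">Text goes here!</p>

<!-- An ordered list with an unordered list nested in its first item -->
<ol>
  <li>First numbered item
    <ul>
      <li>A bullet under item one</li>
      <li>Another bullet</li>
    </ul>
  </li>
  <li>Second numbered item</li>
</ol>

<!-- An absolute link, a relative link, and an email link -->
<p>
  <a href="http://www.purdue.edu/">Purdue University</a>,
  <a href="otherpage.html">Nice Link!</a>,
  <a href="mailto:joe@domain.com">Email Joe</a>
</p>

<hr>

<!-- An image used as a link; border="0" removes the link border -->
<a href="otherpage.html"><img src="picture.jpg" alt="Nice Picture!" border="0"></a>
```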

5. General HTML Rules

  • Typically all of your text should be embedded in some kind of tag, either a paragraph, heading or list item.
  • Images can also be embedded in a paragraph, heading or list (though they tend to make lists look funny).
  • Links can be embedded as part of a paragraph or heading, but not vice versa. There is a bit of a hierarchy to tags, which you will get a feeling for over time.
  • Paragraphs cannot be inside headings, and vice versa. For this kind of thing, common sense usually prevails. Paragraphs and headings are different categories of text, so a given piece of text can only be one or the other, but not both.
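
A short sketch of these nesting rules, using the hypothetical files otherpage.html and picture.jpg for illustration:

```html
<!-- Allowed: a link and an image inside a paragraph -->
<p>Visit <a href="http://www.purdue.edu/">Purdue University</a>
<img src="picture.jpg" alt="Nice Picture!"></p>

<!-- Allowed: a link inside a heading -->
<h2><a href="otherpage.html">Nice Link!</a></h2>

<!-- Not allowed: a paragraph inside a heading
<h2><p>Text</p></h2>
-->

<!-- Use a heading followed by a paragraph instead -->
<h2>Section title</h2>
<p>Text goes here!</p>
```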

6. Some Extra Style Tags

  • <b></b> Make your text Bold
  • <i></i> Make your text Italic
  • <strong></strong> Give your text Strong Emphasis (usually the same as bold, but browser dependent)
  • <tt></tt> Display text in a monospaced font (like I do a lot here), for things like code display
  • <code></code> Similar to the monospaced tool (like bold vs. strong) but specifically for source code
  • <br> Have you noticed yet that a carriage return is ignored by the web browser? Use this to force a line break to be displayed
  • &nbsp; Have you also noticed that when you put two or more spaces in a row, a web browser will pretend it's only one? Use a non-breaking space to tell the browser that you want it to display each and every space (useful for indenting things).
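
Here is a small sketch of a few of these style tags working together inside one paragraph. The wording is made up for illustration; the line breaks and indentation in the source are ignored by the browser, so only <br> and &nbsp; control the layout.

```html
<p>
  This is <b>bold</b> and this has <strong>strong emphasis</strong>.<br>
  This line starts after a forced line break.<br>
  &nbsp;&nbsp;&nbsp;&nbsp;This line is indented with four non-breaking spaces.<br>
  Source code looks like this: <code>x = 1</code>
</p>
```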

7. Some tags for the <head></head> Section

  • <title></title> Put the title for your page here; this is what will be displayed as the browser window title.

8. Miscellaneous Tips

  • Remember: only GIFs, JPGs and PNGs! BMP, TIFF, and other formats do not work with the web. It's also a good idea to think about how big your pictures are, both in terms of resolution and file size. Remember that other people might have a smaller screen, and might have a slower connection than you do.
  • Unix is case sensitive, and most of the internet is powered by Unix machines. That means that picture.jpg and picture.JPG are NOT the same thing. This is a big problem when you name something with capital letters, but then link to it with small letters.
  • Make your file names simple: use only alphanumeric characters (letters and numbers). Don't include spaces, quotes, brackets, question marks, or the little prime indicators. These all cause problems on a Unix system for reasons I won't get into here. Keep it simple.