Can anyone provide me with a detailed explanation on this so I can fully understand it?
HTML is based on SGML (Standard Generalized Markup Language), which is a very forgiving markup standard. You can do things like this in HTML:
<ul>
<li>
<li>
<li>
</ul>
or
<B><i><u>Some text</b></i></U>
or
<a href="somepage.htm"><img src=someimg.jpg></a>
or
<span><div>Some Text</div></span>
In short, it is very loose.
One day, the maintainer of the standard (the W3C) said: "We have had enough of this mess. People are getting more and more creative in writing their markup, so let’s tighten up the standard."
So the XHTML standard was born. XHTML is based on XML, which is a stricter subset of SGML. In XHTML, tags must meet certain requirements. I’ll highlight some of the most important:
- All tags must be properly closed, and empty tags like img must be self-closed, e.g.:
<ul>
<li></li>
<li></li>
</ul>
<!-- Take care here: XML only requires <img/>, but some older browsers have problems with that, so put a space before the ‘/’ in a self-closing tag -->
<img />
- All attribute values must be enclosed in quotes:
<a href="link.htm">Hello this is valid XHTML</a>
<a href=link.htm>Go away this is invalid XHTML</a>
- All tags must be properly nested: tags must be closed in the reverse of the order they’re opened, e.g.:
<b><i><u>Incorrect XHTML</b></i></u>
<b><i><u>Correct XHTML</u></i></b>
Proper nesting also means that block-level tags (p, div, blockquote, etc.) can’t be nested inside inline tags (span, a, i, u, b, etc.). In short, a block can contain blocks and inlines, while an inline can contain only other inlines. It also means that certain tags can only be placed inside certain other tags; for example, <li> must be inside <ul> or <ol>.
- Tag names are case-sensitive, and all XHTML tags are in lower case (<a> is correct, <A> is incorrect)
- Attribute minimization is a no-no:
<tag attr></tag> is WRONG
write it as:
<tag attr="attr"></tag>
- And many others. I don’t remember all of them (I write XHTML by heart and, as far as I know, I’ve never written invalid XHTML)
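If you want to see several of these rules enforced, here is a minimal sketch in Python that feeds snippets to the standard library XML parser. The names is_well_formed and samples are made up for this example. Keep in mind that it only checks XML well-formedness, not full XHTML validity, so rules like "no block inside inline" or "all tags in lower case" are not caught by it (a consistent <B>...</B> pair is well-formed XML even though it is not valid XHTML).

```python
import xml.etree.ElementTree as ET


def is_well_formed(snippet: str) -> bool:
    """Return True if the snippet parses as XML, False otherwise."""
    try:
        ET.fromstring(snippet)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample snippets, one pair per rule from the list above.
samples = {
    "unclosed li": "<ul><li>one<li>two</ul>",
    "closed li": "<ul><li>one</li><li>two</li></ul>",
    "unquoted attribute": "<a href=link.htm>text</a>",
    "quoted attribute": '<a href="link.htm">text</a>',
    "bad nesting": "<b><i>text</b></i>",
    "good nesting": "<b><i>text</i></b>",
    "mismatched case": "<B>text</b>",
    "minimized attribute": "<input disabled />",
    "expanded attribute": '<input disabled="disabled" />',
}

for name, markup in samples.items():
    print(f"{name}: {is_well_formed(markup)}")
```

Every "bad" snippet in the list is rejected by the parser, and every corrected version is accepted.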
Why the need for a change? Mainly because of CSS. CSS needs a tree-like structure to work on; otherwise it is undefined what a browser will do with badly nested markup: it may break, it may give a wrong or odd result, it may do anything. Some people call non-XHTML pages "tag soup" to illustrate the mess.
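To illustrate the tree idea, here is a small sketch, again using Python's standard XML parser. The fragment and the outline helper are invented for this example. Properly nested markup maps to exactly one tree, so a rule such as "every li inside a ul" has an unambiguous meaning.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed fragment used only for illustration.
fragment = "<div><ul><li>one</li><li><b>two</b></li></ul></div>"
root = ET.fromstring(fragment)


def outline(element, depth=0):
    """Return the element tree as a list of indented tag names."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


tree_lines = outline(root)
print("\n".join(tree_lines))
```

With badly nested markup like <b><i>text</b></i> there is no single tree to print, which is exactly the ambiguity described above.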
It’s also because it is (much) easier to write an XML parser than to guess what the author wanted to do. In XML, if you see something that isn’t right, you report an error and point the author to it so it can be corrected. In SGML, you have to chew through those irksome tags and produce a result (whatever it is) because the standard says so.
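The difference between the two attitudes can be sketched with Python's standard library. Here html.parser stands in for a tolerant HTML consumer (it is only a tokenizer, not a browser's real parser), and xml.etree.ElementTree stands in for a strict XML parser. The TagLogger class and the soup string are made up for this example.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TagLogger(HTMLParser):
    """Record start and end tags in the order they appear."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(f"<{tag}>")

    def handle_endtag(self, tag):
        self.events.append(f"</{tag}>")


soup = "<b><i>Some text</b></i>"  # badly nested on purpose

# The tolerant parser reports the tags as found and never complains.
lenient = TagLogger()
lenient.feed(soup)
lenient.close()
print(lenient.events)

# The strict parser stops at the first problem and says where it is.
try:
    ET.fromstring(soup)
    xml_error = None
except ET.ParseError as exc:
    xml_error = str(exc)
print(xml_error)
```

The lenient side leaves it to the caller to decide what the mess means, while the strict side refuses the input and reports a line and column for the author to fix.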