Semantics in web development

Heads Up!

This article is several years old now, and much has happened since then, so please keep that in mind while reading it.

Web Developers are in the business of communication, conveying meaning from one entity to another. The method we use is language, human language for people and programming language for computers.

Semantics is the study of 'meaning' conveyed using these languages.

Human beings strive to find meaning in everything, so the field of Semantics is huge!

Therefore, it comes as no surprise that the term “Semantics” can mean many different things. In this article, I will identify and explain some of the most common examples in our industry.

The key elements of semantics

We often hear phrases such as ‘semantically incorrect’ or a ‘question of semantics’. What do we ‘mean’ by these phrases in the context of web development?

Let’s look at three key elements that define semantics in web development:

  • Intent
  • Content
  • Context

Intent

When we create content that we wish to communicate, there are two intentions: our intended audience (digital or human) and our intended meaning. Sometimes our intentions fail.

We create content to convey our intended meaning to our intended recipient, but sometimes the road to failure is paved with good intentions.

Intended audience

Intended audiences are a form of context.

We have all accidentally sent an email to the wrong person, commonly when two contacts share the same first name and we do not notice that an unintended recipient has been added by autocomplete.

Right content, wrong context.

Intended meaning

Often we write something that we believe is very clear, yet the intended recipient sees or hears something completely different. In code we get an error message; in real life we may get undesirable, unintended consequences.

Content

Communication requires content whether it is visual, the written word or computer code. In web development, we often combine consumable content intended for the human recipient with instructional content in the form of code for the digital recipient.

Labelling content is also very important e.g. gardeners will often stick labels in plant pots to remind them what they contain.

Always label seed packets immediately. Store in a cool, dry place until ready for sowing.
- Monty Don

Let’s take an example of a simple web page about Lavender in a garden centre website with a semantically named domain name:

Before we even begin to discuss Semantic Code, Semantic Markup or the Semantic Web, here is our first example of code that is meaningful both to humans, who can read it, and to computers, which can use DNS to direct users to the correct web application.

In the same way we can use file naming to describe content, our lavender page may be poorly named:

  • page-07.html

If hyperlinks resolve to this page then technically it works for computers, but the name carries no meaning for humans other than suggesting it may be page number 7 in a series of web pages.

It is debated in the SEO community whether popular search engine algorithms factor in file naming, but if they do, this would also be an example of using semantics to convey meaning to computers for the purpose of search.

To give this page meaning to humans, a better name would be:

  • lavender.html

The same applies to an image file. If you download some images from a website and find them a few weeks later, without previewing the images, would you really know what is in this image:

  • g5ds5sdf4sdf.jpg

Of course it would be better if the images were named:

  • lavender.jpg
  • rose.jpg
  • honeysuckle.jpg

So you get the idea how important naming content is. In a way, meaningful file naming is in fact a form of metadata (or data about data) i.e. the filename is telling us what the content is within that file, whether it's an image or a web page.

So we now have a web page clearly about “Lavender” residing at the following web address:

This makes perfect sense in the 'context' of a garden centre website, which brings us neatly onto context.

Context

Sometimes naming alone is not enough. If your intended audience does not recognise the name, then context may help.

Content is king, context is queen and together they rule.

Imagine we have a list of plants using their Latin names and they are grouped by whether they like sun or shade.

We can create meaningful URLs with just HTML alone and clever use of folder structure and file naming.

Simply by nesting HTML files with a default filename such as index.html within folders, we can create URLs such as:

Now when we place the relevant files within the correct folders we generate meaningful URLs that describe structure such as:

Of course, most modern backend languages can do this with routing now. In fact we could make lavender.html appear in:

  • sunny/lavender
  • purple/lavender
  • scented/lavender

And use the meaningful <link rel="canonical" href="where-lavender-lives"> to describe the default address for Lavender.

Regardless of the method used, this is all part of semantics in modern web development: we have given 'content' some 'context'.
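
As a minimal sketch of the canonical link described above, assuming a hypothetical domain (example.com) and a hypothetical preferred address of /sunny/lavender, the same fragment would be served at every address where the lavender page appears:

```html
<!-- Served at /sunny/lavender, /purple/lavender and /scented/lavender alike. -->
<!-- example.com and the choice of /sunny/lavender are assumptions for illustration. -->
<head>
    <title>Lavender</title>
    <link rel="canonical" href="https://example.com/sunny/lavender">
</head>
```

Search engines generally treat this as a strong hint about which address to prefer, so all three routes can exist for humans without the content appearing to be duplicated.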

Semantic Code

In our spoken language we encode meaning. The listener needs to know that language in order to decipher (or decode) the 'meaning'. To carry that meaning between two languages we need a translator.

We're flooding people with information. We need to feed it through a processor. A human must turn information into intelligence or knowledge. We've tended to forget that no computer will ever ask a new question.
- Grace Hopper

When we give instructions to a computer in code, it must also be delivered in a programming language the computer understands. Yet we can write the instructions in a completely different programming language and translate it using a transpiler (or transcompiler).

In the same way humans can describe the same thing in multiple languages, we can do the same thing with computers in multiple programming languages. In fact, terms within programming languages can often be used in different ways depending on the 'context' in which they are used.

Human Readable Code

Computers evaluate code using syntax and context. Code may be syntactically correct and yet have meaning only to computers.

In web development the content that we create must often have 'meaning' to both humans and machines i.e. Human Readable Code.

In modern programming languages, syntax is often easy to learn and meaningful to humans. In fact, writing good code often includes making it possible to pass on to another developer.

A good example of making code readable to human beings is the BEM system developed by Yandex.

Note: Semantic class names do not have any impact on accessibility i.e. BEM syntax is semantic to developers, but not to assistive technology.

Developers at Yandex found that their CSS was getting too complicated to share, and so proposed a naming convention to make things easier.

This system became known as BEM or Block, Element, Modifier using the following syntax:

  • block-name_modifier-name
  • block-name__element-name_modifier-name

Here is a basic example from Yandex showing BEM in use:

<!-- `search-form` block -->
<form class="search-form">
    <!-- `input` element in the `search-form` block -->
    <input class="search-form__input">

    <!-- `button` element in the `search-form` block -->
    <button class="search-form__button">Search</button>
</form>
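
The example above shows a block and its elements but no modifiers. As a sketch following the syntax listed earlier, with modifier names (focused, disabled) that are made up for illustration rather than taken from Yandex:

```html
<!-- Hypothetical modifiers: `focused` on the block, `disabled` on an element -->
<form class="search-form search-form_focused">
    <input class="search-form__input">
    <button class="search-form__button search-form__button_disabled" disabled>Search</button>
</form>
```

The base class stays in place and the modifier class is added alongside it, so a developer reading the markup can see both what the thing is and which variation of it is in use.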

BEM methodology has since found its way beyond CSS class names into JavaScript and even file naming and project structure.

You can find out more about BEM here:

https://en.bem.info/methodology/

We can of course just use good commenting, but better still we can write self-documenting code as well.

Semantic Markup

Semantic markup is a form of semantic code and the most common use in web development is HTML.

We've all seen examples of things that look like buttons which actually are 5 or 6 nested <div> elements with a non breaking space in the middle. Then to make that work properly in non visual browsers you have to put a tab index on it to make sure that it can be tabbed into, because only links and form elements by default can be tabbed into, in browsers. Then you have to stick a bit of JavaScript on the top to handle clicks. Then you probably have to use some CSS and on hover or on focus change the cursor pointer to look like a little hand pointing. Then you need an ARIA role to say that this nested <div> pile is actually, fulfills the role of a button. Or you could just use the button element, which is what it's there for, and all this stuff is given to you. It seems to me a no-brainer.
- Bruce Lawson (HTML Semantics with Bruce Lawson | The Web Ahead)
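
The contrast in the quote can be sketched in a few lines. The class name and the save() handler here are hypothetical, and the first version is deliberately still incomplete:

```html
<!-- A generic element made to behave like a button (save() is a hypothetical handler) -->
<div class="button" role="button" tabindex="0" onclick="save()">Save</div>
<!-- Keyboard support for Enter and Space would still need extra JavaScript -->

<!-- The native element: focusable, keyboard operable and announced as a button by default -->
<button type="button" onclick="save()">Save</button>
```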

The earliest versions of HTML included instructions to the user agent, usually a web browser, about the structure of the document as well as formatting instructions such as colours and fonts.

As the demands on web applications became more complex, it soon became clear a separation of concerns was required and the formatting aspect of HTML became Cascading Style Sheets (CSS).
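
A rough before-and-after sketch of that separation, using the old <font> element purely as an illustration of presentational markup:

```html
<!-- Before: formatting instructions mixed into the markup -->
<h1><font color="purple" face="Arial">Lavender</font></h1>

<!-- After: markup carries the meaning, CSS carries the appearance -->
<style>
    h1 { color: purple; font-family: Arial, sans-serif; }
</style>
<h1>Lavender</h1>
```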

So what do we mean by Semantic Markup in HTML?

HTML – Non-Semantic

Let's address the elephant in the room right away: some HTML is not semantic, and unfortunately it’s the type that gets used most often, and often incorrectly. Yes, I'm talking about you, <div> and <span>.

Everything's a <div>,
Everything's a <div>,
Hey, did you know this <button> is a <div>?
- cf. 'Everything's A Drum'

Often, entire pages of HTML have every single element marked up as a <div>; just because you can, it does not mean you should.

The <div> and <span> elements were not in the original HTML draft but appeared in the specifications around 1997.

Ironically, in the context of this article, whilst the <span> element has no semantics attached to it, the original intention was to address the issue of adding language attributes in the middle of paragraphs, which is why it is an inline element:

  • <p>I do not speak <span lang="ja">日本語</span> very well</p>

The <div> element (short for division) on the other hand is a block element and was introduced to provide a generic container that had no meaning or layout style attached (other than they stack one above the other as any other block element).

The <div> element should be used only when no other semantic element (such as <article> or <nav>) is appropriate.
MDN

The <div> element is also undoubtedly one of the most widely misused elements. As the quote above suggests, <div> should only be used where there is no meaning attached to the container, or there is not an existing semantic HTML element available.

The Codepen below is a Tailwind component example I found that was pretty much all <div> and <span> elements. I have edited it slightly here, but not much, to demonstrate how not to mark up HTML.

It may be best to open it in Codepen to see both the HTML and the preview.

See the Pen by deanleigh (@deanleigh) on CodePen.

Whilst the CSS works perfectly well, the HTML is completely meaningless.
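
For readers who cannot open the Pen, the problem can be sketched roughly like this. This is a made-up plant card, not the actual Tailwind component, and the class names are hypothetical:

```html
<!-- Hypothetical "plant card" built only from generic elements -->
<div class="card">
    <div class="card-title">Lavender</div>
    <div class="card-text">
        <span>A fragrant plant in the mint family.</span>
    </div>
    <div class="card-link" onclick="location.href='lavender.html'">Read more</div>
</div>
```

Nothing in this markup tells a browser, a search engine or a screen reader which part is a heading, which is a paragraph and which is a link. Only the class names hint at it, and those are meaningful to developers alone.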

Now let's take a look at how to mark up HTML semantically.

HTML - Structural

Structural HTML is very much semantic. It not only gives meaning to content, it also gives it context through the use of hierarchical structure.

In the most basic HTML, we can see that when we use <h1>, <h2> or <p> the browser formats the text in a particular way e.g. headings start larger than paragraph text and gradually get smaller.

But as mentioned, HTML does not really concern itself with visual formatting; what you see is the browser's interpretation of the markup.

What is really happening is we are giving meaning to the content through the use of structure.

Let's take a look at a web page and use one of our plants, lavender, as an example:

<h1>Lavender</h1>
<h2>Introduction</h2>
<p>Lavender (Lavandula) is a wonderful plant for both appearance and fragrance.</p>
<p>It belongs to a genus of 47 known species of flowering plants in the mint family.</p>
<h2>Types of Lavender</h2>
<p>There are three major types of Lavender and one hybrid (Lavandins)</p>
<h3>English Lavender</h3>
<p>These bloom from spring to early summer.</p>
<h3>French Lavender</h3>
<p>Have grey, serrated leaves.</p>
<h3>Spanish Lavender</h3>
<p>Known for their unusual two-toned, pineapple-shaped blooms.</p>
<h2>Species and Sub-Species of Lavender</h2>
<p>Lavender has many species and subspecies, for example the subgenus Lavandula has three species:</p>
<dl>
    <dt>Lavandula angustifolia Mill.</dt>
    <dd>subsp. angustifolia from Catalonia and the Pyrenees.</dd>
    <dd>subsp. pyrenaica from southeast France and adjacent areas of Italy.</dd>
    <dt>Lavandula latifolia Medik.</dt>
    <dd>native to central Portugal, central and eastern Spain, southern France, northern Italy.</dd>
    <dt>Lavandula lanata Boiss.</dt>
    <dd>native to southern Spain.</dd>
</dl>
<h2>Uses of Lavender</h2>
<p>Lavender has many uses including:</p>
<ul>
    <li>Essential Oil</li>
    <li>Dried for fragrant pillows</li>
    <li>Culinary use</li>
</ul>

Here, with the use of some very basic HTML we have given the page structure. We can clearly see that this is a page about lavender as we have used <h1> for the title of the page.

For "Introduction", "Types of Lavender" and "Species and subspecies" we have used <h2> to denote blocks of related content.

This nested heading formula is really at the core of HTML and not only gives structure and meaning to web browsers but to any other software that can read HTML. This includes software such as Microsoft Word, Google Docs or Adobe PDF, which can all recognise the way the content is structured within this page.

Within the "Introduction" we have introduced paragraphs using the <p> element.

In the "Types of Lavender" section we have created subsections using the <h3> element.

In the Species and Sub-Species of Lavender section we have used a Description List <dl>, which contains Description Terms <dt> and Description Details <dd>. A single term can have more than one description.

And lastly, for the uses of lavender we use a simple Unordered (bulleted) List <ul>, which could of course be an Ordered (numbered) List <ol>.
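
As a small sketch of when the ordered variant is the better fit, here is a list where the sequence itself carries meaning. The heading and steps are illustrative only:

```html
<!-- Illustrative steps only; the order carries meaning, so <ol> is used -->
<h2>Drying Lavender</h2>
<ol>
    <li>Cut the stems</li>
    <li>Tie them in small bunches</li>
    <li>Hang them upside down to dry</li>
</ol>
```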

Essentially we have all of the typographic formatting elements that you would expect from any word processor. However, all of these have meaning not just to users reading visually but also to software that can read the content aloud and even recognise lists.

We can already ask voice activated devices to give us a list of recipe ingredients or to give us the definition of a word, in which case it would very likely read it from a web page using the same markup that we have above.

This simple markup is both human and machine readable. HTML is very good at conveying meaning.

HTML - Sectional

Sectional HTML is a relative newcomer and has added a great deal of meaning to what was commonly being used, i.e. lots of nested <div> elements.

The HTML5 specification authors based the names of these new elements on research into what was most commonly being used as CSS class names e.g. class="main".

Anyone familiar with HTML will recognise the <body> element, within which we are able to add <header> and <footer> elements above and below the <main> element.

The <header> is not to be confused with the <head> element which sits outside and above the <body> element.
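
A minimal document skeleton makes the difference between the two easier to see. The title and heading text are placeholders:

```html
<!DOCTYPE html>
<html lang="en">
<head>
    <!-- Information about the document, not rendered as page content -->
    <meta charset="utf-8">
    <title>Lavender</title>
</head>
<body>
    <header>
        <!-- Introductory content that is visible in the page, such as a logo or navigation -->
    </header>
    <main>
        <h1>Lavender</h1>
    </main>
    <footer></footer>
</body>
</html>
```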

The most basic of the sectional elements is of course <section>, a simple way to give user agents and assistive technologies an understanding of structure within large documents by grouping related content together.

Often within a single web page we may wish to define groups that have content which can be self-contained and in this case, the <article> element is a good fit.

Both <article> and <section> can be contained within the <main> area of a web page and if there is additional information related to this content, we can use the <aside> element. 

<header>
    <nav></nav>
</header>
<main>
    <section>
        <h2></h2>
        <article>
            <h3></h3>
            <p></p>
        </article>
        <article>
            <h3></h3>
            <p></p>
        </article>
        <article>
            <h3></h3>
            <p></p>
        </article>
    </section>
    <aside>
        <section>
            <h2></h2>
            <article>
                <h3></h3>
                <ul>
                    <li></li>
                    <li></li>
                    <li></li>
                </ul>
            </article>
        </section>
    </aside>
</main>
<footer>
    <nav></nav>
</footer>

Putting it all together

Remember the component we looked at earlier? Now that we know how it should be marked up, let's revisit it.

See the Pen by deanleigh (@deanleigh) on CodePen.

Better? Don't forget to click HTML or better still open it in Codepen.

We can now clearly see the structure of the document in the code, and this will be the same for browsers, search engines and assistive technology.

Yet it looks exactly the same in the browser. The 'separation of concerns' between meaning and appearance is achieved with good old HTML and CSS.
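
As a rough sketch of the idea (a made-up plant card, not the Pen's actual markup, with hypothetical class names), the classes stay available for styling while the elements themselves describe a heading, a paragraph and a link:

```html
<!-- Hypothetical "plant card" using elements that describe their content -->
<article class="card">
    <h3 class="card-title">Lavender</h3>
    <p class="card-text">A fragrant plant in the mint family.</p>
    <a class="card-link" href="lavender.html">Read more</a>
</article>
```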

But of course there is a catch: with all this fancy CSS giving us control over layout, content may appear in a different order to that in which it was written.

This is a good time to talk about source order.

Source order

Source order is the order in which elements appear within the document object model or DOM.

This is important because content is delivered to users, and to the various technologies they use, in that order.

There is much debate around the visual order of content versus the source order of content for various users.

However, research and collaboration with users of assistive technologies have given us plenty of feedback to understand the problems they face when source order is not taken into consideration.

It is in fact not just an accessibility issue but can affect things like search engine optimization and how web pages are processed in other types of software.

What is clear though, is that source order is very important at, yes you have guessed correctly, conveying 'meaning' to both humans and computers.
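
A minimal sketch of how visual order and source order can drift apart, assuming a hypothetical page layout built with flexbox:

```html
<!-- Hypothetical layout: CSS moves the navigation visually above the content -->
<style>
    .page { display: flex; flex-direction: column; }
    .page nav { order: -1; } /* displayed first, but still second in the source */
</style>
<div class="page">
    <main>
        <h1>Lavender</h1>
    </main>
    <nav>
        <a href="sunny/">Sunny plants</a>
        <a href="shady/">Shady plants</a>
    </nav>
</div>
```

A sighted user sees the navigation first, while keyboard focus and screen readers still follow the source and reach the main content first. Neither order is wrong in itself; the point is that the two audiences are no longer being told the same thing.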

Metadata

Metadata (or data about data) takes many forms and has many uses in web development. Most commonly when we think about Metadata it's in the context of search engine results and how they display information about web pages.

The humble <title> element is still used to display the name of your web page in the tab at the top of your web browser.

One of the key uses of metadata in web development is describing how a page is intended to appear in search results and when shared on social networks:

<!-- Meta data -->
<title>The Umbraco Garden Center - Lavender</title>
<meta name="description" content="Lavandula (common name lavender) is a genus of 47 known species of flowering plants in the mint family">
<link rel="canonical" href="http://www.umbraco-gardencenter.co.uk/lavender.html">
<!-- Twitter Card data -->
<meta name="twitter:card" content="summary">
<meta name="twitter:title" content="The Umbraco Garden Center - Lavender">
<meta name="twitter:description" content="Lavandula (common name lavender) is a genus of 47 known species of flowering plants in the mint family">
<meta name="twitter:image" content="http://www.umbraco-gardencenter.co.uk/images/lavender.jpg">
<!-- Open Graph data -->
<meta property="og:type" content="article">
<meta property="og:title" content="The Umbraco Garden Center - Lavender">
<meta property="og:image" content="http://www.umbraco-gardencenter.co.uk/images/lavender.jpg">
<meta property="og:description" content="Lavandula (common name lavender) is a genus of 47 known species of flowering plants in the mint family">

WAI-ARIA

WAI-ARIA, the Accessible Rich Internet Applications Suite, is a way of enhancing HTML to improve accessibility where needed. HTML, if written correctly, should be accessible on its own, and ARIA should only be used where this is not the case, as Heydon Pickering so elegantly states here:

ARIA doesn't make HTML accessible, it makes inaccessible HTML accessible
- Heydon Pickering

This statement could not be a better starting point for the use of ARIA. However, when we do need its help, it takes more than scattering a few attributes here and there.

ARIA assists users by applying meaning in three key ways: roles, states and properties.

Roles

A few examples of ARIA roles:

  • Widget roles
    • button
    • checkbox
  • Composite roles
    • menu
    • menubar
  • Document structure roles
    • list
    • listitem
  • Landmark roles
    • main
    • navigation
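
Several of these roles are already built into native HTML elements, which is why well written HTML rarely needs them spelled out. A short sketch with placeholder content:

```html
<!-- Native elements already carry these roles implicitly -->
<nav>...</nav>     <!-- navigation landmark -->
<main>...</main>   <!-- main landmark -->
<ul>               <!-- list -->
    <li>...</li>   <!-- listitem -->
</ul>

<!-- Explicit roles are only needed when generic elements are used instead -->
<div role="navigation">...</div>
<div role="main">...</div>
```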

States and Properties

A few examples of ARIA states and properties:

  • Widget attributes
    • aria-current
    • aria-disabled
  • Relationship attributes
    • aria-colcount
    • aria-colindex
    • aria-colspan
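
Two of these attributes in use, as a sketch with hypothetical URLs and link text:

```html
<nav>
    <ul>
        <li><a href="/sunny/">Sunny plants</a></li>
        <!-- aria-current marks the link for the page the user is on -->
        <li><a href="/sunny/lavender" aria-current="page">Lavender</a></li>
    </ul>
</nav>

<!-- aria-disabled announces the button as disabled but, unlike the
     disabled attribute, does not stop it being focused or clicked -->
<button type="button" aria-disabled="true">Add to basket</button>
```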

This is really just scratching the surface and the relationship between these examples is also very important and is a whole other topic.

If you wish to find out more about ARIA a good place to start is here:

https://developer.mozilla.org/en-US/docs/Web/Accessibility/ARIA/ARIA_Techniques

Semantic web

Tim Berners-Lee had a vision that HTML could also contain data that had meaning to machines.

I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web – the content, links, and transactions between people and computers. A "Semantic Web", which makes this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines. The "intelligent agents" people have touted for ages will finally materialize.
- Tim Berners-Lee

When we discuss the semantic web, most people who are familiar with it would undoubtedly think of marking up HTML with some form of data.

In fact, in the early 2000s we were getting very excited about harnessing the power of data stored in XML. Many of us are familiar with extracting data from XML and presenting it on screen by transforming it to XHTML with XSLT.

However, at the time, competing technologies such as Resource Description Framework (RDF) were much more flexible as they supported multiple schemas.

Microdata, RDFa and JSON-LD

Of these various schemas the ones that gained popularity were the ones that worked well with search engines. Despite the obvious benefits to marking up web pages with meaningful data, this technology did not really take off at any great speed.

<!-- START ORGANISATION -->
<p itemscope itemtype="https://schema.org/Organization">
    <span itemprop="name">The Umbraco Garden Center</span><br>
    <!-- START POSTAL ADDRESS -->
    <span itemprop="address" itemscope itemtype="https://schema.org/PostalAddress">
        <span itemprop="streetAddress">37, Queen Street</span><br>
        <span itemprop="addressLocality">Colchester</span><br>
        <span itemprop="addressRegion">Essex</span><br>
        <span itemprop="postalCode">CO1 2PQ</span><br>
        <span itemprop="addressCountry">United Kingdom</span>
    </span>
    <!-- END POSTAL ADDRESS -->
</p>
<!-- END ORGANISATION -->
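
The same information can also be expressed as JSON-LD, the third format named in the heading. A minimal sketch, assuming the same schema.org types and treating Colchester as the locality and Essex as the region:

```html
<!-- JSON-LD sketch of the organisation above; the values are copied from the Microdata example -->
<script type="application/ld+json">
{
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "The Umbraco Garden Center",
    "address": {
        "@type": "PostalAddress",
        "streetAddress": "37, Queen Street",
        "addressLocality": "Colchester",
        "addressRegion": "Essex",
        "postalCode": "CO1 2PQ",
        "addressCountry": "United Kingdom"
    }
}
</script>
```

Unlike Microdata, the data here sits in its own block rather than being woven through the visible markup, so it can be added or changed without touching the page content.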

It was really only once search engines, of course predominantly Google, started to make use of this additional data that it became of any value and popularity.

Yet oddly only a few industries took this technology seriously e.g. event ticketing companies found that if they marked up all of their events with either Microdata or RDFa the events would be listed at the top of the search results with all of the relevant dates, times and locations.

But this seemed to be about as far as it went until Google coined the phrase “Rich Snippets” and popularised the existing terminology “Structured data”.

They now boast a long list of supported schema items, here are a few:

  • Article
  • Book
  • Breadcrumb
  • Carousel
  • COVID-19 announcements
  • Event
  • Fact Check
  • FAQ
  • How-to
  • Job Posting
  • Local Business
  • Q&A
  • Recipe
  • Review snippet

You are probably already familiar with recipes appearing in the Search Engine Results Pages without even having to visit the website containing the recipe.

These will be coming to smart devices near you soon, just keep an eye on the front of your fridge!

Summary

Every bit of code we write, every bit of content we create, every file we name can and should have meaning, meaning to computers, meaning to human beings.

“Our prime purpose in this life is to help others. And if you can’t help them, at least don’t hurt them.”
- The Dalai Lama

We build and create websites to help people and to help each other. If we make our intentions clear, and if we create good content in good context, then we will achieve our aims.

 

 

Dean Leigh
