Structured Data Markup magic and Microdata myths

posted on Dec 16, 2024 by Dean Leigh

Tagged with: Content; SEO

There's a lot of advice around adding Schema Markup to websites that overwhelmingly favours JSON-LD over Microdata, but does that advice hold true when working with Umbraco?

A boat moored at the end of a gangway with the tide out. The boat is named Freebird.

A small amount of markup can convey a lot of information

Introduction

In "24 Days In Umbraco 2020" I wrote about Semantics in web development.

At the end of the article I discuss how the aspiration for a 'Semantic Web' was being manifested by adding 'context' to our 'content' through the addition of Structured Data Markup.

Structured Data Markup is not only still relevant today; it has also become easier to implement thanks to both Schema.org and search engines such as Google, with features like Rich Results (previously known as Rich Snippets) in the form of 'Features' and enriched search results.

A Search Pilot poll on X/Twitter and LinkedIn showed that 96% of voters still believe JSON-LD is a better option than Microdata for markup.

So, we know that adding structured data can improve the findability of our web pages. However, when researching how to implement our markup, we find there is an overwhelming bias towards using JSON-LD over Microdata.

Before reading this article

If you are not yet familiar with Schema Markup, there are some great articles on umbraco.com that are worth reading first…

What do these articles cover?

Umbraco has covered adding Schema Markup for a while:

What is Schema Markup?
A comprehensive primer for anyone new to Schema Markup.
How to implement Schema markup in Umbraco
Three different ways to add Schema Markup in Umbraco.
Schema markup that we use on this website
Examples of actual JSON-LD used in the Umbraco Website.

What does this article cover?

These articles seem to cover a great deal, so what is the purpose of this article?

To clarify a few things that seem to have stuck as "the only way" to add Schema Markup
To show some techniques we can use
And to possibly make things a little easier when using Umbraco

Below are some examples of statements that I have seen and heard that require further examination.

Google recommends using JSON-LD

This is true, they did and still do.

When Google committed to using Structured Data to enhance search results with what were then called Rich Snippets, there was a table in Google Search Central showing JSON-LD, Microdata and RDFa.

Google had added "JSON-LD (recommended)" with no explanation.

This was validated further in question and answer sessions with Google employees as discussed here:
Google On Which Structured Data it Prefers: JSON-LD or Microdata?

In a Webmaster Hangout, Google’s John Mueller was asked what kind of structured data Google prefers. John Mueller answered that Google prefers JSON-LD structured data.

--Roger Montti--

Current recommendations

Google Search Central has since clarified their recommendations in Supported formats:

Structured data guidelines from Google

This is certainly an improvement and helps us understand 'why' Google recommends JSON-LD.

So let's look at some of these points in a bit more detail.

JSON-LD is more likely to create Rich Snippets?

Firstly, let's address something that some SEO bloggers have implied in the past: that because "Google recommends using JSON-LD", it is more likely to produce Rich Snippets.

The obsession with getting Rich Snippets certainly spawned hundreds of articles on SEO blogs on why and how to add the 'magic' JSON-LD to your website.

But a word from Google themselves:

Important: Google does not guarantee that your structured data will show up in search results, even if your page is marked up correctly according to the Rich Results Test.

But if you are favouring JSON-LD over Microdata purely on the basis it is "more likely to produce better SEO and Rich Snippets" then you may not be choosing the best option for your solution.

JSON-LD is better for SEO

To be fair, this has been debunked by the SEO industry through extensive testing. A very good example is:

JSON–LD vs Microdata Revisited

Since the original test we wrote about in 2020, our customers have run hundreds of Microdata and JSON-LD Schema tests. In that time we have never detected a measurable SEO traffic impact when changing from one format over the other, regardless of schema type, website size, or industry.

So we have seen that this claim is clearly not true. In fact, using JSON-LD may even have a negative impact on a site's SEO by making it slower to load.

Let's take an example of an Index Page that may contain hundreds of cards, each with information such as Title, Description, Publish Date, Author(s), Topic Tags, etc…

A page like this may take two or three seconds to load on an average connection. By duplicating the entire content again in JSON-LD, you may be adding another second or two to the page load time.

With repetitive data like this, it is much more performant to include the semantic markup in the form of microdata.

Because we know what the data is going to be, the editor really does not need to be concerned about marking the page up as this will all be taken care of automatically.
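As a rough illustration of the idea, a single card on such an index page might carry its Microdata inline as sketched below. The URLs, text and element choices are placeholders and not taken from a real project. The point is only that the visible content and the structured data share the same markup, so nothing is sent twice.

```html
<!-- One card in the index list; all values here are placeholder examples -->
<li itemscope itemtype="https://schema.org/Article">
  <h2 itemprop="headline">
    <a itemprop="url" href="/blog/example-article/">Example article title</a>
  </h2>
  <p itemprop="description">A short summary of the article.</p>
  <time itemprop="datePublished" datetime="2024-12-16">16 Dec 2024</time>
  <!-- The author is a nested item with its own type -->
  <span itemprop="author" itemscope itemtype="https://schema.org/Person">
    <a itemprop="url" href="/people/example-author/">
      <span itemprop="name">Example Author</span>
    </a>
  </span>
  <span itemprop="keywords">SEO</span>
</li>
```

In a template, the same attributes would sit in the card partial once and be repeated for every item in the loop.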

JSON-LD is future-proof

As I have mentioned previously, proposals to add semantic markup or metadata to HTML have been around for a while.

RDFa (Resource Description Framework in Attributes) was proposed to W3C's Semantic Web Interest Group "SWIG" (previously the RDF Interest Group) in 2004. It was released as part of XHTML in 2007-8 and many of us began using it immediately.

JSON-LD began life around 2010 and shortly afterwards Schema.org was launched in 2011, by major search engines including Google, Bing, and Yahoo!

Microdata, on the other hand, was recommended by W3C as part of the HTML5 specification in 2014.

So one could argue that Microdata is really the new kid on the block.

The impression that JSON-LD is somehow the latest is possibly down to Google not really advocating JSON-LD for structured data markup until December 2015.

As for being the future, AI powered search from the big players such as Google and Microsoft has been a game changer.

But can Large Language Models extract Structured Data from content without markup?

AI first companies such as Weaviate write about Extracting Structured Data in the midst of a fascinating article on Large Language Models and Search.

So perhaps all the current forms of structured data markup may no longer be required.

JSON-LD is the Easiest to implement

So there is certainly some truth to this statement from Google and others as there are many no-code/low-code ways to add JSON-LD.

The key difference is that adding Microdata requires editing HTML, which is often not easily accessible in many common CMSs, whereas JSON-LD can be added via Google Tag Manager to almost any website that has it installed.

I would argue however that adding Microdata to HTML where you can is fairly intuitive and again there are many tools that will work out the markup HTML for you in both JSON-LD and Microdata.

Third party Tools

There are many third party tools where the user completes a form and it will generate markup that can be added to a web page.

An example is Google's own Structured Data Markup Helper helping non-technical users to generate valid markup, in both JSON-LD and Microdata.

However, many of these tools, whether they are online or CMS plugins, often only give you the most basic elements of each Schema type.

A screen showing a wide variety of standard schema markup types such as articles, events, movies, products.

Google's Structured Data Markup Helper

So yes, using schema markup tools and templates can make life easier but many are limited and Google will not create features if any single piece of required data is missing.

In Umbraco we can create the same tools to create HTML sections such as Page Title, Page Description, Author(s), and Topics so why not add Microdata simultaneously?

My conclusion here is use what works for you, but in Umbraco we can make the editor's life easier and keep our content and code tightly coupled.

In-site Tools

As mentioned there are CMS plugins but they are quite opinionated and basic.

But there are also "In-code tools".

On a recent project I inherited a Profile Page template that had a fairly short Profile section but a large collection of related articles with the person as the Author.

I did not want to duplicate the long list of articles so they were marked up as Microdata.

However, the Profile part of the page was short and ideal for JSON-LD and there is no issue mixing both formats in a single page.

One of the issues with JSON-LD is it doesn't really play nicely with C#, so developers often use helper methods or libraries to handle JSON-LD e.g. serialising the JSON object directly into the Razor view.

So I chose to test Schema.NET, which describes itself as "Schema.org objects turned into strongly typed C# POCO classes for use in .NET", which sounded ideal.

However, I was following this superb Person Schema example from Daniel K Cheung and I wanted to use @graph to create a relationship between:
"@type": "ProfilePage"
and
"@type": "Person"
but this would not compile using Schema.NET.

I got in touch with the super talented author, Muhammad Rehan Saeed, who very kindly explained that @graph was not available in Schema.NET and that he was "Not actively developing it further at the moment". Looking at Muhammad's full-time role and other projects, this was hardly surprising!

So I decided to add the JSON-LD to the template and as all of the variables were already available to render the view it was not a great deal of work to add them to the JSON-LD using the @ escape where required. Where I had not created a variable you will notice you can also include @Model.Content.Bio.


<script type="application/ld+json">
{
  "@@context": "https://schema.org",
  "@@graph": [
    {
      "@@type": "ProfilePage",
      "headline": "@pageTitle",
      "url": "@fullPageUrl",
      "@@id": "@fullPageUrl#profilepage",
      "inLanguage": "en-US",
      "mainEntity": {
        "@@id": "@fullPageUrl#person"
      },
      "isPartOf": {
        "@@type": "WebSite",
        "name": "Ocean Inc",
        "url": "https://ocean.com/",
        "@@id": "https://ocean.com/#website",
        "publisher": {
          "@@type": "Organization",
          "name": "Ocean Inc.",
          "url": "https://ocean.com/",
          "@@id": "https://ocean.com/"
        }
      }
    },
    {
      "@@type": "Person",
      "name": "@pageTitle",
      "url": "@fullPageUrl",
      "@@id": "@fullPageUrl#person",
      "description": "@Model.Content.Bio"
    }
  ]
}
</script>

This worked very well and created a perfectly valid Profile Page about a Person as part of a Website published by an Organisation.

Test results in Schema.org Validator

And equally "Valid items are eligible for Google Search's rich results"

Test results for a Profile page in Google's Rich Results Test

Conditional markup

In the same project I was marking up lists of articles with authors.

Each article had one or multiple authors and each of those authors had a link to their profile page.

Except not all of the authors were people; some were organisations.

This kind of conditional logic is simply not possible using variables inserted into the JSON-LD and conditional C# is not compatible either.

The logic has to be performed outside of the JSON-LD and this is why some of the tools above can be very useful.

However, it is very easy to do the same task in Razor with Microdata:

var authorItemType = article.Authors != null && article.Authors.Any() ? "https://schema.org/Person" : "https://schema.org/Organization";

and implemented in the foreach loop

of course along with other conditional code that matches what is visible in the page.

In many cases it is most certainly easier to implement Microdata.
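To give a feel for what the loop might look like, here is a minimal Razor sketch. It is not the code from the project: the `article.Authors` collection, the per-author `IsOrganisation` flag and the `Name` and `Url` properties are all hypothetical names used for illustration, and the real check for telling a person from an organisation will depend on your own document types.

```cshtml
@* Hypothetical model: each author exposes Name, Url and an IsOrganisation flag *@
@foreach (var author in article.Authors)
{
    // Choose the Schema.org type per author rather than per article
    var authorItemType = author.IsOrganisation
        ? "https://schema.org/Organization"
        : "https://schema.org/Person";

    <span itemprop="author" itemscope itemtype="@authorItemType">
        <a itemprop="url" href="@author.Url">
            <span itemprop="name">@author.Name</span>
        </a>
    </span>
}
```

This assumes the loop is rendered inside an element that already declares the article itself, for example one with `itemscope itemtype="https://schema.org/Article"`, so that each `author` property attaches to the right item.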

Note: I am still looking at .Net Libraries to help with this process and I am yet to try:

json-ld.net

JSON-LD is less prone to user errors

I can see how ease of implementation may give the impression that using JSON-LD is less prone to user error. But it is just a format after all and requires just as much thought as any other format.

I recently came across an example in an Umbraco website that, on the surface, seemed to have made adding a JSON-LD address as easy as possible.

It consisted of an address form where the user was able to click a plus button to add new lines of an address.

However, the markup it generated was semantically incorrect and failed validation, see below:


<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "LocalBusiness",
  "address": {
    "@type": "PostalAddress",
    "streetAddress": "34 Structured Street",
    "streetAddress": "Semantic City",
    "streetAddress": "Markup County",
    "streetAddress": "SEO 123",
    "streetAddress": "United Kingdom"
  }
}
</script>

Invalid markup

The Address should have had separate address fields, but of course that is more difficult to build, especially for different country address formats.

However, it should of course have been like this:


<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "LocalBusiness",
  "address": {
    "@type": "PostalAddress",
    "streetAddress": "34 Structured Street",
    "addressLocality": "Semantic City",
    "addressRegion": "Markup County",
    "postalCode": "SEO 123",
    "addressCountry": "United Kingdom"
  }
}
</script>

Valid markup

This was a very basic example; I would suggest that more comprehensive, valid use cases are just as prone to error, regardless of the format.

For something as simple as an address in the footer of every page, why duplicate the HTML as well, when it could be as simple as the image below:

An address in HTML and microdata validated in Schema.org
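In text form, address markup of this kind might look something like the sketch below. It reuses the address values from the JSON-LD example above. The business name and the choice of wrapping elements are assumptions made for illustration, not the markup from the original screenshot.

```html
<footer>
  <!-- "Example Business" is a placeholder name -->
  <address itemscope itemtype="https://schema.org/LocalBusiness">
    <span itemprop="name">Example Business</span>
    <div itemprop="address" itemscope itemtype="https://schema.org/PostalAddress">
      <span itemprop="streetAddress">34 Structured Street</span>
      <span itemprop="addressLocality">Semantic City</span>
      <span itemprop="addressRegion">Markup County</span>
      <span itemprop="postalCode">SEO 123</span>
      <span itemprop="addressCountry">United Kingdom</span>
    </div>
  </address>
</footer>
```

The visible address and the structured data are the same text, so there is only one place to keep up to date.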

Even using Google's own tools, it is still possible to mark up the content with incorrect properties.

So "Less prone to errors", I am not so sure.

JSON-LD in Head or Body?

Google Search Central has started to understand that many people follow any recommendations to the letter.

Here a user asks a pretty sensible question:

Is it possible to insert JSON structured data at the bottom of the <body> instead of the <head>? It seems to work fine for many websites.

And it's nice to hear a reasonable response:

JSON-LD Structured Data: Where to Insert in a Page?

With that in mind, when working with Razor we can conditionally inject the JSON-LD using @section as follows:


@section SchemaScripts {
  <script type="application/ld+json">
      {
        - MY JSON-LD HERE -
      }
  </script>
}

And render it just before the closing <body> tag as follows:


@if (IsSectionDefined("SchemaScripts"))
{
    @await RenderSectionAsync("SchemaScripts", required: false)
}

A nice solution when you are generating your JSON-LD on a case by case basis.

JSON-LD means we don't need to add everything as content

Adding non-visible content is arguably not in the spirit of many accessibility guidelines, and Google takes a dim view of it as well.

Don't mark up content that is not visible to readers of the page. For example, if the JSON-LD markup describes a performer, the HTML body must describe that same performer.

After all, Structured Data Markup is an enhancement that adds "context" to your "content".
In fact, it is there to help your users find your content, and at this point I hope you have enjoyed mine.

Dean Leigh
