by Chriztian Steinmeier, posted on Dec 20, 2021

Heads Up!

This article is several years old now, and much has happened since then, so please keep that in mind while reading it.

At the time we had a student developer at the office and I tasked him with the job of figuring out what was needed to accomplish something like this. It didn't take long for him to find a couple of lines of JavaScript that did the trick, basically something like this:

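function getReadingTimeInMinutes(article) {
  const text = article.innerText
  const wpm = 225 // average reading speed, in words per minute
  // trim() and filter(Boolean) keep surrounding whitespace (or an empty article) from counting as a word
  const words = text.trim().split(/\s+/).filter(Boolean).length
  const minutes = Math.ceil(words / wpm)
  return minutes
}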

A JavaScript function to calculate reading time for a DOM element

On the surface, this looks like it's enough, right? I mean:

It does the job
It's progressively enhancing the article (if your browser doesn't do JavaScript, you're not losing any functionality - you're just not getting a heads-up on the time you'll need to spend reading)

There's always more

Having done this job for many years now, I know that when a client asks for something, there are usually a lot of hidden (but implied) extras compiled into that single remark of "Oh, and it should show the reading time on the article".

This was no different, of course. The blog section had a front page with a card-style layout where the latest articles were featured, and I was already willing to bet my entire month's salary that they'd want the reading time displayed there as well, since that's the place where someone actually decides whether to click through and read the article.

“Why is that a problem?”, asked the student.

I'll let you think about it too, for a few seconds - then continue reading...

Not enough data

When we're rendering the teaser card for an article, we're only rendering a couple of properties (typically the title, a teaser and the author's name and/or photo) - not the full article. So our getReadingTimeInMinutes() function would (at most) report back that this card could be read "in about a minute". Not what we wanted.
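To make the problem concrete, here is a minimal sketch that applies the same calculation to plain strings instead of a DOM element. The `readingTimeFromText` name and both sample texts are made up for illustration; the 225 words-per-minute figure is the one from the function above.

```javascript
// Same calculation as getReadingTimeInMinutes(), but taking a string
// so it can run outside a browser. (Hypothetical helper, for illustration.)
function readingTimeFromText(text, wpm = 225) {
  const words = text.trim().split(/\s+/).filter(Boolean).length
  return Math.ceil(words / wpm)
}

// A teaser card only contains a handful of words...
const teaser = 'Reading time, the Umbraco way. A short teaser. By Some Author'

// ...while the article itself is much longer (simulated here with 1,000 words)
const fullArticle = Array(1000).fill('word').join(' ')

console.log(readingTimeFromText(teaser))      // 1
console.log(readingTimeFromText(fullArticle)) // 5
```

Whatever the length of the real article, the card on its own never has enough words to get past one minute.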

So at this point we started bouncing ideas around for how we could handle this scenario, all of which seemed to be one of two approaches in disguise:

Put the article's text inside a hidden <div> so it can be read from the script?
Do an Ajax request back to the server to get the article's content and calculate on that?

While they are certainly both viable options, there's no way I'd OK stuffing the entire article text inside a hidden <div>, just to get a "Reading time" number.
For both options it's worth mentioning that it's not even as simple as it sounds, because the articles themselves are built with a block builder, so the actual content is spread across several different blocks (aka tons of JSON) which, when rendered on the article page, are each handled by their own partial. So not so straightforward, actually.
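To illustrate why that isn't trivial, here is a sketch with an entirely hypothetical block structure. The `type`, `heading`, `body`, `quote`, `cite` and `items` fields are invented for this example and are not the site's actual schema. Every block type keeps its text in a different place, so collecting the words means writing one extractor per block type:

```javascript
// Hypothetical block data; a real block builder's JSON will look different.
const blocks = [
  { type: 'hero', heading: 'Reading time' },
  { type: 'richText', body: '<p>Some <em>formatted</em> text</p>' },
  { type: 'quote', quote: 'Always more', cite: 'A client' },
  { type: 'columns', items: [
    { type: 'richText', body: '<p>Nested text</p>' }
  ] }
]

// One extractor per block type, much like one partial per block on the page.
const extractors = {
  hero: block => block.heading,
  richText: block => block.body.replace(/<[^>]+>/g, ' '), // crude tag stripping
  quote: block => `${block.quote} ${block.cite}`,
  columns: block => block.items.map(extractBlockText).join(' ')
}

function extractBlockText(block) {
  const extract = extractors[block.type]
  // A block type nobody wrote an extractor for silently contributes nothing
  return extract ? extract(block) : ''
}

const text = blocks.map(extractBlockText).join(' ')
const words = text.split(/\s+/).filter(Boolean)
console.log(words.length) // 11
```

Note the fallback in `extractBlockText`: a block type without an extractor is simply skipped, so the word count quietly drifts away from what is actually shown on the page.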

The way that could possibly work

So of course we arrived at the realization that the simple (and very Umbraco-y) solution would be to have a Reading Time property on the document type that we could render anywhere needed. This would also make it available to everyone - not just browsers with JavaScript enabled. But how and when should this property be calculated then?

Having the client fill it in was obviously an option, but it should be the last resort. A better way would be to make the property read-only (i.e. a "Label" type) and then see if we could update it from code at some point.

Turns out you can do exactly that: Umbraco raises various events during a content item's lifecycle, and if we hook into the ContentService.Saving event, we have access to the article being saved, which lets us update its data before it's saved.
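The pattern itself is not specific to Umbraco. As a rough illustration in JavaScript (this is not Umbraco's API; the `FakeContentService` class, its `onSaving()` method and the `text` field are all invented for this sketch), a handler that runs before the save can change the item, and the change gets persisted along with everything else:

```javascript
// Invented stand-in for a content service; NOT Umbraco's API.
class FakeContentService {
  constructor() {
    this.savingHandlers = []
    this.store = new Map()
  }

  onSaving(handler) {
    this.savingHandlers.push(handler)
  }

  save(item) {
    // Handlers run before the item is persisted, so they can still modify it
    for (const handler of this.savingHandlers) handler(item)
    this.store.set(item.id, { ...item })
  }
}

const service = new FakeContentService()

// Only touch articles, and fill in the read-only property from code
service.onSaving(item => {
  if (item.contentType !== 'ArticlePage') return
  const words = item.text.split(/\s+/).filter(Boolean).length
  item.readingTimeInMinutes = Math.ceil(words / 225)
})

service.save({ id: 1, contentType: 'ArticlePage', text: Array(500).fill('word').join(' ') })
service.save({ id: 2, contentType: 'FrontPage', text: 'Welcome' })

console.log(service.store.get(1).readingTimeInMinutes) // 3
console.log(service.store.get(2).readingTimeInMinutes) // undefined
```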

But how?

The thing is - how do we actually do this? On the server we can't just use the JavaScript from earlier, and even if we rewrite it in C#, we're still missing a way to get the raw text of the article from all of its blobs of JSON... 🤔

What we really want to use is the rendered article, because writing a secondary rendering (i.e. an API endpoint or similar) just to get the article's content means that every time we add a new block to use on an article, we'd need to remember to also write the alternate rendering, which is just asking for trouble.

So we started digging, and lo and behold, there is a way to render a piece of content using its assigned template, layouts etc. It's called RenderTemplate() and it's available on the UmbracoHelper (this was an Umbraco 7 site, but I'd be surprised if the same possibility isn't still available in 8 & 9).

So this is what we ended up with:

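using System;
using System.Linq;
using Umbraco.Core;
using Umbraco.Core.Logging;
using Umbraco.Core.Services;
using Umbraco.Web;

namespace TwentyFour {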
	public class Startup : ApplicationEventHandler {
		protected override void ApplicationStarted(UmbracoApplicationBase umbracoApplication, ApplicationContext applicationContext) {
			ContentService.Saving += (sender, args) => {
				var entities = args
					.SavedEntities
					.Where(x => x.ContentType.Alias == "ArticlePage")
					.ToList();
				
				if (entities.Any()) {
					try {
						UmbracoHelper umbracoHelper = new UmbracoHelper(UmbracoContext.Current);
						
						foreach (var entity in entities) {
							var renderedArticle = umbracoHelper.RenderTemplate(entity.Id);
							var minutes = Helpers.GetReadingTimeFromRenderedArticle(renderedArticle);
						
							entity.SetValue("readingTimeInMinutes", minutes);
						}
						
						sender.Save(entities, raiseEvents: false);
						
					} catch (Exception ex) {
						LogHelper.Error<ContentService>("Error while updating reading time", ex);
					}
				}
				
			};
		}
	}
}

The event handler used to update the `readingTimeInMinutes` property on an article when it's saved

The GetReadingTimeFromRenderedArticle() helper looks something like this:

public static int GetReadingTimeFromRenderedArticle(IHtmlString articleHtml) {
	var text = ExtractText(articleHtml.ToString());
	
	var wordsPerMinute = 222m;
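	// Split on any common whitespace, since text nodes can contain tabs and line breaks too
	var separators = new char[]{ ' ', '\t', '\r', '\n' };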
	var words = text.Split(separators, StringSplitOptions.RemoveEmptyEntries).Length;
	var minutes = (int)Math.Ceiling(words / wordsPerMinute);
	
	return minutes;
}

The C# version of the initial JavaScript function to calculate reading time for an article

This in turn uses another custom helper, ExtractText(), where the HtmlDocument stuff comes from the HtmlAgilityPack:

public static string ExtractText(string html) {
	if (html == null) {
		return "";
	}
	
	HtmlDocument doc = new HtmlDocument();
	doc.LoadHtml(html);
	
	var chunks = new List<string>(); 
	var nodes = doc.DocumentNode.SelectNodes(".//main//*[not(self::style)]/text()");
	
	// SelectNodes() returns null (not an empty collection) when nothing matches
	if (nodes == null) {
		return "";
	}
	
	foreach (var item in nodes) {
		var chunk = item.InnerText.Trim();
		if (item.NodeType == HtmlNodeType.Text && chunk != "") {
			chunks.Add(chunk);
		}
	}
	return String.Join(" ", chunks);
}
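For comparison, here is a rough, browser-free approximation of the same idea in JavaScript. It is only a sketch: it uses regular expressions instead of a real HTML parser, so it is far less robust than the HtmlAgilityPack version, and the `extractTextFromHtml` name and the sample markup are made up. It can still be handy for sanity-checking a word count:

```javascript
function extractTextFromHtml(html) {
  if (html == null) return ''

  // Only look inside <main>, like the XPath in the C# helper
  const main = /<main\b[^>]*>([\s\S]*?)<\/main>/i.exec(html)
  if (!main) return ''

  return main[1]
    .replace(/<style\b[^>]*>[\s\S]*?<\/style>/gi, ' ') // drop CSS
    .replace(/<[^>]+>/g, ' ')                          // drop remaining tags
    .replace(/\s+/g, ' ')                              // collapse whitespace
    .trim()
}

const html = `
  <html><body>
    <nav>Home About</nav>
    <main>
      <style>p { color: red }</style>
      <h1>Reading time</h1>
      <p>It <em>does</em> the job</p>
    </main>
  </body></html>`

console.log(extractTextFromHtml(html)) // "Reading time It does the job"
```

The navigation outside <main> and the contents of the <style> element are left out, so only the words a visitor actually reads in the article are counted.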

So there you have it: we got a request for a feature, had an initial idea for it, found out it didn't cover all the necessary use cases, and then revised it to handle the updated scenario(s).

Please don't hesitate to throw any questions my way, or to let me know if you think we're "doing it wrong" and have an alternate way of doing something similar.

Thanks for reading and have a fantastic holiday!

Chriztian Steinmeier

Chriztian is on Twitter as @greystate