Predefined list of tags in Docusaurus

Stephan Hochdörfer
Head of IT Business Operations

Since mid-2022, we've been using Docusaurus as our blogging platform. In general, we are happy Docusaurus users, although it comes with a few limitations due to the nature of a static site generator.

One limitation was the tag handling. Any blog post can contain any tag, and there is no easy way of figuring out which tags exist and how they are spelled. Over time, this resulted in a lot of duplicate tags that needed to be cleaned up.

Luckily, the recent Docusaurus 3.4 release comes with a fix for this: tags can now be managed in a central location (a YAML file), and the build can be made to fail when a post uses a tag that is not defined in that file.

How does it work? First, we have to enable the feature by referencing the YAML file in docusaurus.config.js:

presets: [
  [
    '@docusaurus/preset-classic',
    /** @type {import('@docusaurus/preset-classic').Options} */
    ({
      blog: {
        tags: 'tags.yml'
      }
    })
  ]
]

By default, Docusaurus will now issue warning messages during the build when tags are used in blog posts but are not defined in the specified tags.yml file. If you want the build to break, additionally set onInlineTags: 'throw'.
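As a sketch, the blog options from the snippet above with the stricter setting added might look like this. Only `tags` and `onInlineTags` come from the text; the surrounding skeleton is an illustrative excerpt that leaves out everything else a real configuration file needs.

```javascript
// docusaurus.config.js (excerpt; the rest of the configuration is omitted)
const config = {
  presets: [
    [
      '@docusaurus/preset-classic',
      /** @type {import('@docusaurus/preset-classic').Options} */
      ({
        blog: {
          // File that lists all predefined tags
          tags: 'tags.yml',
          // Fail the build instead of only warning when a post uses an undefined tag
          onInlineTags: 'throw',
        },
      }),
    ],
  ],
};

module.exports = config;
```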

In its simplest form, the tags.yml file contains the list of available tags, structured like this:

tag1:
tag2:
tag3:

To influence how a tag is rendered in the frontend, you can optionally define a label and a description for each tag:

tag1:
  label: Tag 1
  description: This is some additional information for Tag 1
tag2:
  label: Tag 2
  description: This is some additional information for Tag 2
tag3:
  label: Tag 3
  description: This is some additional information for Tag 3

Now that we have configured everything, how can we extract the existing tags from several hundred blog posts?

Since I am most fluent in PHP and had already used the league/commonmark package when porting our Silverstripe blog posts over to Docusaurus, I gave the library another try, as it also supports parsing front matter.

First, let's install the required Composer dependencies:

composer require league/commonmark symfony/yaml

Parsing the front matter of a Markdown document is rather trivial:

<?php

require __DIR__ . '/vendor/autoload.php';

use League\CommonMark\Environment\Environment;
use League\CommonMark\Exception\CommonMarkException;
use League\CommonMark\Extension\CommonMark\CommonMarkCoreExtension;
use League\CommonMark\Extension\FrontMatter\FrontMatterExtension;
use League\CommonMark\Extension\FrontMatter\Output\RenderedContentWithFrontMatter;
use League\CommonMark\MarkdownConverter;

// Set up the Markdown converter
$config = [];
$environment = new Environment($config);
$environment->addExtension(new CommonMarkCoreExtension());
$environment->addExtension(new FrontMatterExtension());
$converter = new MarkdownConverter($environment);

// Here we collect all found tags
$tags = [];

try {
    $markdown = ''; // the content of the Markdown file goes here
    $result = $converter->convert($markdown);
    if ($result instanceof RenderedContentWithFrontMatter) {
        $frontMatter = $result->getFrontMatter();
        if (isset($frontMatter['tags']) && is_array($frontMatter['tags'])) {
            $tags = array_merge($tags, $frontMatter['tags']);
        }
    }
} catch (CommonMarkException $e) {
    // Ignore documents that cannot be parsed
}

What's left is to iterate over all Markdown files in our blog repository, parse each file, extract the tags, filter them, and dump them in a YAML file. The complete script I used looks like this:

<?php

require __DIR__ . '/vendor/autoload.php';

use League\CommonMark\Environment\Environment;
use League\CommonMark\Exception\CommonMarkException;
use League\CommonMark\Extension\CommonMark\CommonMarkCoreExtension;
use League\CommonMark\Extension\FrontMatter\FrontMatterExtension;
use League\CommonMark\Extension\FrontMatter\Output\RenderedContentWithFrontMatter;
use League\CommonMark\MarkdownConverter;

$baseDir = ''; // point to the directory that contains all your Markdown source files

// Here we collect all found tags
$tags = [];

// Set up the Markdown converter
$config = [];
$environment = new Environment($config);
$environment->addExtension(new CommonMarkCoreExtension());
$environment->addExtension(new FrontMatterExtension());
$converter = new MarkdownConverter($environment);

// Set up iterators for file access
$directoryIterator = new RecursiveDirectoryIterator($baseDir);
$recursiveIterator = new RecursiveIteratorIterator($directoryIterator);

foreach ($recursiveIterator as $file) {
    /** @var \SplFileInfo $file */
    if (!$file->isFile()) {
        continue;
    }

    // Only process Markdown sources (.md and .mdx)
    if (!in_array(strtolower($file->getExtension()), ['md', 'mdx'], true)) {
        continue;
    }

    try {
        $markdown = file_get_contents($file->getPathname());
        $result = $converter->convert($markdown);
        if ($result instanceof RenderedContentWithFrontMatter) {
            $frontMatter = $result->getFrontMatter();
            if (isset($frontMatter['tags']) && is_array($frontMatter['tags'])) {
                $tags = array_merge($tags, $frontMatter['tags']);
            }
        }
    } catch (CommonMarkException $e) {
        // Skip files that cannot be parsed, but report them
        fwrite(STDERR, sprintf("Skipping %s: %s\n", $file->getPathname(), $e->getMessage()));
    }
}

$tags = array_unique($tags);
sort($tags);

$tagsContent = '';
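foreach ($tags as $tag) {
    // Each tag becomes a key; the tag itself doubles as its initial label.
    // Note: tags containing YAML special characters would need quoting.
    $tagsContent .= $tag . ":\n";
    $tagsContent .= "  label: " . $tag . "\n";
}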

file_put_contents('tags.yml', $tagsContent);
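The generated file lists every spelling that was found, so duplicates such as differently capitalized variants still have to be merged by hand. As a minimal sketch (not part of the script above), a small Node.js helper could group tags whose normalized form is identical, so that candidates for cleanup stand out. The function names and the normalization rule (lowercase, with spaces, hyphens, and underscores removed) are assumptions made for illustration, as are the sample tags.

```javascript
// Hypothetical helper: reduces a tag to a form that ignores case and separators.
// Assumed rule: lowercase and drop whitespace, hyphens and underscores.
function normalizeTag(tag) {
  return String(tag).toLowerCase().replace(/[\s_-]+/g, '');
}

// Groups tags by their normalized form and returns only the groups
// that contain more than one distinct spelling.
function findDuplicateTagGroups(tags) {
  const groups = new Map();
  for (const tag of tags) {
    const key = normalizeTag(tag);
    if (!groups.has(key)) {
      groups.set(key, new Set());
    }
    groups.get(key).add(tag);
  }
  return [...groups.values()]
    .filter((spellings) => spellings.size > 1)
    .map((spellings) => [...spellings].sort());
}

// Made-up sample tags; in practice this would be the list of keys in tags.yml
const duplicates = findDuplicateTagGroups(['PHP', 'php', 'open-source', 'Open Source', 'docker']);
console.log(duplicates);
```

Each returned group is a set of spellings that probably mean the same tag; deciding which spelling to keep remains a manual step.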