Migration to Docusaurus
While migrating from our Silverstripe-powered blog to Docusaurus, we did not want to lose our old blog posts and thus needed to migrate them somehow. A manual migration was not an option for the roughly 320 blog posts we have written in the last 17 years.
Whilst checking how to automate the migration, I looked for a way to convert HTML content to Markdown, as the Silverstripe blog post editor uses TinyMCE, which stores the content as HTML in the database. I came across the league/html-to-markdown Composer package and gave it a try. The conversion was nearly perfect and needed "just" a little manual clean-up. In the end, the manual clean-up was still quite annoying, but a lot quicker than processing all 320 blog posts by hand.
Since league/html-to-markdown is a Composer package, it can be installed like this:
composer require league/html-to-markdown
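The conversion from HTML to Markdown can be done in a few lines of code:
// Load Composer's autoloader so the package classes can be found.
require __DIR__ . '/vendor/autoload.php';
use League\HTMLToMarkdown\HtmlConverter;
$html = 'The HTML source code here...';
$converter = new HtmlConverter();
// convert() returns the Markdown as a string.
$markdown = $converter->convert($html);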
For the most part, the migration process worked like a charm. We had a few blog posts with leftover span elements, which I deleted quickly via my IDE (search and replace across all the files).
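The same search and replace could also be scripted. The following is only a minimal sketch, not what I actually ran: the function name `stripSpanTags` is made up for illustration, and it assumes the leftover spans are plain `<span ...>` wrappers whose inner text should be kept.

```php
<?php
declare(strict_types=1);

/**
 * Remove <span> wrappers but keep the text inside them.
 * Hypothetical helper, not part of league/html-to-markdown.
 */
function stripSpanTags(string $markdown): string
{
    // Matches opening tags such as <span> or <span class="x">, and closing </span>.
    // The \b keeps the pattern from matching other tag names that start with "span".
    return preg_replace('/<\/?span\b[^>]*>/i', '', $markdown) ?? $markdown;
}

echo stripSpanTags('Hello <span style="color: red">world</span>!'), PHP_EOL;
```

Note that a blunt pattern like this would also strip span tags that appear inside code samples, so it is worth reviewing the diff afterwards.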
Additionally, I had to convert all the internal Silverstripe links into proper Markdown links to other blog posts. I also went through all Markdown code elements manually and added proper syntax highlighting.
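For the internal links, a small helper along these lines might do most of the work. This is a sketch under assumptions: it presumes the links survive the conversion as Silverstripe `[sitetree_link,id=N]` shortcodes inside the Markdown link target, and both the function name `rewriteInternalLinks` and the ID-to-path map are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Replace Silverstripe sitetree_link shortcodes with real paths.
 *
 * @param array<int, string> $pathsById hypothetical map of page ID to new blog path
 */
function rewriteInternalLinks(string $markdown, array $pathsById): string
{
    return preg_replace_callback(
        '/\[sitetree_link,id=(\d+)\]/',
        static function (array $match) use ($pathsById): string {
            $id = (int) $match[1];
            // Leave unknown IDs untouched so they can be found and fixed by hand.
            return $pathsById[$id] ?? $match[0];
        },
        $markdown
    ) ?? $markdown;
}

echo rewriteInternalLinks(
    'See [my post]([sitetree_link,id=42]).',
    [42 => '/blog/my-post']
), PHP_EOL;
```

Leaving unmatched shortcodes in place is a deliberate choice here: a later search for `sitetree_link` then lists exactly the links that still need manual attention.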
Writing the migration script that spits out the Markdown files took about an hour, and the manual clean-up took another 3 to 4 hours. Not too bad, I guess.
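To give an idea of what such a script boils down to, here is a rough sketch rather than the script I actually used. The shape of the `$post` array and the function name `buildPostFile` are assumptions for illustration; the converter is passed in as a callable so that `[$converter, 'convert']` from league/html-to-markdown can be plugged in. The date-prefixed file name and the `title` and `slug` front matter fields follow the Docusaurus blog conventions.

```php
<?php
declare(strict_types=1);

/**
 * Build the file name and contents for one blog post.
 *
 * @param array{title: string, slug: string, date: string, html: string} $post hypothetical row shape
 * @param callable(string): string $toMarkdown e.g. [$converter, 'convert']
 * @return array{0: string, 1: string} file name and file contents
 */
function buildPostFile(array $post, callable $toMarkdown): array
{
    // Docusaurus reads the publication date from a YYYY-MM-DD file name prefix.
    $fileName = sprintf('%s-%s.md', $post['date'], $post['slug']);

    // json_encode() yields a double-quoted string, which is also valid YAML.
    $title = json_encode($post['title'], JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES);
    $frontMatter = sprintf("---\ntitle: %s\nslug: %s\n---\n\n", $title, $post['slug']);

    return [$fileName, $frontMatter . trim($toMarkdown($post['html'])) . "\n"];
}

// Usage with a stand-in converter; swap in [$converter, 'convert'] for real posts.
[$name, $contents] = buildPostFile(
    ['title' => 'Hello', 'slug' => 'hello', 'date' => '2020-01-31', 'html' => '<p>Hi</p>'],
    static fn (string $html): string => strip_tags($html)
);
echo $name, PHP_EOL, $contents;
```

In a real run, the result would be written with something like `file_put_contents()` for each row fetched from the database.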