When we migrated our Silverstripe-powered blog to Docusaurus, we did not want to lose our old blog posts, so they had to move as well. A manual migration was not an option for the roughly 320 blog posts we have written in the last 17 years.
Whilst checking how to automate the migration, I looked for a way to convert HTML content to Markdown, because the Silverstripe blog post editor uses TinyMCE, which stores the content as HTML in the database. I came across the league/html-to-markdown Composer package and gave it a try. The conversion was nearly perfect and needed "just" a little manual clean-up. In the end, the clean-up was still quite annoying, but a lot quicker than processing all 320 blog posts by hand.
Since league/html-to-markdown is a Composer package, it can be installed like this:
composer require league/html-to-markdown
The conversion from HTML to Markdown can be done in a few lines of code:
require 'vendor/autoload.php'; // Composer autoloader

use League\HTMLToMarkdown\HtmlConverter;

$html = 'The HTML source code here...';
$converter = new HtmlConverter();
$markdown = $converter->convert($html); // returns the Markdown as a string
For the most part, the migration process worked like a charm. A few blog posts had leftover span elements, which I quickly deleted in my IDE with a search and replace across all files.
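As an alternative to search and replace, the library's `strip_tags` option might take care of such leftovers during the conversion itself. The following is a minimal sketch, assuming the spans carry no information worth keeping; the sample HTML is made up for illustration.

```php
<?php
require 'vendor/autoload.php'; // Composer autoloader

use League\HTMLToMarkdown\HtmlConverter;

// 'strip_tags' drops tags that have no Markdown equivalent (such as <span>)
// while keeping the text inside them.
$converter = new HtmlConverter(['strip_tags' => true]);

// Made-up sample input, not taken from the real blog posts.
$html = '<p>Hello <span class="highlight">world</span></p>';

echo $converter->convert($html);
```

Without this option, the converter leaves such tags in the output as they are, which is presumably where the leftover spans came from.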
Additionally, I had to convert all the internal Silverstripe links into proper Markdown links to other blog posts. I also went through all Markdown code elements manually and added proper syntax highlighting.
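The link rewriting could be scripted roughly like this. It is only a sketch under assumptions: it assumes the internal links show up in the Markdown as Silverstripe shortcodes of the form `[sitetree_link,id=42]`, and both the `rewriteInternalLinks` helper and the `$slugsById` map are hypothetical names invented for this example.

```php
<?php
// Hypothetical map from Silverstripe page IDs to the new blog URLs.
$slugsById = [
    42 => '/blog/migrating-to-docusaurus',
];

function rewriteInternalLinks(string $markdown, array $slugsById): string
{
    // Matches link targets of the assumed form ([sitetree_link,id=42]).
    $pattern = '/\(\[sitetree_link,id=(\d+)\]\)/';

    return preg_replace_callback($pattern, function (array $m) use ($slugsById) {
        $id = (int) $m[1];
        // Leave unknown IDs untouched so they can be fixed by hand.
        return isset($slugsById[$id]) ? '(' . $slugsById[$id] . ')' : $m[0];
    }, $markdown);
}

$markdown = 'See [our previous post]([sitetree_link,id=42]) for details.';
echo rewriteInternalLinks($markdown, $slugsById);
```

Links whose ID is missing from the map are left as they are, so they remain easy to find with a search afterwards.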
The migration script that spits out the Markdown files was written within an hour; the manual clean-up took another 3 to 4 hours. Not too bad, I guess.
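For illustration, the part of such a script that turns one post into a Docusaurus file could look roughly like this. This is not the actual migration script: the `buildPostFile` helper and the `title`, `slug` and `date` keys are hypothetical, and the Markdown body is assumed to come from the converter shown earlier.

```php
<?php
// Builds the file name and contents for one Docusaurus blog post.
// The $post keys ('title', 'slug', 'date') are hypothetical field names.
function buildPostFile(array $post, string $markdownBody): array
{
    // Docusaurus can read the publish date from a YYYY-MM-DD file name prefix.
    $fileName = sprintf('%s-%s.md', $post['date'], $post['slug']);

    // json_encode yields a double-quoted string that is also valid YAML.
    $title = json_encode($post['title'], JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE);

    $contents = "---\ntitle: {$title}\nslug: {$post['slug']}\n---\n\n" . $markdownBody . "\n";

    return [$fileName, $contents];
}

// Made-up sample post for demonstration.
$post = ['title' => 'Hello "world"', 'slug' => 'hello-world', 'date' => '2010-03-15'];
[$fileName, $contents] = buildPostFile($post, 'Converted Markdown body');

echo $fileName, "\n", $contents;
```

Writing the result to disk would then be a single `file_put_contents` call per post inside a loop over all posts.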