The urge to overengineer something recently befell me, and the target of that urge was the way Tusker handles HTML parsing. Mastodon provides the content of posts (and profile descriptions, and some other things) as HTML, which means that, in order to display them properly, you need to parse the HTML, a notoriously straightforward and easy task. For a long time, the approach I took was to use SwiftSoup to parse the HTML and then walk the resulting node tree to build up a big NSAttributedString
that could be displayed.
And while that technique worked perfectly well, there were a number of downsides that I wasn’t entirely satisfied with. First: SwiftSoup alone is a big dependency, at around ten thousand lines of code. Since it’s shipping in my app, I take the view that I’m responsible for making sure it doesn’t break—and that’s a lot of surface area to cover (moreover, since it was ported from a Java library, the code is extremely un-Swift-y). Second: it’s not especially fast. Partly because of its aforementioned nature as a Java port (everything is a class, lots of unnecessary copies), but primarily because it’s just doing more than I need it to (the first rule of optimization is do less work): it was parsing a string into a tree and then I was flattening that tree back into a string.
The last time I needed to do something similar, I used lol-html, Cloudflare’s streaming HTML rewriter. But, while that worked (and is still in use for my RSS reader app), it’s written in Rust, and I didn’t want to introduce significant additional complexity to the build process for Tusker. So, writing a new HTML parser in Swift seemed like a great idea.