A UI Framework Dilemma

Published on Oct 5th, 2020, in swift, gemini. 13 minute read.

For the past couple of weeks I’ve been building Rocketeer, an iOS browser for the Gemini network.[1] The gemtext format is very minimal, so I thought it would be fairly easy to build something to render Gemini documents. The format is line-oriented and only allows a few different line types: regular paragraphs, link lines, three levels of headings, unordered list items, preformatted blocks, and block quotes. All of these are pretty simple, visually speaking, and the layout is also straightforward. So, I expected to be able to build a renderer quite easily. Unfortunately, there turned out to be lots of little details that were not so obvious at first and introduced a bunch of complications.
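To make those line types concrete, here is a minimal sketch of a gemtext parser. It is not Rocketeer’s actual code: the `GemtextLine` enum, the `content(of:dropping:)` helper, and `parseGemtext` are names made up for this example, and the prefix rules reflect my reading of the gemtext format (a line that toggles preformatted mode starts with three backticks, a link line starts with `=>`, and so on).

```swift
import Foundation

/// One parsed gemtext line. Hypothetical type, named for this example only.
enum GemtextLine: Equatable {
    case text(String)
    case link(url: String, label: String?)
    case heading(level: Int, text: String)
    case listItem(String)
    case quote(String)
    case preformatted(String)
}

/// Drops `count` leading characters and trims surrounding spaces and tabs.
func content(of line: Substring, dropping count: Int) -> String {
    line.dropFirst(count).trimmingCharacters(in: .whitespaces)
}

func parseGemtext(_ source: String) -> [GemtextLine] {
    var result: [GemtextLine] = []
    var preLines: [String]? = nil  // non-nil while inside a preformatted block

    // isNewline also matches "\r\n", which Swift treats as a single Character.
    for line in source.split(omittingEmptySubsequences: false, whereSeparator: \.isNewline) {
        if line.hasPrefix("```") {
            if let lines = preLines {
                // Closing toggle: emit the collected block as one unit.
                result.append(.preformatted(lines.joined(separator: "\n")))
                preLines = nil
            } else {
                preLines = []
            }
            continue
        }
        if preLines != nil {
            preLines?.append(String(line))
            continue
        }

        let hashes = line.prefix(while: { $0 == "#" }).count
        if hashes > 0 {
            let level = min(hashes, 3)  // only three heading levels exist
            result.append(.heading(level: level, text: content(of: line, dropping: level)))
        } else if line.hasPrefix("=>") {
            let rest = content(of: line, dropping: 2)
            let parts = rest.split(maxSplits: 1, whereSeparator: { $0 == " " || $0 == "\t" })
            if let url = parts.first {
                let label: String? = parts.count > 1
                    ? parts[1].trimmingCharacters(in: .whitespaces)
                    : nil
                result.append(.link(url: String(url), label: label))
            } else {
                result.append(.text(String(line)))  // "=>" with no URL: keep as plain text
            }
        } else if line.hasPrefix("* ") {
            result.append(.listItem(content(of: line, dropping: 2)))
        } else if line.hasPrefix(">") {
            result.append(.quote(content(of: line, dropping: 1)))
        } else {
            result.append(.text(String(line)))
        }
    }
    // An unterminated preformatted block still gets rendered.
    if let lines = preLines {
        result.append(.preformatted(lines.joined(separator: "\n")))
    }
    return result
}
```

Each element of the returned array corresponds to one visual unit, with a whole preformatted block collapsed into a single element.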

Initially, all of the UI code for Rocketeer was written using SwiftUI (and so it remains in the public beta that is current at the time of publication). Originally I chose SwiftUI because it allowed me to build a functional document renderer incredibly quickly, so I could at least see something on screen. But this worked out better, architecturally speaking, than I expected. Gemtext is a line-oriented format, so each line of text in the source document pretty much maps to a single visual unit for display purposes. For the same reason, layout is quite simple. Just as the source document is just a list of lines arranged vertically, the rendered document is a list of display units arranged vertically, one after the other. With SwiftUI, the actual layout code is as simple as a single VStack (or LazyVStack on iOS 14, which gets you a simple layout with lazily initialized views without having to screw around with the table or collection view APIs that weren’t designed for such a thing) containing a ForEach that iterates over each of the display blocks. Put that inside a ScrollView, and bam: you’re rendering a whole Gemini document.
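A rough sketch of that layout might look like the following. The `DisplayBlock` model, the `displayBlocks(from:)` helper, and the `DocumentView` name are all invented for illustration, and the mapping is deliberately simplified to one block per source line (a real renderer would group preformatted lines and style each line type differently). The view is wrapped in `canImport(SwiftUI)` only so the model code compiles on platforms without SwiftUI.

```swift
/// Hypothetical display unit: one per line of the source document.
struct DisplayBlock: Identifiable {
    let id: Int       // index of the line in the source
    let text: String
}

/// Simplified mapping: every source line becomes exactly one display block.
func displayBlocks(from source: String) -> [DisplayBlock] {
    source.split(omittingEmptySubsequences: false, whereSeparator: \.isNewline)
        .enumerated()
        .map { DisplayBlock(id: $0.offset, text: String($0.element)) }
}

#if canImport(SwiftUI)
import SwiftUI

// LazyVStack requires iOS 14 / macOS 11; swap in VStack on older systems.
struct DocumentView: View {
    let blocks: [DisplayBlock]

    var body: some View {
        ScrollView {
            LazyVStack(alignment: .leading, spacing: 8) {
                ForEach(blocks) { block in
                    Text(block.text)
                }
            }
            .padding()
        }
    }
}
#endif
```

Something like `DocumentView(blocks: displayBlocks(from: source))` would then be enough to put a document on screen.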

This was all well and good, until I realized there were a few features I wanted to add that SwiftUI wasn’t capable of. At first it was just custom context menu link previews (similar to what Safari does).

While SwiftUI does provide the .contextMenu modifier for adding context menu actions to a view, it doesn’t have a native API for creating custom context menu previews (FB8772786). This could in theory be accomplished with a custom UIViewRepresentable that wraps a SwiftUI view in a UIView with a UIContextMenuInteraction, thereby granting access to the regular UIKit context menu APIs, but that’s a great deal of work for a feature that’s small enough it probably wouldn’t be missed.

But, that wasn’t the end. I realized text selection would be a very useful feature. Replies to gemlog posts are commonly written by block-quoting parts of the original post and then writing the response below it (à la email bottom-posting). Imagine being able to read a Gemini post on an iPad, copying parts of it into a text editor to compose a response, and then uploading it to your server with an FTP app. That sounds like a pretty comfy workflow. One which requires the ability to select and copy text out of the browser. Which SwiftUI’s Text view can’t do (FB8773449).

I put aside text selection as something to revisit later. And then I got to thinking about interesting features that the Gemini format itself would facilitate. The first one that came to mind was generating a table of contents. As a side effect of a document format that doesn’t allow custom layout, documents on Geminispace use markup much more structurally/semantically. Headings are used as, well, headings. And having documents be well-ordered, in addition to having three distinct levels of headings, means there’s a structure implied by the heading lines. By scanning through a document and looking at the heading lines, you could quite easily generate a table of contents for the entire document. Now, here’s where SwiftUI comes into this: If you’ve got a table of contents, you probably want to be able to skip to a specific point in it (what use would the table of contents be in a book without page numbers?). iOS 14 introduced ScrollViewReader, which allows the scroll view’s position to be manipulated programmatically by jumping to specific views (despite the name, it does not do any reading of the ScrollView). Of course, this is only available on iOS 14, so any users on iOS 13 wouldn’t be able to use it. And given that iOS 14 was released less than a month ago, and how simple this feature seems, I didn’t want to make it dependent on a new OS version.
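The scanning step could be sketched roughly like this. `TOCEntry`, `tableOfContents(for:)`, and `outline(_:)` are hypothetical names, not anything from Rocketeer; the one subtlety worth showing is that lines inside a preformatted block must be skipped, since a `#` there is not a heading.

```swift
import Foundation

/// Hypothetical table-of-contents entry.
struct TOCEntry: Equatable {
    let level: Int      // 1 through 3
    let title: String
    let lineIndex: Int  // index of the heading line in the source document
}

func tableOfContents(for source: String) -> [TOCEntry] {
    var entries: [TOCEntry] = []
    var inPreformatted = false
    let lines = source.split(omittingEmptySubsequences: false, whereSeparator: \.isNewline)

    for (index, line) in lines.enumerated() {
        // Lines starting with three backticks toggle preformatted mode.
        if line.hasPrefix("```") {
            inPreformatted.toggle()
            continue
        }
        guard !inPreformatted else { continue }

        let hashes = line.prefix(while: { $0 == "#" }).count
        guard hashes > 0 else { continue }
        let level = min(hashes, 3)
        let title = line.dropFirst(level).trimmingCharacters(in: .whitespaces)
        entries.append(TOCEntry(level: level, title: title, lineIndex: index))
    }
    return entries
}

/// Renders the entries as a plain-text outline, indented two spaces per level.
func outline(_ entries: [TOCEntry]) -> String {
    entries
        .map { String(repeating: "  ", count: $0.level - 1) + $0.title }
        .joined(separator: "\n")
}
```

If the rendered views use the line index as their identifier, the same `lineIndex` could presumably be handed to `ScrollViewProxy.scrollTo(_:anchor:)` inside a ScrollViewReader on iOS 14 to jump to the chosen heading.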

Also on the subject of scroll views, the browser should be able to persist the scroll view’s offset. If the user leaves the app and returns, they should be looking at the same point on the page as when they left. Likewise if they navigate backwards/forwards or switch tabs. This isn’t possible at all using SwiftUI’s ScrollView. I briefly tried setting up a UIScrollView myself, and then adding the UIHostingController’s view as a child of it, but this completely removed the benefit of LazyVStack, causing abysmal performance when viewing some pages.

Even if wrapping UIScrollView myself had worked, what would be the point? Along with all the other things, I’d have written almost the entire Gemini document viewer using UIKit with only the teensiest bit of SwiftUI glue code. Why not then just go one step further and only use UIKit?

And this is entirely putting aside the fact that Rocketeer was originally intended to be a Mac app, with the iOS app as an afterthought when I realized it was easily possible since I was using SwiftUI. Using UIKit for so many integral parts would have meant huge portions of the codebase had to be rewritten for the macOS version.

So, while any one of these features wouldn’t be enough to get me to abandon SwiftUI, altogether they’re enough to get me to start eyeing other options. Because to not do so would leave a lot of useful features on the table. The two likely replacements I came up with were: A) converting the Gemini document into an NSAttributedString and stuffing it into a UITextView or B) rendering the Gemini document to HTML and displaying it with a WKWebView. The following table shows what features I want for Rocketeer and with which options they’re possible.

	SwiftUI	UITextView	WKWebView
Render all Gemini line types	Yes	Yes	Yes
Text selection	No	Yes	Yes
Text selection between blocks	N/A	No	Yes
Context menu actions	Yes	Yes	Yes
Context menu previews	Hacky	Yes	Yes
VoiceOver & Voice Control	No (iOS bug)	?	Yes
Persist scroll position	No	Yes	Yes
Scroll to anchor	iOS 14	Yes	Yes
Horizontally scrolling blocks	Yes	No	Yes
SF Symbols	Yes	Yes	Hacky
System fonts	Yes	Yes	Hacky
Block quote leading border	Yes	No	Yes

Clearly, SwiftUI poses the most problems, and WebKit has the most possibilities. But plain old UIKit with a UITextView is in an annoying middle ground. A fair number of additional features are possible when compared to SwiftUI. But in exchange for that, it also loses some features that are possible with SwiftUI. And of course, there are still a few things that neither SwiftUI nor UITextView support.

First up: VoiceOver and Voice Control. While reading the contents of a text view with VoiceOver is obviously possible, there are still a few questions. The ideal narration behavior for Rocketeer would be to have VoiceOver read each visual segment one at a time. One-by-one, going through each paragraph and link and list item.[2] As for Voice Control, the user needs to be able to interact with links within the text view individually. And in addition to the bare numbers all buttons are assigned, users should be able to speak the label of links to simulate tapping on them. I would hope UIKit provides suitable accessibility APIs for this, but I haven’t investigated it. I can’t imagine it’s as simple as using a single Button per link in SwiftUI. With WKWebView, these are not only possible but are handled automatically and completely for free, thanks to all the work the WebKit team has put into it.

Then there’s the issue of styling block quotes. The appearance I prefer is having the text be a lighter color and italicized, as well as having it slightly inset from the leading edge with a solid border along that edge. As is becoming a pattern, with SwiftUI, this is fairly straightforward. You can use an HStack with some spacing containing a Color that has a frame of a fixed width and then the Text. The text will force the stack to expand vertically, and the color view will expand to fill the entire available height. This is also possible with CSS, using only the border and padding properties. UITextView of course makes things more complicated. While there may be an NSAttributedString attribute to indent an entire paragraph, there is no good way of applying a border to just a small part of a text view’s contents. A solution could be devised by adding UIViews with background colors as subviews of the text view. But that solution has to make sure the border views are positioned correctly, and that they’re kept in sync with the text view as the device is rotated or the window resized. I can also imagine a truly cursed solution that works by performing word wrapping at the desired width, and then inserting a newline character, a special Unicode character that renders as a solid block, and some spaces at each point where the text would wrap. Even with the block characters correctly positioned horizontally, there would likely be small gaps in between them vertically due to the font’s line spacing. Furthermore, you would have to keep this in sync with viewport size changes, and at any rate, this is just too cursed of a solution for me.
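As a hedged sketch of the SwiftUI version described above: an HStack holding a fixed-width Color and the quote Text. The `BlockQuoteView` name, the `quoteText(fromLine:)` helper, and the specific widths and colors are assumptions made for this example. The view is wrapped in `canImport(SwiftUI)` only so the helper compiles on platforms without SwiftUI.

```swift
import Foundation

/// Strips the leading ">" and surrounding whitespace from a gemtext quote line.
func quoteText(fromLine line: String) -> String {
    guard line.hasPrefix(">") else { return line }
    return line.dropFirst().trimmingCharacters(in: .whitespaces)
}

#if canImport(SwiftUI)
import SwiftUI

struct BlockQuoteView: View {
    let line: String  // the raw gemtext line, e.g. "> some quoted text"

    var body: some View {
        HStack(spacing: 8) {
            // The leading border: a color view with a fixed width.
            Color.gray
                .frame(width: 4)
            // Lighter, italicized quote text.
            Text(quoteText(fromLine: line))
                .italic()
                .foregroundColor(.secondary)
        }
    }
}
#endif
```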

On to the subject of preformatted text, for which the challenge is that line wrapping needs to be disabled. Otherwise, certain preformatted text, like code, would be much more difficult to read. And even worse, ASCII art would be entirely illegible (and potentially take up a huge amount of vertical space unnecessarily, depending on how wide it is). With line wrapping disabled, the preformatted text needs to scroll horizontally so that it is all visible. But the entire document viewport shouldn’t scroll because it’s likely that the majority of the text is just going to be regular paragraphs, and moving the entire viewport horizontally would leave those off the screen. So, only the actual preformatted sections should be able to scroll horizontally; everything else should be fixed to the width of the screen. With SwiftUI, this is pretty straightforward: there’s just a Text view inside a horizontal ScrollView and that takes care of everything. Using WebKit for this is also very straightforward, since you can use CSS to set the overflow-x property on <pre> elements to make them scroll. UITextView is where this gets complicated. This isn’t possible just with an attributed string and a plain old text view. You could work around this by adding another UITextView, configured to disable line wrapping and allow scrolling on the X-axis, as a subview of the outer text view. But then you once again would have to deal with manually positioning the inner text views inside of the scroll view content of the outer text view and keeping that position in sync as the outer view changes size. You also have to somehow add spacing to the contents of the outer text view so that there’s an appropriately sized gap in its contents where the inner text view sits. This approach would also introduce problems for text selection.
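For comparison, here is roughly what the HTML route could look like: a small gemtext-to-HTML converter with a stylesheet that makes only `<pre>` blocks scroll horizontally and gives block quotes a border and inset. This is a sketch under assumptions, not Rocketeer’s code: `escapeHTML`, `stylesheet`, and `renderHTML(fromGemtext:)` are invented names, the CSS values are arbitrary, and it uses `border-left`/`padding-left`, which only matches the leading edge in left-to-right layouts.

```swift
import Foundation

/// Escapes the characters that are significant in HTML text and attribute values.
func escapeHTML(_ text: String) -> String {
    text.replacingOccurrences(of: "&", with: "&amp;")
        .replacingOccurrences(of: "<", with: "&lt;")
        .replacingOccurrences(of: ">", with: "&gt;")
        .replacingOccurrences(of: "\"", with: "&quot;")
}

// Illustrative stylesheet: preformatted blocks scroll sideways on their own,
// and block quotes get a solid border plus an inset on the left edge.
let stylesheet = """
pre { overflow-x: auto; }
blockquote { margin: 0; padding-left: 12px; border-left: 4px solid gray; color: gray; font-style: italic; }
"""

func renderHTML(fromGemtext source: String) -> String {
    var body: [String] = []
    var preLines: [String]? = nil  // non-nil while inside a preformatted block
    var inList = false

    func closeListIfNeeded() {
        if inList { body.append("</ul>"); inList = false }
    }

    for line in source.split(omittingEmptySubsequences: false, whereSeparator: \.isNewline) {
        if line.hasPrefix("```") {
            if let lines = preLines {
                body.append("<pre>" + escapeHTML(lines.joined(separator: "\n")) + "</pre>")
                preLines = nil
            } else {
                closeListIfNeeded()
                preLines = []
            }
            continue
        }
        if preLines != nil { preLines?.append(String(line)); continue }

        if line.hasPrefix("* ") {
            // Consecutive list item lines share one <ul>.
            if !inList { body.append("<ul>"); inList = true }
            body.append("<li>" + escapeHTML(String(line.dropFirst(2))) + "</li>")
            continue
        }
        closeListIfNeeded()

        let hashes = line.prefix(while: { $0 == "#" }).count
        if hashes > 0 {
            let level = min(hashes, 3)
            let text = line.dropFirst(level).trimmingCharacters(in: .whitespaces)
            body.append("<h\(level)>\(escapeHTML(text))</h\(level)>")
        } else if line.hasPrefix("=>") {
            let rest = line.dropFirst(2).trimmingCharacters(in: .whitespaces)
            let parts = rest.split(maxSplits: 1, whereSeparator: { $0 == " " || $0 == "\t" })
            let url = parts.first.map { String($0) } ?? ""
            // Links without a label fall back to showing the URL itself.
            let label = parts.count > 1 ? parts[1].trimmingCharacters(in: .whitespaces) : url
            body.append("<p><a href=\"\(escapeHTML(url))\">\(escapeHTML(label))</a></p>")
        } else if line.hasPrefix(">") {
            let text = line.dropFirst().trimmingCharacters(in: .whitespaces)
            body.append("<blockquote>\(escapeHTML(text))</blockquote>")
        } else if !line.isEmpty {
            body.append("<p>\(escapeHTML(String(line)))</p>")
        }
    }
    closeListIfNeeded()
    if let lines = preLines {  // unterminated preformatted block
        body.append("<pre>" + escapeHTML(lines.joined(separator: "\n")) + "</pre>")
    }

    return "<!DOCTYPE html><html><head><meta charset=\"utf-8\"><style>\(stylesheet)</style></head><body>\n"
        + body.joined(separator: "\n")
        + "\n</body></html>"
}
```

The resulting string could then be handed to `WKWebView.loadHTMLString(_:baseURL:)`.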

While UITextView does support at least some amount of text selection, which is an improvement over SwiftUI’s complete lack thereof, it doesn’t support selecting text between multiple separate text views. Most of the time, this isn’t a big deal. But what if you want to copy a large chunk of text spanning multiple paragraphs and, say, a preformatted block? That wouldn’t be possible. If you were inserting preformatted blocks using the technique described in the previous paragraph, what would happen when you tried to make a selection that crosses the boundary between a piece of preformatted text and regular body text? The selection certainly wouldn’t continue between them smoothly, as the user would expect. If you had to insert extra text into the outer text view’s contents in order to make space for the inner views, starting a selection in the outer view and dragging across the inner view would just end up selecting the placeholder characters you inserted, which are not actually part of the source document. And if the user started a selection in one of the inner text views, dragging across the boundary into the outer text view would result in the selection just stopping abruptly when it reached the end of the preformatted text. Inserting NSTextAttachments into the text would also make the matter of selection more complicated. I use SF Symbols images as icons to show additional information about links (specifically, whether they’re pointing to the same domain, a different part of Geminispace, or another protocol altogether). NSTextAttachment can contain arbitrary UIImages, so this is possible, but it makes the image a part of the text, meaning the user could end up making a selection that contains an attachment and copying it out of the app. What would happen then, you wonder? I don’t know, but I can’t imagine it would be something helpful. Bullet points have a similar problem, since the U+2022 character is inserted directly into the attributed string when rendering list item lines. WKWebView doesn’t have this problem, once again thanks to the efforts of the WebKit team. Text selection across multiple HTML elements? No problem. Skip over decorative images? Sure thing. Bullet points? You bet.

Having gotten this far, you might think that using a WKWebView with the gemtext converted into HTML is the perfect solution. But of course, there are a couple regressions when going from plain old UIKit to WebKit, since nothing could ever be simple.

The first is the issue of SF Symbols. Although each SF Symbol does have a character code allocated from a reserved section of Unicode, none of the system fonts accessible from the web view will render the symbol, so you’ll just end up with a box. The images (or SVGs) for individual SF Symbols can be extracted from system fonts, and the content of a WKWebView does theoretically have a way of accessing resources bundled with the app, so in theory they could be displayed. But who knows if that would get past App Review.

There’s a similar problem with fonts. I hadn’t mentioned it, but the font I used for both the SwiftUI and UITextView versions of this has been Apple’s New York, which is the system-provided serif font. This is no problem for SwiftUI and UIKit, since their font classes both have methods for getting the system font of a certain design. But, as far as I can tell, these system fonts are not accessible from web content. Even using the internal name, .NewYork-Regular, doesn’t work; it just falls back on the browser’s default font. An approach similar to the one for SF Symbols could be taken here, since Apple does make their system fonts available for download on their developer website.[3] The font could be bundled with the app and then loaded from the web content, though again, who knows how this would go over with App Review.

So, after all that, what am I going to do for Rocketeer? Well, from a customer perspective, the WKWebView solution is clearly the best since it both allows far more features and makes a number of others behave much more in line with the way you’d expect. But I’m kinda annoyed about it. This isn’t just a document viewer for some random format that I’m building. This is a browser for Gemini, a protocol and a format which are very intentionally designed to avoid the pitfalls and complexities of the web. But because all the other available UI frameworks aren’t up to the (relatively simple) task, the most feature-complete way to build this is to pull in an entire web rendering engine. The very technology Gemini is trying to get away from. Isn’t that ironic?

[1] I previously wrote about building the networking stack using Network.framework. ↩︎

[2] Regardless of UI technology, narrating preformatted text with a screen reader is an interesting problem for the Gemini format. I can’t imagine listening to something naively read a block of code aloud would be pleasant. Let alone ASCII art, which is relatively common in Geminispace in lieu of inline images. ↩︎

[3] Say goodbye to the days of extracting SF Mono from inside Terminal.app just to use it inside other text editors. ↩︎