Using lol-html (or any Rust crate) in Swift

I recently started building a new iOS app and found myself needing to parse HTML in order to extract some information. My go-to tool for this in the past has been SwiftSoup. In this app, I have to deal with larger documents than I’d used it for previously, and unfortunately, its performance leaves something to be desired. Much of the issue comes from the fact that I only want to extract the first paragraph of a document, but SwiftSoup always needs to parse the entire thing. For large documents, that is potentially a lot of unnecessary work[1]. And, as far as I could find, there are no streaming HTML parsers written in Swift. One streaming parser I did find, however, was Cloudflare’s lol-html. It’s specifically designed for speed and low latency, exactly what I want. But it’s written in Rust.

Getting a Rust library compiled into a form that it could be used from Swift didn’t turn out to be as complicated as I expected, but both Apple Silicon and Mac Catalyst introduced fun wrinkles.

This blog post from Mozilla was helpful in getting started, but things have changed somewhat in the years since it was written.

The first thing you need to do is install the appropriate targets for the Rust toolchain to let it build targeting iOS devices.

aarch64-apple-ios works for all actual devices. Additionally, to build for the iOS Simulator, you also need the aarch64-apple-ios-sim target (if you’re on Apple Silicon) or x86_64-apple-ios (for Intel Macs).

$ rustup target add aarch64-apple-ios aarch64-apple-ios-sim x86_64-apple-ios

To build a Rust project, you need a library crate with the crate-type set to staticlib so that it can be statically linked by the iOS app. The Rust library also needs an API that’s callable from C, i.e. using #[no_mangle] and extern "C" (outside the scope of this post). Fortunately for me, lol-html already includes such an API.

Building for iOS is done by running cargo build with the appropriate --target option. For example:

$ cargo build --release --target aarch64-apple-ios-sim
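
If you’d rather build the device target and both simulator targets in one go, a small loop does the job. This is only a sketch: build_ios_targets is a name I made up for illustration, and it assumes it’s run from the directory containing the crate’s Cargo.toml with cargo on your PATH.

```shell
# Hypothetical helper (not part of lol-html): builds the static library
# for the device target and both simulator targets, stopping at the
# first failure.
build_ios_targets() {
    for target in aarch64-apple-ios aarch64-apple-ios-sim x86_64-apple-ios; do
        echo "Building for $target"
        cargo build --release --target "$target" || return 1
    done
}
```

Calling build_ios_targets should leave each build’s output in its own per-target directory under target/.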

With the Rust library built, the next step is configuring Xcode to use it. In the app target’s build settings, the Header Search Paths setting needs to be updated to include the path to the C headers for the C API that the Rust library exposes. In my case, that’s lol-html/c-api/include/.

That’ll get it working if you want to call it from C or Objective-C code in your project. To make it accessible from Swift, you need to add a bridging header that imports whatever library headers you need. For lol-html, there’s only lol_html.h. This will make the Rust library’s functions directly accessible from all of the app target’s Swift files.

This is enough to compile it successfully, but to actually link the output into a runnable app, we need to tell the linker where to find the library.

With a normal library, you could add the static library .a to the Xcode target’s “Frameworks, Libraries, and Embedded Content”. But, because Cargo puts the build products in separate directories depending on which platform it targets (e.g., target/aarch64-apple-ios/release/liblolhtml.a), we need to do it slightly differently. Just adding one of the liblolhtml.a’s to the Xcode target would make the linker always try to link against that specific one, regardless of which platform the iOS app is building for. Instead, I modified the “Other Linker Flags” build setting to include -llolhtml and then updated the “Library Search Paths” setting on a per-platform basis. Together, these tell the linker 1) that it needs to link against something called liblolhtml.a and 2) where exactly that file can be found.

Configuring the Library Search Paths build setting is kind of annoying, because the Xcode UI doesn’t fully match what the .pbxproj file can actually describe. Clicking the plus button next to a build setting in Xcode lets you pick for which SDKs the setting value applies. But we also need to narrow that down to specific architectures, because the Intel and Apple Silicon simulator builds need different versions of the library.

The easiest way I’ve found to do this is to go into the Build Settings tab of the Xcode target, find Library Search Paths, expand it, and click the little plus button next to each of Debug and Release. (If you click on the “Any Architecture | Any SDK” dropdown, you’ll see what I mean about not being able to actually specify the architecture from the UI.)

The Library Search Paths setting in Xcode showing empty values under Debug and Release

Then, open the project.pbxproj file in a text editor. I recommend closing the Xcode project before making any changes to this file. Search for the newly added line starting with "LIBRARY_SEARCH_PATHS[arch=*]" and replace it with the following. There will be two occurrences of that line (for the debug and release configurations) and both need to be replaced.

"LIBRARY_SEARCH_PATHS[sdk=iphoneos*]" = "$(PROJECT_DIR)/lol-html/c-api/target/aarch64-apple-ios/release/";
"LIBRARY_SEARCH_PATHS[sdk=iphonesimulator*][arch=arm64]" = "$(PROJECT_DIR)/lol-html/c-api/target/aarch64-apple-ios-sim/release/";
"LIBRARY_SEARCH_PATHS[sdk=iphonesimulator*][arch=x86_64]" = "$(PROJECT_DIR)/lol-html/c-api/target/x86_64-apple-ios/release/";

You’ll need to replace the lol-html/c-api part with the actual path to the library you’re using. This will tell Xcode to use the aarch64-apple-ios version for all actual iOS device targets, and the appropriate simulator version depending on the architecture.
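
Since a wrong search path only shows up as a link error later on, it can be worth checking that each of those directories actually contains a built library. The following is a rough sketch, not something from the lol-html repo: check_libs is a hypothetical helper, and it assumes the library is named liblolhtml.a and that you pass it the Cargo target directory.

```shell
# Hypothetical helper: report which of the expected static libraries
# exist. $1 is the Cargo target directory (e.g. lol-html/c-api/target).
# Returns non-zero if any library is missing.
check_libs() {
    missing=0
    for target in aarch64-apple-ios aarch64-apple-ios-sim x86_64-apple-ios; do
        lib="$1/$target/release/liblolhtml.a"
        if [ -f "$lib" ]; then
            echo "found:   $lib"
        else
            echo "missing: $lib"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, check_libs lol-html/c-api/target run from the project directory prints one line per target and fails if any of the three builds hasn’t been done yet.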

After that, you should be able to re-open the project in Xcode and see all the configurations you added in Build Settings.

The Library Search Paths setting in Xcode showing values for any iOS SDK, and for arm64 and x86_64 variants of the simulator SDK

With that, you should be able to use the Rust library from your Swift code and successfully build and run your app in both the simulator and on a real device.

Mac Catalyst

My first attempt at getting Catalyst builds to work was just by using the normal Mac targets for the Rust library (e.g., aarch64-apple-darwin). But that results in a link error when Xcode builds the app, because it considers binaries built for Catalyst to be distinct targets from regular macOS.

The separate Rust targets for Catalyst are aarch64-apple-ios-macabi and x86_64-apple-ios-macabi for ARM and Intel respectively. As of writing, these are tier 3 targets, which means the Rust project doesn’t provide official builds. This, in turn, means to use them you have to build the standard library from source yourself.

Doing so requires a Rust Nightly feature, build-std, to let Cargo include the standard library in the crate graph for compilation. So, with Nightly installed (rustup toolchain install nightly) and the std source downloaded (rustup component add rust-src --toolchain nightly), you can run the following command to build for a specific target with the standard library built from source:

$ cargo +nightly build -Z build-std=std,panic_abort --release --target aarch64-apple-ios-macabi
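
Putting the setup and the build together, the whole Catalyst flow looks roughly like this. Treat it as a sketch under a few assumptions: build_catalyst_targets is a name I made up, rustup and cargo are on your PATH, and it’s run from the crate directory.

```shell
# Hypothetical helper: one-time nightly setup followed by a build of both
# Catalyst targets with the standard library compiled from source.
build_catalyst_targets() {
    rustup toolchain install nightly || return 1
    rustup component add rust-src --toolchain nightly || return 1
    for target in aarch64-apple-ios-macabi x86_64-apple-ios-macabi; do
        echo "Building with std for $target"
        cargo +nightly build -Z build-std=std,panic_abort --release --target "$target" || return 1
    done
}
```

The two rustup steps are safe to repeat, so there’s no harm in leaving them in even after the first run.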

This separate set of platform/arch combinations requires another set of additions to the Xcode project file, in the same place as before:

"LIBRARY_SEARCH_PATHS[sdk=macosx*][arch=arm64]" = "$(PROJECT_DIR)/lol-html/c-api/target/aarch64-apple-ios-macabi/release";
"LIBRARY_SEARCH_PATHS[sdk=macosx*][arch=x86_64]" = "$(PROJECT_DIR)/lol-html/c-api/target/x86_64-apple-ios-macabi/release";

With that added, the build setting should have values configured for iOS, ARM and Intel Simulators, and ARM and Intel Catalyst:

The Library Search Paths setting in Xcode showing values for all platform and architecture combinations

I initially thought handling universal builds (i.e., combining arm64 and x86_64 into one binary) of the Catalyst app would be complicated, like I would have to lipo them together myself, but it turned out to be entirely painless. Just having the built static libraries for both architectures present in their expected locations is enough. Xcode’s build process takes care of linking each architecture of the app with the respective version of the Rust library and then combining those into one universal package.

Build Script

Keeping track of all of those Rust build targets and making sure to rebuild the right ones if anything changes is rather annoying, so I wrote a little script for Xcode to run to take care of it.

It uses the environment variables provided by Xcode to figure out which platform and architecture(s) are being targeted and build the appropriate Rust targets.

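# Run Cargo from the crate directory; bail out if it doesn't exist.
pushd "$PROJECT_DIR/lol-html/c-api/" || exit 1

# Build for a target whose standard library is installed through rustup.
build() {
    echo "Building lol-html for target: $1"

    ~/.cargo/bin/cargo build --release --target "$1" || exit 1
}

# Build for a tier 3 target, compiling the standard library from source.
build_std() {
    echo "Building lol-html with std for target: $1"

    ~/.cargo/bin/cargo +nightly build -Z build-std=panic_abort,std --release --target "$1" || exit 1
}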

if [ "$PLATFORM_NAME" == "iphonesimulator" ]; then
    if [ "$ARCHS" == "arm64" ]; then
        build "aarch64-apple-ios-sim"
    elif [ "$ARCHS" == "x86_64" ]; then
        build "x86_64-apple-ios"
    else
        echo "error: unknown value for \$ARCHS"
        exit 1
    fi
elif [ "$PLATFORM_NAME" == "iphoneos" ]; then
    build "aarch64-apple-ios"
elif [ "$PLATFORM_NAME" == "macosx" ]; then
    if grep -q "arm64" <<< "$ARCHS"; then
        build_std "aarch64-apple-ios-macabi"
    fi
    if grep -q "x86_64" <<< "$ARCHS"; then
        build_std "x86_64-apple-ios-macabi"
    fi
else
    echo "error: unknown value for \$PLATFORM_NAME"
    exit 1
fi

One thing to note is that when building the universal Mac target, $ARCHS has the value arm64 x86_64. So I check whether the string contains the target architecture, rather than strictly equals, and don’t use elif in the Mac branch so that both architectures are built.
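
The same “contains” check can also be written without grep. The sketch below is a variation on my script rather than what it actually does: rust_targets_for is a hypothetical helper that only prints the Rust targets for a given platform and $ARCHS value, it matches whole words (so a value like arm64e wouldn’t count as arm64), and, unlike the script above, it applies the contains check to the simulator as well.

```shell
# Hypothetical helper: prints one Rust target per line for the given
# Xcode platform name ($1) and ARCHS value ($2).
rust_targets_for() {
    platform="$1"
    archs=" $2 "   # pad with spaces so only whole words match
    case "$platform" in
        iphoneos)
            echo "aarch64-apple-ios"
            ;;
        iphonesimulator)
            case "$archs" in *" arm64 "*) echo "aarch64-apple-ios-sim" ;; esac
            case "$archs" in *" x86_64 "*) echo "x86_64-apple-ios" ;; esac
            ;;
        macosx)
            case "$archs" in *" arm64 "*) echo "aarch64-apple-ios-macabi" ;; esac
            case "$archs" in *" x86_64 "*) echo "x86_64-apple-ios-macabi" ;; esac
            ;;
        *)
            echo "error: unknown platform: $platform" >&2
            return 1
            ;;
    esac
}
```

With this, rust_targets_for macosx "arm64 x86_64" prints both Catalyst targets, one per line, which is the universal build case described above.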

I have the script configured to not bother with any of the dependency analysis stuff, so it runs on every build. Cargo takes care of only actually rebuilding if something’s changed, and when nothing has, the time it takes is negligible, so running it on every incremental build is fine.

With the script added to Build Phases in Xcode (for some reason it needs to come not just before Link Binary with Libraries but also before Compile Sources), I can run cargo clean in the Rust project directory and then seamlessly build and run from Xcode.


1. I benchmarked it, and for an average-length document, using lol-html to extract the first paragraph winds up being two orders of magnitude faster than SwiftSoup. And that dramatic difference only increases for longer documents.
