Adding Intelligent Code Generation to Swift Projects with SourceKit

I gave a talk about scalable iOS apps at SwiftHeroes 2021 in which I spoke about an app's four "levels" of complexity, ending by stating that we still don't know what the "fifth level" would be.

I do have one guess though: I believe a "level five" iOS app will be an app that contains so many modules and architectural components that any small addition will require writing massive amounts of boilerplate code. In an app like this, the jump from level four to level five would be defined by the app's ability to generate most of its boilerplate and requirements.

Note that I'm not talking about code generation in the sense of tools like Sourcery where you define templates for common actions, push an input and get an output -- I'm talking about generating code for things whose inputs you don't even know. Yes, that's possible! I've been playing with "intelligent code generation" for a couple of years now, and I'd like to show you how to achieve this with SourceKit.

Example: Generating a dependency injection library's dependency list

For this article, I'd like to use RouterService as an example. This is a type-safe dependency injection library in which your app is defined as a series of "features" that can depend on each other, creating a dependency graph that is then fed into a main RouterService object that does all the magic:

struct ProfileFeature: Feature {

    @Dependency var client: HTTPClientProtocol

    func build(fromRoute route: Route?) -> UIViewController {
        return ProfileViewController(client: client)
    }
}

There's one thing that is slightly annoying about it though -- because type-safety is an important aspect of the library, you need to manually inform RouterService of all available features in your app:

routerService.register(feature: ProfileFeature.self)
routerService.register(feature: SomeOtherFeature.self)

(The actual registration is a bit different from this, but I simplified it to avoid straying from the subject too much.)

This means that every time a new feature is added, the pull request must also contain a change to the file where these registrations are being made. If you forget to do that, the library will crash when attempting to reference that feature.

We can improve this by automating this step with code generation. If you're maintaining a list of features somewhere in a file you could use Sourcery to generate this boilerplate, but what I'd like to show you is how to generate this code without having any prior information on the app's feature list. No special yaml files, no special calls, just run a tool and get this code generated.

Introducing SourceKit

SourceKit is Swift's syntax highlighting engine. Well, I suppose technically the engine is Swift itself, but SourceKit is how Xcode is able to implement all of its Swift IDE features like formatting, jumping to specific symbols, and of course, the syntax highlighting itself. Part of the Swift umbrella of tools, SourceKit is a C library that abstracts the Swift compiler, and it's shipped inside the Swift toolchain when you download Xcode. This means that you don't need to download a special binary to follow this tutorial -- if you have Xcode, you also have SourceKit.

Like Swift, SourceKit is open-source, and you can use it outside of Xcode to give any project the capabilities of a Swift IDE. SourceKit-powered projects were actually sort of common a couple of years ago, with the most popular one being JP Simard's SourceKitten framework which allowed you to use SourceKit directly in Swift. In fact, many popular code-related tools like SwiftLint, Jazzy and even Sourcery itself are still using SourceKitten under the hood. I haven't seen any more SourceKit-related projects being developed in recent times, but you can still use it to create intelligent projects.

SourceKit works through a request/response format. The tool can receive many types of IDE-related requests like opening files, indexing content, auto-completion, looking up the definition of a symbol, formatting and so on, and by sending a structured request object you'll get a structured response with the result of what you're asking for.

You can actually see SourceKit in action by launching Xcode with SourceKit logging enabled:

export SOURCEKIT_LOGGING=3 && /Applications/Xcode.app/Contents/MacOS/Xcode > log.txt

With the SOURCEKIT_LOGGING flag, Xcode will start dumping every request made to SourceKit. Try doing some common actions like waiting for auto-complete and see how SourceKit makes it happen! The logs will contain a lot of noise, but if you want a pointer, search for calls to the editor_open request, which is the request made whenever you open a file. For a file like the one below, the log shows the request that was sent followed by SourceKit's response:

final class TestService {
    init() {}
}

{
  key.request: source.request.editor.open,
  key.name: "/myFolder/myFile.swift",
  key.sourcefile: "/myFolder/myFile.swift"
}

{
  key.substructure: [
    {
      key.kind: source.lang.swift.decl.class,
      key.accessibility: source.lang.swift.accessibility.internal,
      key.name: "TestService",
      key.offset: 6,
      key.length: 35,
      key.nameoffset: 12,
      key.namelength: 11,
      key.bodyoffset: 25,
      key.bodylength: 15,
      key.attributes: [
        {
          key.offset: 0,
          key.length: 5,
          key.attribute: source.decl.attribute.final
        }
      ],
      key.substructure: [
        {
          key.kind: source.lang.swift.decl.function.method.instance,
          key.accessibility: source.lang.swift.accessibility.internal,
          key.name: "init()",
          key.offset: 30,
          key.length: 9,
          key.nameoffset: 30,
          key.namelength: 6,
          key.bodyoffset: 38,
          key.bodylength: 0
        }
      ]
    }
  ]
}

As you can see, the response to this request contains the tokenized structure of a Swift file. With it, we can determine that this file declares TestService, that TestService is a final class with internal accessibility, where each declaration and attribute sits in the file, the length of each code block and so on. That's what Xcode uses to give keywords a special color before your file is properly indexed, and that's the most basic request it has!
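To make the offsets in the response more concrete, here is a minimal sketch that maps them back to the source text. It assumes the offsets and lengths are UTF-8 byte counts, and `slice(_:offset:length:)` is a hypothetical helper that is not part of SourceKit or the sample project:

```swift
// The same file contents used in the example above.
let source = "final class TestService {\n    init() {}\n}"

/// Returns the text covered by an offset/length pair, treating both as
/// UTF-8 byte counts (an assumption about how SourceKit reports them).
func slice(_ source: String, offset: Int, length: Int) -> String? {
    let bytes = Array(source.utf8)
    guard offset >= 0, length >= 0, offset + length <= bytes.count else {
        return nil
    }
    return String(decoding: bytes[offset..<(offset + length)], as: UTF8.self)
}

// key.nameoffset: 12, key.namelength: 11 (the class name)
print(slice(source, offset: 12, length: 11) ?? "out of range")
// key.offset: 30, key.length: 9 (the whole initializer)
print(slice(source, offset: 30, length: 9) ?? "out of range")
```

Working on bytes rather than on Character indices keeps the lookup correct for files that contain non-ASCII characters, as long as the byte-offset assumption holds.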

In short, we can use this tokenized structure to automatically generate a feature's declaration code. If we know that a file contains the declaration of a feature, we can extract its name and automatically generate its setup boilerplate.

Creating a project with SourceKit

SourceKit is easy to use, but a bit annoying to configure. This is because we're dealing with a C library, so some setup needs to be done in your Xcode project before you can actually use it. I suppose that for simple endeavors you can use SourceKitten, but I'll teach you how to manually use SourceKit so you have access to its full capabilities.

To start, download this sample SourceKit-powered Swift Package Manager project that contains everything you need to get started.

The project contains two targets: Csourcekitd and MyProject, which is where the project itself resides. The reason we need to define a target for SourceKit is that although we have access to SourceKit in the toolchain, we don't know how to call it. This target has a header file that contains all functions supported by SourceKit, alongside the minimum setup necessary to abstract a C library into a Swift module.

Additionally, MyProject contains Swift abstractions that can handle initializing and using SourceKit. It defines, for example, the data structures that you need to pass, the constants for every relevant string in SourceKit (requests, keys and values), and small utilities for handling these types. If you're wondering where all of that comes from, all of these files come from Swift itself! The abstractions specifically were snatched from SourceKit-LSP, which is a Swift abstraction of SourceKit for IDEs that support the Language Server Protocol. Before continuing, I recommend taking a quick look at the contents of the files in the sample project so you can get familiar with why they're there.

Extracting features and generating the boilerplate

If we assume that the purpose of this tool is to read the sources of a project, find Feature definitions and dump the generated code somewhere, we could define something like this:

let sourceKit = SourceKit()
let keys = sourceKit.keys!
let requests = sourceKit.requests!
let values = sourceKit.values!

func process(files: [String]) {
    var features = [String]()
    for file in files {
        features.append(contentsOf: findFeatures(inFile: file))
    }
    process(result: features)
}

func findFeatures(inFile file: String) -> [String] {
    return []
}

func process(result: [String]) {

}

process(files: []) // Add here the list of file paths you'd like to process

(I'm assuming you know how to provide the files array, but if you don't, you could search for .swift files with FileManager or receive an input directly with swift-argument-parser. For this tutorial, you could also simply hardcode a file path string.)
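If you'd like a starting point for the FileManager route, here is a minimal sketch. `swiftFiles(in:)` is a hypothetical helper (it is not part of the sample project), and it assumes that every file ending in .swift under the given directory should be processed:

```swift
import Foundation

/// Recursively collects the paths of all `.swift` files under `directory`.
/// Hypothetical helper for illustration; not part of the sample project.
func swiftFiles(in directory: String) -> [String] {
    let root = URL(fileURLWithPath: directory)
    guard let enumerator = FileManager.default.enumerator(
        at: root,
        includingPropertiesForKeys: nil,
        options: [.skipsHiddenFiles]
    ) else {
        return []
    }
    var paths = [String]()
    // The enumerator descends into subdirectories on its own.
    for case let url as URL in enumerator where url.pathExtension == "swift" {
        paths.append(url.path)
    }
    // Sort so the generated code is stable between runs.
    return paths.sorted()
}
```

The result could then be passed to the tool with something like process(files: swiftFiles(in: "Sources")), where "Sources" is just a placeholder for your project's source directory.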

In short, the intention of this code is to iterate a files array, search for Feature declarations in each of them and report this result back to another method.

To implement findFeatures(inFile:), let's start by defining an editor_open request to SourceKit:

func findFeatures(inFile file: String) -> [String] {
    let req = SKRequestDictionary(sourcekitd: sourceKit)

    req[keys.request] = requests.editor_open
    req[keys.name] = file
    req[keys.sourcefile] = file

    // Print both sides of the exchange to inspect what SourceKit returns.
    print(req)
    let response = try! sourceKit.sendSync(req)
    print(response)

    return []
}

Both the request and response objects are CustomStringConvertible, so printing them will show you all their details. If you run this for a given file, you'll see an output similar to the one I showed above. If you're wondering how I know which arguments to pass to the request, it's because I looked at how Xcode is calling it using the tips mentioned above. Unfortunately, I don't think there's any actual documentation for these requests besides checking the Xcode logs, but if you try to perform a request with something missing, SourceKit will tell you.

In short, what we need to do is traverse the tokens of a file and determine if it contains one or more enums that inherit from the Feature protocol. To figure out how to do that, it helps to feed SourceKit a file that matches our needs and see what SourceKit responds with:

enum MyFeature: Feature {}

{
  key.substructure: [
    {
      key.kind: source.lang.swift.decl.enum,
      key.accessibility: source.lang.swift.accessibility.internal,
      key.name: "MyFeature",
      key.offset: 0,
      key.length: 26,
      key.nameoffset: 5,
      key.namelength: 9,
      key.bodyoffset: 25,
      key.bodylength: 0,
      key.inheritedtypes: [
        {
          key.name: "Feature"
        }
      ],
      key.elements: [
        {
          key.kind: source.lang.swift.structure.elem.typeref,
          key.offset: 16,
          key.length: 7
        }
      ]
    }
  ]
}

From this response, we can determine that we need to:

  • Iterate key.substructure (recursively, because the declaration could be deeper down the structure)
  • Check if kind is source.lang.swift.decl.enum
  • Check if inheritedtypes contains Feature
  • If yes, record the name of the feature.
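Before wiring this into SourceKit, the steps above can be sketched against a plain data model. `Node` and `featureNames(in:conformingTo:)` are hypothetical stand-ins invented for this illustration; the real response uses SourceKit's own dictionary and array types:

```swift
/// A simplified, hypothetical stand-in for one entry of key.substructure.
struct Node {
    var kind: String
    var name: String
    var inheritedTypes: [String] = []
    var substructure: [Node] = []
}

/// Walks the tree depth-first and returns the names of all enums
/// whose inherited types include `protocolName`.
func featureNames(in nodes: [Node], conformingTo protocolName: String = "Feature") -> [String] {
    var result = [String]()
    for node in nodes {
        if node.kind == "source.lang.swift.decl.enum",
           node.inheritedTypes.contains(protocolName) {
            // Record the declaration's own name, not the protocol's.
            result.append(node.name)
        }
        // Declarations can be nested, so keep descending.
        result.append(contentsOf: featureNames(in: node.substructure, conformingTo: protocolName))
    }
    return result
}

let tree = [
    Node(kind: "source.lang.swift.decl.enum", name: "MyFeature", inheritedTypes: ["Feature"]),
    Node(kind: "source.lang.swift.decl.class", name: "Container", substructure: [
        Node(kind: "source.lang.swift.decl.enum", name: "NestedFeature", inheritedTypes: ["Feature"]),
        Node(kind: "source.lang.swift.decl.enum", name: "Unrelated")
    ])
]
print(featureNames(in: tree))
```

Note that the name being recorded is the enum's own name; the inherited type is only used as the filter.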

Luckily for us, the Swift abstraction of SourceKit has some methods that can help us with the above. This is how we can iterate key.substructure and check if the element represents the declaration of an enum:

var features = [String]()
response.recurse(uid: keys.substructure) { dict in
    let kind: SKUID? = dict[keys.kind]
    guard kind?.uid == values.decl_enum else {
        return
    }
}

With that out of the way, we can check inheritedtypes by reading its value as a SKResponseArray:

guard let inheritedtypes: SKResponseArray = dict[keys.inheritedtypes] else {
    return
}
for inheritance in (0..<inheritedtypes.count).map({ inheritedtypes.get($0) }) {

}

And from each element in the array, we can check if its name is Feature and, if so, add the name of the enum itself to our response array. This is what the full code looks like:

func findFeatures(inFile file: String) -> [String] {
    let req = SKRequestDictionary(sourcekitd: sourceKit)

    req[keys.request] = requests.editor_open
    req[keys.name] = file
    req[keys.sourcefile] = file

    let response = try! sourceKit.sendSync(req)

    var features = [String]()
    response.recurse(uid: keys.substructure) { dict in
        let kind: SKUID? = dict[keys.kind]
        guard kind?.uid == values.decl_enum else {
            return
        }
        guard let inheritedtypes: SKResponseArray = dict[keys.inheritedtypes] else {
            return
        }
        for inheritance in (0..<inheritedtypes.count).map({ inheritedtypes.get($0) }) {
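            // Record the enum's own name, not the name of the protocol it conforms to.
            if let inherited: String = inheritance[keys.name], inherited == "Feature",
               let featureName: String = dict[keys.name] {
                features.append(featureName)
            }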
        }
    }

    return features
}

From here you could probably export this result and use Sourcery to generate the actual code, but since we already created a project for this I think it's easier to just generate the code yourself:

func process(result: [String]) {
    // Build one registration call per feature, indented to sit inside the function body.
    let declaration = result.map {
        "    routerService.register(feature: \($0).self)"
    }.joined(separator: "\n")
    let result = """
    func registerFeatures(_ routerService: RouterService) {
    \(declaration)
    }
    """

    print(result)
    // Save this result to a file
}

In this example, we're generating a registerFeatures method that contains all the registration code inside of it. We could replace our current code with a call to this method, and set up a script that would run this tool every time a new feature is added.
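For the "save this result to a file" step, one possible sketch is shown below. `writeGeneratedSource(_:to:)` is a hypothetical helper, and skipping the write when nothing changed is my own assumption about what is useful here (it avoids touching the file's modification date on every run):

```swift
import Foundation

/// Writes `source` to `path`, skipping the write when the file already
/// holds identical contents. Returns true if the file was (re)written.
/// Hypothetical helper for illustration; not part of the sample project.
@discardableResult
func writeGeneratedSource(_ source: String, to path: String) throws -> Bool {
    let url = URL(fileURLWithPath: path)
    if let existing = try? String(contentsOf: url, encoding: .utf8), existing == source {
        return false
    }
    // Atomic writes avoid leaving a half-written file behind if the tool is interrupted.
    try source.write(to: url, atomically: true, encoding: .utf8)
    return true
}
```

Inside process(result:), the print call could then be followed by something like try writeGeneratedSource(result, to: outputPath), where outputPath is whatever location your project expects the generated file in.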

What about more complicated cases?

One thing you might've noticed is that this code doesn't take the module of the file into consideration, so this wouldn't work for apps with multiple modules, and we're also not doing any sort of caching. These, however, are problems that you can solve without using SourceKit directly, so we won't go into their details here. They should show you, however, that doing this properly is difficult, which is why it should probably only be a concern for apps in a very advanced state.

With regard to the more advanced uses of SourceKit itself, one of my favorite requests is the index_source request, which provides you with the "indexed" version of your file. This is similar to the editor_open request, but instead of simply giving you the names of the tokens, it shows you the symbol of each reference. For example, if you're looking at a type reference like let type: MyType, SourceKit will tell you which module MyType belongs to, the name of the file in which it's defined, the specific line/column where the type is declared, and much more. This is what powers the "jump to definition" feature of Xcode, and is why the color of your files changes after a while -- Xcode has finished indexing it, and it now knows where each reference is coming from.

Unfortunately, these more advanced requests are also more difficult to use because they require you to pass your app's full list of compiler arguments. This would be easy if Xcode allowed you to export that information, but it doesn't, so you will have to mess with xcodebuild to figure out what these arguments are. If you want to see an example of a tool that actually went to the trouble of doing that, check out swiftshield. This is a tool that obfuscates Swift files, and it's completely based on SourceKit's index_source request. I also think that looking at sourcekit-lsp is a good idea, as it contains implementations of many different SourceKit requests that you can take inspiration from.