StaticString, and how it works internally in Swift

StaticString is an interesting type in Swift. It's essentially a String that can't be modified, meant for referencing static content inside your binary.

You can encounter StaticString in Swift when working with source metadata like #file and #function (fatalError, for example, takes its file parameter as a StaticString), but you can also define one yourself by explicitly annotating a string literal with the type:

let path: StaticString = #file // StaticString
let myStaticString: StaticString = "SwiftRocks!"

In short, this is an optimization trick. A StaticString is meant to represent text that is known at compile-time (and is not going to be modified), allowing you to save memory by not building the heap storage that a regular String would require.
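To make the source-metadata use case concrete, here is a minimal sketch of a helper that captures its call site the same way standard library functions like fatalError do. The name describeCallSite is hypothetical and only exists for illustration.

```swift
// Hypothetical helper (not part of the standard library).
// The default arguments are evaluated at the call site, and because the
// parameter type is StaticString, no regular String is built for the path.
func describeCallSite(file: StaticString = #file, line: UInt = #line) -> String {
    // StaticString is CustomStringConvertible, so it can be interpolated.
    return "\(file):\(line)"
}

// Uses the caller's file and line.
print(describeCallSite())

// Any string literal can also be passed explicitly.
print(describeCallSite(file: "Demo.swift", line: 7))
```

The second call shows that a plain literal is accepted wherever a StaticString is expected, since the compiler picks the type from the context.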

You might already have a good idea of what this is going to look like. While a normal String will read the memory address of the original string in the binary and build the entire data structure around it, a StaticString just... stores that address:

public struct StaticString: Sendable {

  /// Either a pointer to the start of UTF-8 data, represented as an integer,
  /// or an integer representation of a single Unicode scalar.
  @usableFromInline
  internal var _startPtrOrData: Builtin.Word

  ...
}

This makes perfect sense: if you're not going to modify the string, there's no need to do anything else with that address. Every string literal you write ends up stored in the binary (which can even be reverse-engineered to extract other people's API keys and such, yuck), and a StaticString is simply a thin wrapper around that address.
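If you want to see that wrapper in action, the sketch below reads the bytes behind a StaticString through its public API. It only uses members that StaticString exposes (withUTF8Buffer, utf8CodeUnitCount and isASCII); the variable names are just for illustration.

```swift
let greeting: StaticString = "SwiftRocks!"

// withUTF8Buffer hands us a view over the literal's bytes.
// A regular String is only created here because we explicitly ask for one.
let copied: String = greeting.withUTF8Buffer { buffer in
    String(decoding: buffer, as: UTF8.self)
}

print(copied)                      // SwiftRocks!
print(greeting.utf8CodeUnitCount)  // 11
print(greeting.isASCII)            // true
```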

But how does Swift differentiate between regular strings and static ones?

How StaticString is built in the compiler

In Swift, literals are syntactic sugar for types that conform to the ExpressibleBy family of protocols, and StaticString is no different. We already covered the topic of ExpressibleBy here on SwiftRocks, so to avoid duplicating information, make sure to familiarize yourself with that article before continuing this one.

As mentioned in that article, types that can be created from string literals are in reality types that conform to the ExpressibleByStringLiteral protocol, exposing an initializer that receives a String formed from that literal. StaticString also works by conforming to that protocol, but I was confused about something: if ExpressibleByStringLiteral gives you a normal String, doesn't this defeat the purpose of a static string?

It turns out that I was missing an important point about string literals. ExpressibleByStringLiteral doesn't simply give you a String; you can actually customize what it gives you!

public protocol ExpressibleByStringLiteral {

  /// A type that represents a string literal.
  associatedtype StringLiteralType: _ExpressibleByBuiltinStringLiteral

  init(stringLiteral value: StringLiteralType)
}

When conforming to ExpressibleByStringLiteral, you can receive any type that conforms to _ExpressibleByBuiltinStringLiteral, a protocol describing a type that can build a string from its original memory address:

public protocol _ExpressibleByBuiltinStringLiteral {

  init(
      _builtinStringLiteral start: Builtin.RawPointer,
      utf8CodeUnitCount: Builtin.Word,
      isASCII: Builtin.Int1
 )
}

As indicated by the underscore, this is an internal protocol that you shouldn't be messing with. In practice, this is quite interesting: you can declare a conformance to it, but the code won't compile because Builtin types can't be accessed from regular Swift code. But if we cannot create conformances to it, what can we use? The answer: String and StaticString.

These two types conform not only to ExpressibleByStringLiteral, but also to the protocol that defines how these strings are created in the first place. And while a String will implement it in order to create a proper mutable string object, a StaticString just stores the address.

extension StaticString: _ExpressibleByBuiltinStringLiteral {
  public init(
    _builtinStringLiteral start: Builtin.RawPointer,
    utf8CodeUnitCount: Builtin.Word,
    isASCII: Builtin.Int1
  ) {
    self = StaticString(
      _start: start,
      utf8CodeUnitCount: utf8CodeUnitCount,
      isASCII: isASCII)
  }
}
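Since StaticString satisfies that requirement, your own types can pick it as their StringLiteralType and never touch a regular String. A minimal sketch, assuming a hypothetical EventName type made up for this example:

```swift
// Hypothetical type, not from the standard library.
struct EventName: ExpressibleByStringLiteral {
    let raw: StaticString

    // StringLiteralType is inferred as StaticString from this initializer,
    // so the literal arrives here without a regular String being built.
    init(stringLiteral value: StaticString) {
        self.raw = value
    }
}

let event: EventName = "app_launched"
print(event.raw)                    // app_launched
print(event.raw.utf8CodeUnitCount)  // 12
```

One consequence of this design is that an EventName can only be created from a literal known at compile time; there is no public way to turn a runtime String into a StaticString.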

When building a string literal, the compiler reads the StringLiteralType associated type declared by the ExpressibleByStringLiteral conformance to make sure the right string type is created and provided. In the compiler's C++ source, the lookup looks like this:

literalType = ctx.Id_StringLiteralType;

literalFuncName = DeclName(ctx, DeclBaseName::createConstructor(),
                           {ctx.Id_stringLiteral});

builtinProtocol = TypeChecker::getProtocol(
    cs.getASTContext(), expr->getLoc(),
    KnownProtocolKind::ExpressibleByBuiltinStringLiteral);
builtinLiteralFuncName =
    DeclName(ctx, DeclBaseName::createConstructor(),
             {ctx.Id_builtinStringLiteral,
              ctx.getIdentifier("utf8CodeUnitCount"),
              ctx.getIdentifier("isASCII")});

Should I be using StaticString?

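As is the norm with micro-optimizations, unless you know what you're doing, probably not. You should also note that StaticString has a couple of limitations when it comes to Unicode: as the comment on its storage hints, a value holds either a pointer to UTF-8 data or a single Unicode scalar, and properties such as utf8Start and unicodeScalar trigger a runtime error when used with the wrong representation. Be careful when trying to read its internal content.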
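If you do need to read the contents, checking hasPointerRepresentation first is one way to stay on the safe side. The text(of:) helper below is a hypothetical sketch rather than a recommended API, and the "caf\u{E9}" literal is only there to show that the stored count is in UTF-8 bytes, not characters.

```swift
// Hypothetical helper: converts a StaticString into a String without
// touching a property that doesn't match the stored representation.
func text(of value: StaticString) -> String {
    if value.hasPointerRepresentation {
        // Pointer-backed: decode the UTF-8 bytes.
        return value.withUTF8Buffer { String(decoding: $0, as: UTF8.self) }
    } else {
        // Scalar-backed: unicodeScalar is only valid in this branch.
        return String(value.unicodeScalar)
    }
}

let word: StaticString = "caf\u{E9}" // "café"
print(text(of: word).count)    // 4 characters
print(word.utf8CodeUnitCount)  // 5 bytes, since é takes two in UTF-8
print(word.isASCII)            // false
```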