StaticString, and how it works internally in Swift
StaticString is an interesting type in Swift. It's essentially a String that can't be modified, meant for referencing static content inside your binary. You can encounter StaticString in Swift when referencing source metadata like #file and #function, but you can also define one yourself by explicitly annotating a string literal with the type:
let path: StaticString = #file // StaticString
let myStaticString: StaticString = "SwiftRocks!"
In short, this is an optimization trick. A StaticString is meant to represent text that is known at compile time (and is not going to be modified), allowing you to save memory by not building the storage that a regular String would otherwise set up.
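A common place to see this in practice is a default argument that captures the call site. The sketch below assumes a hypothetical helper named callSite; only StaticString, #file, and #line come from Swift itself.

```swift
// Hypothetical helper, for illustration only: it receives the caller's file
// as a StaticString and only turns it into a String when asked to.
func callSite(file: StaticString = #file, line: UInt = #line) -> String {
    // StaticString is CustomStringConvertible, so interpolation converts it on demand.
    return "\(file):\(line)"
}

// The caller's file and line are filled in automatically.
print(callSite())
```

The conversion to a regular String only happens inside the helper, and only if the value is actually printed or interpolated.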
You might already have a good idea of what this is going to look like. While a normal String will read the memory address of the original string in the binary and build the entire data structure around it, a StaticString just... stores that address:
public struct StaticString: Sendable {
    /// Either a pointer to the start of UTF-8 data, represented as an integer,
    /// or an integer representation of a single Unicode scalar.
    @usableFromInline
    internal var _startPtrOrData: Builtin.Word
    ...
}
This makes perfect sense: if you're not going to modify the string, there's nothing else to do with that address. Every string literal you write ends up stored in the binary (which anyone can reverse-engineer to extract other people's API keys and such, yuck), and a StaticString is simply a thin wrapper that reads from that address.
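To make the "just an address" idea concrete, here is a minimal sketch that follows the stored pointer through the public API. The helper name copyOut is hypothetical; utf8Start, utf8CodeUnitCount, and hasPointerRepresentation are public members of StaticString.

```swift
let literal: StaticString = "SwiftRocks!"

// Hypothetical helper: rebuilds a regular String from the bytes a StaticString points at.
func copyOut(_ text: StaticString) -> String {
    // utf8Start is only valid when the value holds a pointer rather than a
    // single Unicode scalar, so check the representation first.
    guard text.hasPointerRepresentation else { return text.description }
    let bytes = UnsafeBufferPointer(start: text.utf8Start, count: text.utf8CodeUnitCount)
    return String(decoding: bytes, as: UTF8.self)
}

print(copyOut(literal))
// Size of the wrapper itself; the text lives elsewhere, in the binary.
print(MemoryLayout<StaticString>.size)
```

The size printed at the end does not depend on the length of the literal, since the struct only carries the address plus a little bookkeeping.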
But how does Swift differentiate between regular strings versus static ones?
How StaticString is built in the compiler
In Swift, literals are syntactic sugar for types that conform to the ExpressibleBy series of protocols, and StaticString is no different. We already covered the topic of ExpressibleBy here on SwiftRocks, so to avoid duplicating information, make sure to familiarize yourself with that article before continuing with this one.
As mentioned in that article, types that can be created from string literals are in reality types that conform to the ExpressibleByStringLiteral protocol, exposing an initializer that receives a String formed from that literal. StaticString also works by conforming to that protocol, but I was confused about something: if ExpressibleByStringLiteral gives you a normal String, doesn't this defeat the purpose of a static string?
It turns out that I was missing an important point about string literals. ExpressibleByStringLiteral doesn't simply give you a String; you can actually customize what it gives you!
public protocol ExpressibleByStringLiteral {
    /// A type that represents a string literal.
    associatedtype StringLiteralType: _ExpressibleByBuiltinStringLiteral
    init(stringLiteral value: StringLiteralType)
}
When conforming to ExpressibleByStringLiteral, you can receive any type that conforms to _ExpressibleByBuiltinStringLiteral, which is a protocol that describes a type that can build a string from its original memory address:
public protocol _ExpressibleByBuiltinStringLiteral {
    init(
        _builtinStringLiteral start: Builtin.RawPointer,
        utf8CodeUnitCount: Builtin.Word,
        isASCII: Builtin.Int1
    )
}
As indicated by the underscore, this is an internal protocol that you shouldn't be messing with. In practice this is quite interesting, because you can actually declare a conformance to it, but the code won't compile because we cannot access Builtin types from regular Swift code. But if we cannot create conformances to it, what can we use? The answer: String and StaticString.
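As a sketch of what that choice looks like from user code, the two hypothetical types below (EventName and Label are made up for illustration) each conform to ExpressibleByStringLiteral, but pick a different StringLiteralType simply through the parameter type of their initializer.

```swift
// Hypothetical type: the initializer's parameter type makes
// StringLiteralType resolve to StaticString.
struct EventName: ExpressibleByStringLiteral {
    let raw: StaticString
    init(stringLiteral value: StaticString) {
        self.raw = value
    }
}

// Hypothetical type: same protocol, but StringLiteralType resolves to String.
struct Label: ExpressibleByStringLiteral {
    let raw: String
    init(stringLiteral value: String) {
        self.raw = value
    }
}

// Both are written as plain literals; the conformance decides what gets built.
let event: EventName = "app_launch"
let label: Label = "app_launch"

print(event.raw.utf8CodeUnitCount)
print(label.raw.count)
```

A side effect of the first variant is that EventName can only be created from literals or existing StaticString values, never from a String assembled at runtime.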
These two types conform not only to ExpressibleByStringLiteral, but also to the protocol that defines how these strings are created in the first place. And while String implements it in order to create a proper mutable string object, StaticString just stores the address:
extension StaticString: _ExpressibleByBuiltinStringLiteral {
    public init(
        _builtinStringLiteral start: Builtin.RawPointer,
        utf8CodeUnitCount: Builtin.Word,
        isASCII: Builtin.Int1
    ) {
        self = StaticString(
            _start: start,
            utf8CodeUnitCount: utf8CodeUnitCount,
            isASCII: isASCII)
    }
}
When building a string literal, the compiler reads the StringLiteralType used by the ExpressibleByStringLiteral conformance to make sure the right string type is created and provided:
literalType = ctx.Id_StringLiteralType;
literalFuncName = DeclName(ctx, DeclBaseName::createConstructor(),
                           {ctx.Id_stringLiteral});
builtinProtocol = TypeChecker::getProtocol(
    cs.getASTContext(), expr->getLoc(),
    KnownProtocolKind::ExpressibleByBuiltinStringLiteral);
builtinLiteralFuncName =
    DeclName(ctx, DeclBaseName::createConstructor(),
             {ctx.Id_builtinStringLiteral,
              ctx.getIdentifier("utf8CodeUnitCount"),
              ctx.getIdentifier("isASCII")});
Should I be using StaticString?
As is the norm with micro-optimizations, unless you know what you're doing, probably not. You should also note that a StaticString has a couple of limitations when it comes to Unicode: as the doc comment in its declaration says, it holds either a pointer to UTF-8 data or a single Unicode scalar, so you should be careful when trying to read its internal content.
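One way this shows up is that the counts a StaticString exposes are in UTF-8 bytes, not in characters. The sketch below assumes a hypothetical helper named summarize; withUTF8Buffer and description are public StaticString API.

```swift
// "café", with the accented letter written as a single scalar (U+00E9).
let word: StaticString = "caf\u{E9}"

// Hypothetical helper: compares the raw byte count with the character count.
func summarize(_ text: StaticString) -> (bytes: Int, characters: Int) {
    // withUTF8Buffer works for both the pointer and the scalar representation.
    let bytes = text.withUTF8Buffer { $0.count }
    // Going through description builds a regular String, which counts characters.
    let characters = text.description.count
    return (bytes, characters)
}

let summary = summarize(word)
print(summary.bytes)      // 5: the accented letter takes two UTF-8 bytes
print(summary.characters) // 4
```

If you need anything character-aware, converting to String through description is usually the simpler route, at the cost of giving up the optimization.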