How OptionSet works inside the Swift Compiler

Swift's OptionSet protocol is the bridged version of Objective-C's NS_OPTIONS enum -- a handy tool that allows you to very efficiently define a combination of "options" that can be used for a specific operation.

If, for example, we were developing a JSON serializer, it would be nice to let the user control certain aspects of the serialization process, such as whether the keys are sorted or whether indentation should be applied to the final JSON. In fact, this is exactly how Foundation's JSONSerialization works:

try JSONSerialization.data(withJSONObject: json, options: [.prettyPrinted, .sortedKeys])

From a "list" of options (the quotes will make sense later!), the OptionSet protocol contains a contains(_:) method to see if a specific option is inside the "list". Having several options passed together means that all these options should be considered, and passing no options can be done through an empty array.

But despite the looks, the options parameter is not an array but a WritingOptions struct that conforms to the OptionSet protocol. How the hell does it behave like a Set even though it is not one?!

If you've been following this blog, you'll know that the answer is a bunch of Swift compiler magic. Let's see what this protocol is, why OptionSets are preferred over a regular Set in this case, and finally what magic is done to provide the final behavior.

Context: Objective-C's NS_OPTIONS

Before analyzing the Swift compiler code that allows a regular struct to behave like a Set, let's see where OptionSet came from.

The Objective-C equivalent of OptionSet was the NS_OPTIONS macro, a glorified enum that allowed the same option-mixing gimmick. With it, you can define your options just like an integer enum, like these hypothetical options for browsing SwiftRocks articles:

typedef NS_OPTIONS(NSUInteger, SwiftRocksOptions) {
    SwiftRocksOptionsDarkMode     = 1 << 0,
    SwiftRocksOptionsNoJavaScript = 1 << 1,
    SwiftRocksOptionsNoImages     = 1 << 2,
};

However, instead of using sequential numbers like 0, 1, and 2 as in a regular enum, the idea is that you would use powers of two (most easily expressed as the number 1 left-shifted (<<) by some amount).

If the usage of powers of two sounds confusing, consider what the three-bit number 0 looks like in binary:

0 0 0

In contrast, this is what the three-bit number 7 looks like in binary:

1 1 1

Since we have three bits and each of these bits can be a zero or a one, what if we treated each bit as an option?

The first bit (from the right) could be the first option (DarkMode), the second bit the second option (NoJavaScript), and the third bit the third option (NoImages). If a bit is set to one, it means that option is "selected". This means that we could represent the selection of DarkMode as the number 1 (0 0 1), the selection of NoJavaScript as the number 2 (0 1 0), the selection of NoImages as the number 4 (1 0 0), and a combination of NoJavaScript and NoImages as the number 6 (1 1 0).
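In Swift, you can see these same values directly with binary literals -- a small sketch using the hypothetical options above:

// The hypothetical options expressed as raw powers of two.
let darkMode: UInt     = 0b001 // 1
let noJavaScript: UInt = 0b010 // 2
let noImages: UInt     = 0b100 // 4

// "NoJavaScript and NoImages selected" is the single number 6.
let noJavaScriptAndNoImages: UInt = 0b110 // 6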

The process of creating such numbers is generally referred to as bitmasking and is a very efficient way to represent the existence or combination of arbitrary values. While using a regular Array or Set would require us to store the actual elements in memory, the bitmask requires just the memory space of the integer used to hold the mask.
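A rough way to visualize this in Swift (the exact numbers depend on the platform, so take them as an assumption rather than a guarantee):

import Foundation

// The OptionSet struct occupies only the space of its integer rawValue.
MemoryLayout<JSONSerialization.WritingOptions>.size // 8 bytes on a 64-bit platform

// A Set is a full hashed collection: this is just the size of its handle,
// and the actual elements live in separately allocated storage on the heap.
MemoryLayout<Set<Int>>.size // 8 bytes for the handle, plus the heap storage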

Adding and checking options in a mask is done through bitwise operators. The bitwise OR operator (|) returns a number that has a specific bit set to one if either of the numbers in the operation does, so since we're using powers of two, using it on two options returns a number that represents their combination:

NSUInteger options = SwiftRocksOptionsNoJavaScript | SwiftRocksOptionsNoImages;
// 1 1 0 (6)

On the other hand, the bitwise AND (&) operator returns a number that has a specific bit set to one only if both numbers in the operation do so. If we use it with the set of options along with the specific option that we're checking, because we're using powers of two, we'll get a value different from zero if the option exists in the set (or zero if it doesn't):

(options & SwiftRocksOptionsDarkMode)     != 0 // false
(options & SwiftRocksOptionsNoJavaScript) != 0 // true
(options & SwiftRocksOptionsNoImages)     != 0 // true

Option bitmasking is a very clever use of the binary representation of a number. While unrelated to this article, I like to mention that bitmasks like this are also at the heart of the most efficient known exact solutions to the famous Traveling Salesman problem.

How NS_OPTIONS is bridged to OptionSet

In Swift, you don't have to create enums and manually generate these masks. The OptionSet protocol abstracts the process of creating such bitmasks. Hooray!

But what about things that come from Objective-C? If the Clang importer section of the compiler detects an NS_OPTIONS macro, it converts it into a struct that conforms to OptionSet. OptionSet itself inherits from RawRepresentable, so the enum cases are converted into static let properties built from a rawValue. Here's an example of a converted JSONSerialization.WritingOptions:

public struct WritingOptions: OptionSet {

    public typealias RawValue = Int

    public let rawValue: RawValue

    public init(rawValue: RawValue) {
        self.rawValue = rawValue
    }

    public static let prettyPrinted = WritingOptions(rawValue: 1 << 0)
    public static let sortedKeys = WritingOptions(rawValue: 1 << 1)
    public static let fragmentsAllowed = WritingOptions(rawValue: 1 << 2)
    public static let withoutEscapingSlashes = WritingOptions(rawValue: 1 << 3)
}
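Even after the conversion, the underlying value is still a plain bitmask. A quick sketch (the raw values assume the bit positions shown above):

let options: JSONSerialization.WritingOptions = [.prettyPrinted, .sortedKeys]
options.rawValue              // 3, i.e. (1 << 0) | (1 << 1)
options.contains(.sortedKeys) // true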

How OptionSet is defined in the Standard Library

Although OptionSet is normally used with integers and bitmasking, it can, in theory, be used with anything else, as the protocol itself is just a rawValue and some set algebra methods:

public protocol OptionSet: RawRepresentable, SetAlgebra {
    associatedtype Element = Self // from SetAlgebra

    init(rawValue: Self.RawValue)

    // Requirements inherited from SetAlgebra:
    func union(_ other: Self) -> Self
    func intersection(_ other: Self) -> Self
    func symmetricDifference(_ other: Self) -> Self
    func contains(_ member: Self) -> Bool
    mutating func insert(_ newMember: Element) -> (inserted: Bool, memberAfterInsert: Element)
    mutating func remove(_ member: Element) -> Element?
    mutating func update(with newMember: Element) -> Element?
}
However, implementing these methods isn't required -- if the RawValue is a fixed-width integer (any of the standard integer types, as opposed to a floating-point type), you'll get default implementations in the form of the bitmasking techniques seen before!

extension OptionSet where RawValue: FixedWidthInteger {
    public mutating func formUnion(_ other: Self) {
        self = Self(rawValue: self.rawValue | other.rawValue)
    }

    public mutating func formIntersection(_ other: Self) {
        self = Self(rawValue: self.rawValue & other.rawValue)
    }
}
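This is what makes declaring your own OptionSet so cheap. As a sketch, here's a Swift-native version of the hypothetical SwiftRocksOptions from before, where only the rawValue has to be written -- everything else comes from the default implementations:

struct SwiftRocksOptions: OptionSet {
    let rawValue: Int

    static let darkMode     = SwiftRocksOptions(rawValue: 1 << 0)
    static let noJavaScript = SwiftRocksOptions(rawValue: 1 << 1)
    static let noImages     = SwiftRocksOptions(rawValue: 1 << 2)
}

var options: SwiftRocksOptions = [.noJavaScript, .noImages]
options.contains(.darkMode) // false -- effectively a bitwise AND check
options.insert(.darkMode)   // sets the first bit via a bitwise OR
options.rawValue            // 7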

OptionSet Syntax Sugars

We've seen how to work with OptionSet through its methods, but where does this behavior come from?

options: [.prettyPrinted, .sortedKeys]

Even though the options argument is a single WritingOptions struct, we're passing what looks like an array of them to the method -- and it works.

Interestingly enough, this syntax sugar has nothing to do with OptionSet itself. It comes from one of the protocols it inherits from: SetAlgebra.

The SetAlgebra protocol is the backbone for mathematical set operations in the Standard Library and is used by types like Set and OptionSet:

public protocol SetAlgebra: Equatable, ExpressibleByArrayLiteral {
    associatedtype Element

    func union(_ other: Self) -> Self
    func intersection(_ other: Self) -> Self
    func symmetricDifference(_ other: Self) -> Self
    func contains(_ member: Element) -> Bool
    ...
}

(This looks very similar to OptionSet itself, and that's because OptionSet can be seen as a light abstraction of SetAlgebra that provides the bitmasking features through a rawValue.)

The magic comes from this little extension:

extension SetAlgebra {
    public init(arrayLiteral: Element...) {
        self.init(arrayLiteral)
    }

    public init<S: Sequence>(_ sequence: S) where S.Element == Element {
        self.init()
        for e in sequence { insert(e) }
    }
}

SetAlgebra inherits from ExpressibleByArrayLiteral, which allows it to be expressed directly as an array literal. The initializer above triggers an insert() call for every element, which for our OptionSet types means we'll progressively apply bitwise OR (|) operations, resulting in a final OptionSet that represents the combination of all the options. Hooray!
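Putting the pieces together, here's a rough sketch of what the array literal boils down to (the equivalent calls, not the exact code the compiler emits):

// What we write:
let sugared: JSONSerialization.WritingOptions = [.prettyPrinted, .sortedKeys]

// Roughly what the literal desugars to -- the variadic initializer from the
// extension above, which inserts (ORs) each element into an empty value:
let desugared = JSONSerialization.WritingOptions(arrayLiteral: .prettyPrinted, .sortedKeys)

sugared == desugared // true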

How ExpressibleByArrayLiteral works internally

The innards of the ExpressibleBy family are covered in detail in my previous article, but as a short explanation, we can say that when the type checker finds a literal being assigned to an explicit type, it checks whether that type conforms to the related ExpressibleBy protocol and converts the expression into a call to the protocol's designated initializer if it does (and throws a compilation error if it does not). For more details, check the linked article.
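As a tiny, self-contained illustration (with a made-up UniqueWords type that has nothing to do with OptionSet), this is the mechanism in action for any ExpressibleByArrayLiteral conformer:

// A hypothetical type, only here to demonstrate the literal conversion.
struct UniqueWords: ExpressibleByArrayLiteral {
    let storage: Set<String>

    init(arrayLiteral elements: String...) {
        storage = Set(elements)
    }
}

// The type checker sees an array literal with an explicit UniqueWords
// annotation and rewrites it as a call to init(arrayLiteral:).
let words: UniqueWords = ["swift", "rocks", "swift"]
words.storage.count // 2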

Conclusion

The OptionSet protocol itself is hardly used nowadays, but its existence has some history behind it and shows the importance of analyzing what you're developing in terms of performance. You can always use a regular Array or Set to define these options, but should you? For cases like this, where you have a limited set of unique options, using a bitmask is considerably quicker and less resource-intensive, something your user's device will be very grateful for :)

References and Good reads

The Swift Source Code