Swift's Sequence Inside The Compiler: How for loops work internally

Swift's Sequence Inside The Compiler: How for loops work internally

One of my favorite things about Swift is that almost every compiler feature is built on top of actual classes and protocols of the language. This means that if you see a native type that has some special magical property (like SwiftUI's declarative syntax, which was covered in another post here in SwiftRocks), it's likely that you can reproduce that behavior in your custom types.

In this article, I'll cover one of my favorite syntax sugars in the language: the protocols and the internal compiler behavior that allow you to write for loops. You might have heard already of the Sequence family of protocols, and here, we'll see how the compiler uses them as the building blocks for for loops.

for i in 0..<10 {
    print("Wait, this is not a hardcoded language thing?")
}

In most languages, the ability to iterate simple data types like integers in for statements is often hardcoded into the compiler. This was no different in the first few versions of Swift where the standard was to use old C-style loops:

// Swift 1.2
for var i = 0; i < 10; i++ {
    print(i)
}

for in style loops came along with the Range type, and the release of Swift 3 officially removed the old C-style loops from the language and forced everyone to use the for in loops that we are all used to. Now, for loops in Swift are based on actual protocols that you can inherit to add for in capabilities to anything you create that can be represented as such. As expected, this means that the ability to use ranges as the parameter of a loop isn't because ranges are treated differently in the compiler -- but because ranges conform to specific public protocols.

IteratorProtocol

The base of how iteration in general works in Swift is the IteratorProtocol type -- a very simple protocol that only takes two components: An associatedtype that represents what's being iterated, and a next property that returns the "next" element on the iteration (or nil if it's over).

A basic example of something that can be implemented as an IteratorProtocol is a countdown timer that keeps reducing a value until it reaches zero:

struct CountdownIterator: IteratorProtocol {
    typealias Element = Int

    var count: Int

    mutating func next() -> Int? {
        if count == 0 {
            return nil
        } else {
            self.count -= 1
            return count
        }
    }

    init(count: Int) {
        self.count = count
    }
}

If we access this iterator continuously we'll get diminishing values until the iteration ends, represented by nil:

var iterator = CountdownIterator(count: 3)
iterator.next() // 2
iterator.next() // 1
iterator.next() // 0
iterator.next() // nil
iterator.next() // nil

Iterators are said to be destructive because you can't access a specific value more than once -- you need to go from the beginning to the end in a linear fashion and you can only do it. If you need to access a value again, you need to create another iterator and repeat the process.

Sequence

In terms of iteration, the Sequence protocol is what we interact with the most. In short, a Sequence is simply a more high level abstraction of a IteratorProtocol type that provides additional functionality on top of the regular iteration process. While Swift allows you to create Sequences that are also IteratorProtocols, the existence of this protocol allows you to separate the definition of the type itself from the logic that handles its iteration. To use it, all you have to do is create a makeIterator() type that returns an IteratorProtocol:

struct Countdown: Sequence {
    typealias Iterator = CountdownIterator

    let count: Int

    init(count: Int) {
        self.count = count
    }

    func makeIterator() -> CountdownIterator {
        return CountdownIterator(count: count)
    }
}

The implementation requirements of the Sequence protocol might make it look useless, but don't judge a book by its cover -- it actually contains tons of required methods, but they are all implemented for you in the standard library. Sequence contains pre-definitions for lots of the commonly used sequence-based algorithms, including map, filter and reduce!

Even though IteratorProtocol is the type that defines how the iteration happens, Sequence is the type that provides the additional properties and methods that make these iterations useful in the first place:

let countdown = Countdown(count: 3)
countdown.map { $0 * $0 }
countdown.allSatisfy { $0.isMultiple(of: 2) }
countdown.max()
countdown.sorted()
countdown.forEach { print($0) }

Just like in IteratorProtocol, we can create an iterator and linearly retrieve values from it. This is how all these properties and methods work -- for example, this is how we could implement a clone of map():

extension Sequence {
    func map<T>(_ transform: (Iterator.Element) throws -> T) rethrows -> [T] {
        var iterator = makeIterator()
        var result = [T]()
        while let next = iterator.next() {
            result.append(try transform(next))
        }
        return result
    }
}

But this isn't all -- the greatest property of Sequences and the reason this article was conceived is that Sequences are deeply tied to the Swift compiler in the form of for loops. Like mentioned before, the reason you can use ranges in loops is not because ranges are special, but simply because ranges are Sequences. Because our example Countdown type is a Sequence, it can also be used inside a for loop!

for duration in Countdown(count: 3) {
    print(duration)
}

As one would expect, this will work with any type that conforms to Sequence.

How for loops work internally in Swift

Interestingly, this behavior isn't new to iOS -- old-timers may remember that you could unlock for-each style loops in Objective-C in a similar fashion by making objects inherit from NSFastEnumerator:

for (id object in objectThatConformsToFastEnumerator) {
    NSLog(@"%@", object);
}

But unlike back then, with Swift we can actually see what's happening under the hood. for exists inside the compiler for the purpose of getting your code to be parsed correctly, but the final code is actually different. To inspect this, what I like to do is to compile an example file and have Swift dump its final Swift Intermediate Language representation:

swiftc -emit-sil forLoopTest.swift

If I run this command with an example forLoopTest.swift file that contains an for loop for an array such as the one below, we'll see things like this around the area where the loop was supposed to be defined:

let array = [1,2,3]
for num in array {
    print(num)
}
// function_ref Collection<>.makeIterator()
%40 = function_ref ...
// function_ref IndexingIterator.next()
%40 = function_ref ...
switch_enum %43 : $Optional<Int>, case #some!enumelt.1: bb3, case #none!enumelt: bb2

This shows us that for loops don't really exist -- they are simply a form of syntax sugar to allows you to effortlessly iterate an IteratorProtocol without having to actually extract the iterator!

We can get more details by seeing Swift's source code, more specifically the classes that handle the semantic analysis of the code. In this case, the magic happens after the compiler type-checks the loop: If we analyze the code from the type-checker, we can see that if the loop represents valid code that iterates a Sequence, it injects a new property called ${loopName}$generator that fetches the sequence's iterator, then fetches the iterator's Element type and then finally moves our original closure into a new loop that loops the iterator's next() property.

For example, if we compile the previous loop example, the final result will be something comparable (but not exactly) to this:

let array = [1,2,3]
var $num$generator = array.makeIterator()
while let num = $num$generator.next() { 
    print(num)
}

The Swift compiler uses the $ prefix for internal compiler synthesized properties. Because it needs to inject actual code, it uses a symbol that is impossible to be used in regular code to make sure we won't have namespacing issues. Interestingly, you can actually interact with some of these internal properties as part of the new property wrappers feature.

What about Collections?

When studying more about Swift's collection types, IteratorProtocol, Sequence and Collection are often mentioned together, and you might have noticed that the generated SIL for our example mentions Collections as it uses an array. If the syntax sugar is linked to Sequences only, how do collections work?

We can say that a Collection is simply a more jacked-up version of a Sequence. In fact, all Collections are also Sequences:

public protocol Collection: Sequence { ... }

Unlike regular Sequences, Collections allow you to access elements multiple times, iterate them in reverse order (in the case of BidirectionalCollections) and even access specific elements directly through subscripts (in the case of RandomAccessCollections like Array<>)

Unlike Sequences, Collections do not need to provide an IteratorProtocol. By default, Swift provides the IndexingIterator struct: an IteratorProtocol that takes a collection and simply traverses the indexes that are defined as part of the Collection protocol. As all Collections are Sequences, they can also benefit from having syntax sugar loops.

---

References and Good reads

The Swift Source Code