Lazy Sequences in Swift And How They Work
Usage of high-order functions like map
and filter
are very common in Swift projects, as they are simple algorithms that allow you to convert extensive ideas into simple one-liners. Unfortunately, they don't solve every issue - at least not in their default implementations. High-order functions are eager: they use the closure immediately and return a new array, regardless if you need an early return or only going to use specific elements. When performance is important, you might be cornered into writing specialized helper methods to avoid the eager nature of high-orders:
let addresses = getFirstThreeAddresses(withIdentifier: "HOME")
func getFirstThreeAddresses(withIdentifier identifier: String) -> [Address] {
//Not using .filter{}.prefix(3) because we need an early return
var addresses = [Address]()
for address in allAddresses where address.identifier == identifier {
addresses.append(address)
if addresses.count == 3 {
break
}
}
return addresses
}
Fortunately, Swift has a way to use high-order functions while still keeping the improved performance of the helper methods - Lazy versions of the Swift Standard Library Sequences
and Collections
can be accessed through the lazy
property.
These lazy variations work just like their regular counterparts, but with one twist: they have custom implementations of methods like map
and filter
in order to make them work lazily - meaning that the actual computations will only happen where and when you need them.
let allNumbers = Array(1...1000)
let normalMap = allNumbers.map { $0 * 2 } // The entire Sequence will be mapped, regardless of what you need to do.
let lazyMap = allNumbers.lazy.map { $0 * 2 } // Nothing happens here.
print(lazyMap[0]) // Prints 2, but everything else is still untouched!
While somewhat scary at first, they allow you to reduce most for
loops with early returns into one-liners. For example, here's how it compares to other methods when used to find the first element that fulfills a predicate:
// allAddresses in an [Address] with 10000 elements, and a "HOME" address is placed at the beginning
let address = allAddresses.filter { $0.identifier == "HOME" }.first // ~0.15 seconds
// Versus
func firstAddress(withIdentifier identifier: String) -> Address? {
// Nowadays you can use the Standard Library's first(where:) method,
// but lets pretend that doesn't exist.
for address in allAddresses where address.identifier == identifier {
return address
}
return nil
}
let address = firstAddress(withIdentifier: "HOME") // Instant
// Versus
let address = allAddresses.lazy.filter { $0.identifier == "HOME" }.first // Also instant with much less code!
Besides writing shorter code, they can also be very useful for delaying operations in general to make your code easier to read. Let's say you have a shopping app which displays offer incentives from a local database if the user's taking too long to finish a purchase:
let offerViews = offersJson.compactMap { database.load(offer: $0) }.map(OfferView.init) // O(n)
var currentOffer = -1
func displayNextOffer() {
guard currentOffer + 1 < offerViews.count else {
return
}
currentOffer += 1
offerViews[currentOffer].display(atViewController: self)
}
While this solution works, it has a major flaw: I am eagerly mapping the entire offer json into OfferViews
, even though there's no guarantee that the user will see any of these offers. This isn't really an issue if offerJson
is a small array, but with large data sets, pulling all the offers from a database can quickly become a problem.
You can map only the necessary OfferViews
by moving the parsing logic to displayNextOffer()
, but your code quality might become harder to understand since you now have to keep the raw offer data around:
let offersJson: [[String: Any]] = //
var currentOffer = -1
func displayNextOffer() {
guard currentOffer + 1 < offerViews.count else {
return
}
currentOffer += 1
guard let offer = database.load(offer: offersJson[currentOffer]) else {
return
}
let offerView = OfferView(offer: offer)
offerView.display(atViewController: self)
}
By using lazy
, the current offerView
will only be mapped when the array position is accessed in displayNextOffer()
, keeping the reading quality of the first implementation with the performance of the second one!
let offerViews = offersJson.lazy.compactMap { database.load(offer: $0) }.map(OfferView.init) // Nothing happens here!
var currentOffer = -1
func displayNextOffer() {
guard currentOffer + 1 < offerViews.count else {
return
}
currentOffer += 1
offerViews[currentOffer].display(atViewController: self) // Mapping only happens here, for the desired element only.
}
Note, however, that Lazy Sequences have no caching. This means that if offerViews[0]
is accessed twice, the entire mapping process will also happen twice. If you need to access elements more than once, move them to a regular array.
Why this works?
While they look like magic when used, the internal implementation of Lazy Sequences aren't as complicated as it looks.
If we print the type of our second example, we can see that even though our lazily mapped Collection
works like a regular Collection
, we are dealing with different types:
let lazyMap = Array(1...1000).lazy.map { $0 * 2 }
print(lazyMap) // LazyMapCollection<Array<Int>, Int>
let lazyMap = Array(1...1000).lazy.filter { $0 % 2 == 0 }.map { $0 * 2 }
print(lazyMap) // LazyMapCollection<LazyFilterCollection<Array<Int>>, Int>
//In this case, the first generic argument is the inner Collection of the lazy operation, while the second one is the transformation function of the map operation.
Looking at Swift's source code, we can see that the non-eagerness comes from the fact that these methods don't actually do anything besides return a new type:
(I'll be using LazySequence
code examples instead of LazyCollections
ones because they are much simpler in nature. If you don't know how regular Sequences
work, take a look at this Apple page.)
extension LazySequenceProtocol {
/// Returns a `LazyMapSequence` over this `Sequence`. The elements of
/// the result are computed lazily, each time they are read, by
/// calling `transform` function on a base element.
@inlinable
public func map<U>(_ transform: @escaping (Elements.Element) -> U) -> LazyMapSequence<Self.Elements, U> {
return LazyMapSequence(_base: self.elements, transform: transform)
}
}
The magic comes from the internal implementation of these specialized types. If we take a look at LazyMapSequence
and LazyFilterSequence
, for example, we can see that they are nothing more than regular Sequences
that stores an operation and applies their counterpart eager functions only when iterated:
// _base is the original Sequence
extension LazyMapSequence.Iterator: IteratorProtocol, Sequence {
@inlinable
public mutating func next() -> Element? {
return _base.next().map(_transform)
}
}
extension LazyFilterSequence.Iterator: IteratorProtocol, Sequence {
@inlinable
public mutating func next() -> Element? {
while let n = _base.next() {
if _predicate(n) {
return n
}
}
return nil
}
}
LazyCollection Performance Traps
If would be nice if the article ended here, but it's important to know that Lazy Sequences have flaws - specifically when the underlying type is a Collection
.
In the opening example, our method gets the first three addresses that fulfill a certain predicate. By chaining lazy operations together, this can also be reduced to an one-liner:
let homeAddresses = allAddresses.lazy.filter { $0.identifier == "HOME" }.prefix(3)
However, look how this specific example performs when compared to the eager counterpart:
allAddresses.filter { $0.identifier == "HOME" }.prefix(3) // ~0.11 secs
Array(allAddresses.lazy.filter { $0.identifier == "HOME" }.prefix(3)) // ~0.22 secs
Even though the lazy
version stops as soon as the three addresses are found, it performs twice as bad as the eager one!
The unfortunate reason comes from the subtle differences between Sequences
and Collections
. While prefixing a Sequence
is as simple as moving the desired elements to a separate Array
, slicing operations on Collections
require knowing the end
index of the desired slice:
public func prefix(_ maxLength: Int) -> SubSequence {
_precondition(maxLength >= 0, "Can't take a prefix of negative length from a collection")
let end = index(startIndex, offsetBy: maxLength, limitedBy: endIndex) ?? endIndex
return self[startIndex..<end]
}
@inlinable
public subscript(bounds: Range<Index>) -> Slice<Self> {
_failEarlyRangeCheck(bounds, bounds: startIndex..<endIndex)
return Slice(base: self, bounds: bounds)
}
The problem is that in Collection
lingo, an endIndex
isn't the index of the last element, but the index after the last element (index(startIndex, offsetBy: maxLength)
). For our lazy filter
, this means that in order to slice the first three home addresses, we must find four of them - which may not even exist.
The documentation of certain lazy types states this issue:
/// - Note: The performance of accessing `endIndex`, `last`, any methods that
/// depend on `endIndex`, or moving an index depends on how many elements
/// satisfy the predicate at the start of the collection, and may not offer
/// the usual performance given by the `Collection` protocol. Be aware,
/// therefore, that general operations on `${Self}` instances may not have
/// the documented complexity.
public struct LazyPrefixWhileCollection<Base: Collection> {
To make it worse, since a Slice
is a mere window to the original Collection
, the casting to Array
will invoke functions that call the lazy filtered Collection
's count
properties - but since the lazy.filter(_:)
operation doesn't conform to RandomAccessCollection
, count
can only be found by iterating the entire Collection
- again.
Due to the Lazy Sequence's lack of caching, this results in the entire filtering/slicing process happening again. Thus, if the fourth element doesn't exist or is too far from the third one, the lazy
version will perform twice as worse as the original one.
The good news is that this can be avoided - if you're not sure your lazy operation will run in reasonable time, you can guarantee it by treating the result as a Sequence
. This has the downside of losing the reverse-iteration capabilities of a BidirectionalCollection
, but guarantees that forward operations will be fast again.
let sequence: AnySequence = allAddresses.lazy.filter { $0.identifier == "HOME" }.prefix(3)
let result = Array(sequence) // ~0.004 secs!
Conclusion
Usage of lazy
objects can allow you to write high-performance complicated things very quickly - at the cost of requiring some understanding of Swift internals to prevent major issues. Like all features, they have great advantages with equal downsides, and in this case knowledge of the main differences between Sequences
and Collections
is required to extract the best of them. Once mastered, mapping specific elements becomes very simple and intuitive.
Follow me on my Twitter - @rockbruno_, and let me know of any suggestions and corrections you want to share.
References and Good reads
Filter.swiftSR-4164
LazyPrefixWhileCollection
LazySequenceProtocol
Sequence