Memory Management and Performance of Value Types

Memory Management and Performance of Value Types

It's very likely that you asked yourself at least once in your iOS career what's the difference between a struct and a class. While realistically the choice between using one or another always boils down to value semantics versus reference semantics, the performance differences between the two are expressive and can heavily favor one or another depending on the contents of your object, specially when dealing with value types.

Some people might say that knowledge of memory architecture is irrelevant for application level developers, and I agree partially. Knowing how to save a few bits here and there will make no visible difference on newer iPhones, and premature optimization is a highly shunned practice.

However, both reference and value types can severely slow down your app when misused, and such knowledge will define whether or not you can fix the problem efficiently.

To understand the deeper differences between the two, let's look back at the address space of a process: (single thread for simplicity)

|- - - - - - - - - - - - - - - - - - - - - - - - - - |
|                    Instructions                    |
|- - - - - - - - - - - - - - - - - - - - - - - - - - -
|                    Global Data                     |
|- - - - - - - - - - - - - - - - - - - - - - - - - - -
|                        Heap                        |
|- - - - - - - - - - - - - - - - - - - - - - - - - - -
|      Nothing (Stack and heap grow towards here)    |
|- - - - - - - - - - - - - - - - - - - - - - - - - - -
|                        Stack                       |
|- - - - - - - - - - - - - - - - - - - - - - - - - - -

Stack Allocation

In memory architecture, the stack is no different from the data structure you already know, and Stack Allocation is a simple and fast way to allocate/deallocate memory that involves the stack.

Each "scope" in your app (like the inner contents of a method) will provide the amount of memory it needs to run, move the stack pointer in memory by such amount and run - adding data to the empty memory addresses it now posesses. After the scope is lost, the stack pointer will simply decrease by the same amount - safely deallocating all the scope's data. The cost of allocating/deallocating stack memory is literally the cost of assigning an integer.

Stack Allocated Value Types

In Stack Allocation, the data gathered by the scope means everything attributed to it - like method arguments, return values, but more importantly: value types. As long as the size of the value type is known during compile time and it doesn't contain / is not contained, recursively, by a reference type, it will not need reference counting, and its life will be static - equal to the life of its scope. It will be allocated completely on the stack, and when the scope is deallocated, so is the value type. The absence of reference counting overhead and the presence of stack allocation is a considerable performance boost.

PS: All benchmarks done with -O. I had to add some special logic and keywords/attributes to prevent the compiler from skipping my methods, but I've hidden them in the examples to make the code easy to read.

struct EmptyStruct {
	private let number: Int64 = 1
	//Empty classes have 64bits of storage by default for the pointer
	//so we're adding 64bits to our struct to make fair comparisons.
}

@inline(never) func createABunchOfEmptyStructs() {
    for _ in 0..<1_000_000 {
        let myStruct = EmptyStruct()
    }
}

createABunchOfEmptyStructs()
//Moving the stack pointer up by the size of a million EmptyStructs.
//Adding a million EmptyStructs to the empty addresses created by moving the stack pointer.
//Moving the stack pointer down by the same amount.
//Total: ~0.005 seconds

If the contents of your value type are other stack allocated, static size value types, then your value type also is also static sized. That means your value type will also leverage full stack allocation as well as having a performance boost in copying operations.

We once asked a candidate why he chose to use a class for something that was clearly immutable and meant to be treated with value semantics. His reasoning was that the object was being constantly sent as a parameter inside methods, and so he was concerned about the potential performance implications of copying that several times.

Assigning a property to most value types will indeed create a full copy of the object. However, this copy-on-assignment behaviour for fully stack allocated value types is so fast and so cheap that Apple claims it runs in constant time:

struct BigStaticStruct {
    let fp1: Int64 = 1
    let fp2: Int64 = 1
    let fp3: Int64 = 1
    let fp4: Int64 = 1
    let fp5: Int64 = 1
}

func createABunchOfCopiesOfHugeStruct() {
    let bigStruct = BigStaticStruct()
    for _ in 0..<1_000_000 {
        let copy = bigStruct
    }
}

createABunchOfCopiesOfHugeStruct() // ~0.0033 secs

//Even if you increase the number of properties tenfold the runtime is unchanged
//because copying a static sized struct is a constant time operation.

Stack Allocation can, however, devour your app's memory if you're dealing with many depths of recursion. Thankfully, Swift has tail recursion optimization, meaning that if you disassemble a method with tail recursion you'll find an iterative version of your algorithm instead.

Heap Allocation

But what happens when you need to introduce objects with extendable sizes and "shudders" the concept of pointers?

The stack is not meant to be used with objects that change in size, and the concept of pointers / dynamic lifetime means that an object's life could have nothing to do with its scope - after all, it's possible to have an object existing in memory even if there's nothing going on.

The heap, like the stack, is not much different from the data structure with the same name, and in this case, it's meant to be used for dynamically-allocated, user-managed memory.

When the process requests a certain amount of memory, the heap will search for a memory address that fulfills this request and return it to the process. When the memory is not being used anymore, the process must tell the heap to free that section of memory.

In iOS, "not being used anymore" works in the shape of reference counting, and luckily the existence of ARC means that most things are automatically handled for you unless you have to deal with the RawPointer family.

Heap Allocation is slower than Stack Allocation not just because of the more complex data structure - it also requires thread safety. Each thread has its own stack, but the heap is shared with everybody, demanding synchronization. However, it allows reference types and things like dynamic size arrays to exist.

final class EmptyClass {}

@inline(never) func createABunchOfEmptyClasses() {
    for _ in 0..<1_000_000 {
        let myClass = EmptyClass()
    }
}

createABunchOfEmptyClasses()

//Moving the stack pointer up by the size of a million EmptyClass pointers.
//Requesting memory in the heap for a million EmptyClasses.
//Adding a million EmptyClass pointers to the empty addresses created by moving the stack pointer, pointing to the heap's returned addresses.
//(Loop ends) Decrementing the reference count of the pointers.
//The reference count of each class drops to zero, and a request to free their memory addresses is sent.
//Moving the stack pointer down.
//Total: ~0.117 secs

It would be nice if memory management was as binary as saying that value types go to the stack and reference types go to the heap, but in reality, the life and performance of a value type is heavily defined by its contents.

Heap Allocated Value Types

If the size of your value type cannot be determined during compile time (because of a protocol/generic requirement), or if your value type recursively contains / is contained by a reference type (remember that closures are also reference types), then it will require heap allocation. This can range from being not being an issue at all to making your struct perform exponentially worse than it would if it was a class instead.

Stack allocated value types are great because their life are directly related to their scope's life, but if your value type is the child of a class, a reference is all it takes for it to outlive its scope. This situation is common in @escaping closures, and this value type will lose its stack allocation properties in order to be fully heap allocated alongside the reference type. In a way, you could even say that this kind of value type is a reference type itself, as living in the heap means that several objects can point to it - even though it still possesses value semantics.

If your value type is instead the parent of a heap allocated class, then it will not be heap allocated itself, but it will inherit reference counting overhead in order to be able to keep it's inner references alive. This can cause a considerable drop in performance depending on the complexity of the value type.

In the Standard Library, examples of value types with child references are String, Array, Dictionary and Set. These value types contain internal reference types that manage the storage of elements in the heap, allowing them to increase/decrease in size as needed.

Since heap operations are more expensive than stack ones, copying heap allocated value types is not a constant operation like in stack allocated ones. To prevent this from hurting performance, the Standard Library's extensible data structures are copy-on-write.

With this capability, merely assigning properties will not copy the value type - instead, it will create a reference just like if it was a regular reference type. The actual copying will only happen when it's really necessary.

//Copy on Assignment
let emptyStruct = EmptyStruct() //address A
let copy = emptyStruct //address B

//Copy on Write
let array = [1,2,3] //address C
var notACopy = array //still address C
notACopy = [4,5,6] //now address D

Be aware that any value type you create will be copy-on-assignment, but you can code them to have copy-on-write capabilities. This is not a compiler thing, The Standard Library itself does it on the code level and so can you. Here's an example from Apple.

Problematic Reference Counting in Value Types With Inner References

A fully stack allocated value type will not need reference counting, but a value type with inner references will unfortunately inherit this ability.

Consider two objects: a struct full of classes and a class full of the same classes:

struct HugeDynamicStruct {
    var emptyClass = EmptyClass()
    var emptyClass2 = EmptyClass()
    var emptyClass3 = EmptyClass()
    var emptyClass4 = EmptyClass()
    var emptyClass5 = EmptyClass()
    var emptyClass6 = EmptyClass()
    var emptyClass7 = EmptyClass()
    var emptyClass8 = EmptyClass()
    var emptyClass9 = EmptyClass()
    var emptyClass10 = EmptyClass()
}

class HugeClass {
    var emptyClass = EmptyClass()
    var emptyClass2 = EmptyClass()
    var emptyClass3 = EmptyClass()
    var emptyClass4 = EmptyClass()
    var emptyClass5 = EmptyClass()
    var emptyClass6 = EmptyClass()
    var emptyClass7 = EmptyClass()
    var emptyClass8 = EmptyClass()
    var emptyClass9 = EmptyClass()
    var emptyClass10 = EmptyClass()
}

The following snippet will check how long it takes to create a HugeClass, reference it ten million times, add all these references to an array and deallocate everything. Then it will do the same to a struct variant.

func createABunchOfReferencesOfClass() {
    var array = [HugeClass]()
    let object = HugeClass()
    for _ in 0..<10_000_000 {
        array.append(object)
    }
}

func createABunchOfCopiesOfStruct() {
    var array = [HugeDynamicStruct]()
    let object = HugeDynamicStruct()
    for _ in 0..<10_000_000 {
        array.append(object)
    }
}

//Each object contains ten EmptyClasses

createABunchOfReferencesOfClass() // ~1.71 seconds
createABunchOfCopiesOfStruct() // ~5.1 seconds

Judging from what was previously said it's somewhat expected that the struct version would take longer as it is copy-on-assignment, compared to the class version which is only increasing a reference count value.

However, consider what happens when we increase the amount of EmptyClasses inside of each object:

//Each object now contains TWENTY EmptyClasses

createABunchOfReferencesOfClass() // ~1.75 seconds
createABunchOfCopiesOfStruct() // ~14.5 seconds

Adding more classes to HugeClass did nothing to the runtime of the algorithm, but HugeDynamicStruct's version took more than twice as much to run!

Since all reference types require reference counting, increasing the amount of properties of a class of classes will not change the runtime of this algorithm, as merely increasing the reference count of the parent reference will be enough to keep it's inner references alive.

However, value types do not naturally have a reference count. If your value type contains inner references, copying it will require increasing the reference count of it's children instead - not the first, not the second, but literally every single one of them.

final class ClassOfClasses {
    let emptyClass = EmptyClass()
    let emptyClass2 = EmptyClass()
    let emptyClass3 = EmptyClass()
}

let classOfClasses = ClassOfClasses()
let reference = classOfClasses
let reference2 = classOfClasses
let reference3 = classOfClasses

CFGetRetainCount(classOfClasses) // 4
CFGetRetainCount(classOfClasses.emptyClass) // 1
CFGetRetainCount(classOfClasses.emptyClass2) // 1
CFGetRetainCount(classOfClasses.emptyClass3) // 1

struct StructOfClasses {
    let emptyClass = EmptyClass()
    let emptyClass2 = EmptyClass()
    let emptyClass3 = EmptyClass()
}

let structOfClasses = StructOfClasses()
let copy = structOfClasses
let copy2 = structOfClasses
let copy3 = structOfClasses

CFGetRetainCount(structOfClasses) // Doesn't compile, structs themselves don't have a reference count.
CFGetRetainCount(structOfClasses.emptyClass) // 4
CFGetRetainCount(structOfClasses.emptyClass2) // 4
CFGetRetainCount(structOfClasses.emptyClass3) // 4

The more reference types you have inside of a value type, the more reference counting overhead you are going to have when copying it, leading to potentially nasty performance issues.

Avoiding Reference Counting Overhead in Value Types

You can improve your app's performance by swapping unnecessary references with proper static size value types.

Consider the following value type with inner references:

struct DeliveryAddress {
    let identifier: String
    let type: String
}

If identifier represents an UUID, it can be safely replaced by Foundation's UUID struct, which is statically sized.

In a similar fashion, type could easily be a pre-defined enum instead.

struct DeliveryAddress {
    enum AddressType {
        case home
        case work
    }
    let identifier: UUID
    let type: AddressType
}

With these changes, this struct is now statically sized. Not only reference counting overhead was eliminated, but it is also a lot more typesafe now.

If your value type is more complicated than this (and you have a performance issue), ask yourself if it really shouldn't be a class with copy-on-write capabilities instead.

From the Apple Docs:

As a general guideline, consider creating a structure when one or more of these conditions apply:

The structure’s primary purpose is to encapsulate a few relatively simple data values.

It is reasonable to expect that the encapsulated values will be copied rather than referenced when you assign or pass around an instance of that structure.

Any properties stored by the structure are themselves value types, which would also be expected to be copied rather than referenced.

The structure does not need to inherit properties or behavior from another existing type.

Examples of good candidates for structures include:

The size of a geometric shape, perhaps encapsulating a width property and a height property, both of type Double.

A way to refer to ranges within a series, perhaps encapsulating a start property and a length property, both of type Int.

A point in a 3D coordinate system, perhaps encapsulating x, y and z properties, each of type Double.

In all other cases, define a class, and create instances of that class to be managed and passed by reference. In practice, this means that most custom data constructs should be classes, not structures.

What else?

Even though the examples shown here are extremely exaggerated, small mistakes can and will add up quick enough to cause you trouble in the future. Remember: People want to have a good time with apps, and most of them won't accept anything less than a silky smooth 60 fps experience. Waits/freezes are so universally hated that 53% of visits are abandoned if a mobile site takes longer than 3 seconds to load, and you should keep that in mind when your app starts displaying random hiccups/slowdowns, specially when scrolling content.

Performance depends on several factors, and choosing between structs and classes are just one of them. If you are interested in this topic, I largely recommend watching the WWDC videos about Method Dispatching and Witness Tables.

Follow me on my Twitter - @rockbruno_, and let me know of any suggestions and corrections you want to share.

References and Good reads

Operating Systems: Three Easy Pieces
WWDC: Understanding Swift Performance
WWDC: Optimizing Swift Performance
WWDC: Building Better Apps with Value Types in Swift
Apple: Optimization Tips