Storage Patterns: Set
In this post, I’ll build a minimal enumerable set in Solidity. A set stores unordered values without repetition.
Set Functionality
The implementation in this post will support the following primitive set operations:
- Add a value to the set.
- Remove a value from the set.
- Test whether the set contains a given value.
- Enumerate all values in the set.
Many implementations additionally implement set-theoretical operations:
- Find the union of two sets.
- Find the intersection of two sets.
- Find the difference between two sets.
I won’t write those higher level functions here, but note that they can be easily implemented using the primitive operations.
Combining Data Structures
A mapping from value types to booleans provides much of a set’s functionality. It efficiently supports adding and removing items as well as testing for membership. What it doesn’t support is enumeration.
An array works well for enumeration, but it doesn’t support efficiently finding an existing item.
To support all the primitive operations efficiently, I will combine the two data structures. In this example code, I’ve chosen bytes32
as the data type being stored in the set, but the pattern generalizes to any data type:
contract Set {
bytes32[] public items;
// 1-based indexing into the array. 0 represents non-existence.
mapping(bytes32 => uint256) indexOf;
The items
array is where the values are stored, in no particular order. This allows for efficient enumeration.
The indexOf
mapping keeps track of which values are contained within the set and where they can be found in the items
array.
In Solidity, mapping
s do not have a notion of existence. Every key maps to something, and the default value is 0
. This is a bit of a problem, because 0
is also the first index into the items
array. To resolve this ambiguity, indexOf
will hold 1-based indexes:
- If
indexOf[value] == 0
, then the value does not exist in the set. - If
indexOf[value] == n
, wheren > 0
, thenitems[n - 1] == value
.
Adding Items
To add an item to the set, it must be appended to the items
array, and it’s new (1-based) index must be stored in indexOf
:
function add(bytes32 value) public {
if (indexOf[value] == 0) {
items.push(value);
indexOf[value] = items.length;
}
}
Recall that sets do not allow repeated elements. The if
statement ensures that an item is not added multiple times.
Testing for Membership
The indexOf
mapping makes it easy to efficiently test for set membership:
function contains(bytes32 value) public view returns (bool) {
return indexOf[value] > 0;
}
Removing Items
Removing an item involves updating both the indexOf
mapping and the items
array. Updating the mapping is straightforward, but it’s less obvious how an item can be efficiently deleted from the middle of an array.
The key to implementing this operation efficiently is to recall that items in a set are unordered. Because we don’t need to preserve order, we can shuffle items around to make our lives easier. To remove the item from the array, we first swap it to the end of the array and then shrink the array:
function remove(bytes32 value) public {
uint256 index = indexOf[value];
require(index > 0);
// move the last item into the index being vacated
bytes32 lastValue = items[items.length - 1];
items[index - 1] = lastValue; // adjust for 1-based indexing
indexOf[lastValue] = index;
items.length -= 1;
indexOf[value] = 0;
}
Enumeration
Because it’s a public
state variable, the Solidity compiler generates a getter for the items
array. A client can call items(n)
to retrieve the nth item from the array.
To enumerate the entire list, the client also needs to know how many items there are. The count()
function provides this:
function count() public view returns (uint256) {
return items.length;
}
Summary
- A set stores values, unordered and without repetition.
- An enumerable set can be efficiently implemented with a mapping and an array.