Making Arc itself atomic
This library provides the ArcSwapAny type (which you probably don’t want to use directly) and several type aliases that set it up for common use cases:
- ArcSwap, which operates on Arc<T>.
- ArcSwapOption, which operates on Option<Arc<T>>.
- IndependentArcSwap, which uses slightly different trade-off decisions ‒ see below.
Note that as these are type aliases, the useful methods are defined on ArcSwapAny directly, so you want to look at its documentation, not at the aliases’.
This is similar to RwLock<Arc<T>>, but it is faster, the readers are never blocked (not even by writes) and it is more configurable.
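For comparison, the std-only pattern this crate competes with looks roughly like this (a minimal sketch using only the standard library; the helper names are illustrative, not part of any API):

```rust
use std::sync::{Arc, RwLock};

// Readers clone the Arc out from under a short read lock; writers swap
// in a new Arc under the write lock. Unlike with arc-swap, readers and
// writers here can block each other.
fn read_config(slot: &RwLock<Arc<String>>) -> Arc<String> {
    Arc::clone(&slot.read().unwrap())
}

fn update_config(slot: &RwLock<Arc<String>>, new: Arc<String>) {
    *slot.write().unwrap() = new;
}

fn main() {
    let slot = RwLock::new(Arc::new(String::from("initial")));
    let before = read_config(&slot);
    update_config(&slot, Arc::new(String::from("updated")));
    // Old handles keep pointing at the old data; new reads see the update.
    assert_eq!(*before, "initial");
    assert_eq!(*read_config(&slot), "updated");
}
```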
Or, you can look at it this way. There’s Arc<T> ‒ it knows when it stops being used and therefore can clean up memory. But once an Arc<T> is shared between threads, it has to keep pointing to the same thing. On the other hand, there’s AtomicPtr, which can be changed even when shared between threads, but it doesn’t know when the data it points to is no longer in use, so it doesn’t clean up. This is a hybrid between the two.
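To make the AtomicPtr half of that contrast concrete, here is a minimal std-only sketch (the `replace` helper is made up for illustration). The swap itself is atomic, but reclaiming the old value is entirely our problem:

```rust
use std::sync::atomic::{AtomicPtr, Ordering};

// Swap a new heap value into the slot and manually reclaim the old one,
// returning its contents. This is only sound because the sketch has no
// concurrent readers of the old value ‒ knowing *when* reclamation is
// safe is exactly the part AtomicPtr doesn't solve.
fn replace(slot: &AtomicPtr<u32>, new: u32) -> u32 {
    let old = slot.swap(Box::into_raw(Box::new(new)), Ordering::SeqCst);
    unsafe { *Box::from_raw(old) }
}

fn main() {
    let slot = AtomicPtr::new(Box::into_raw(Box::new(1)));
    assert_eq!(replace(&slot, 2), 1);
    assert_eq!(replace(&slot, 3), 2);
    // Reclaim the final value as well.
    unsafe { drop(Box::from_raw(slot.load(Ordering::SeqCst))) };
}
```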
Motivation
First, the C++ shared_ptr can act this way. The fact that it’s only the surface API and all the implementations I could find hide a mutex inside wasn’t known to me when I started working on this. So I decided Rust needs to keep up there.
Second, I like hard problems and this seems like an apt supply of them.
And third, I actually have a few use cases for something like this.
Performance characteristics
It is optimised for read-heavy situations with only occasional writes. A few examples might be:
- Global configuration data structure, which is updated once in a blue moon when an operator manually does some changes, but looked into through the whole program all the time. Looking into it should be cheap and multiple threads should be able to look into it at the same time.
- Some in-memory database or maybe routing tables, where lookup latency matters. Updating the routing tables isn’t an excuse to stop processing packets even for a short while.
Lock-free readers
All the read operations are always lock-free. Most of the time, they are actually wait-free. The only one that is merely lock-free is the first lease access in each thread (across all the pointers).
So, when the documentation talks about contention, it talks about multiple cores having to sort out who changes the bytes in a cache line first and who goes next. This slows things down, but everything still rolls forward and stops for no one ‒ unlike mutex-style contention, where one thread holds the lock and the others wait outside.
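That kind of contention can be seen with plain standard-library atomics (a sketch; the function name is made up for illustration). The cores fight over the cache line holding the counter, which costs time, but every thread keeps making progress and none ever parks waiting for a lock holder:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

// Several threads hammer one atomic counter. The operations contend on
// the cache line, yet each one completes without anybody blocking.
fn count_to(threads: usize, per_thread: usize) -> usize {
    let counter = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    counter.fetch_add(1, Ordering::SeqCst);
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    counter.load(Ordering::SeqCst)
}

fn main() {
    // No increment is ever lost, lock or no lock.
    assert_eq!(count_to(4, 1000), 4000);
}
```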
Unfortunately, there are cases where readers block writers from completing. This is much more limited in scope than with Mutex or RwLock, and a steady stream of readers will not prevent an update from happening indefinitely (only a reader stuck in a critical section could, and when used according to the recommendations, the critical sections contain no loops and are only several instructions long).
Speeds
The baseline speed of read operations is similar to using an uncontended Mutex. However, lease and peek suffer no contention from any other read operations and only slight contention during updates. The load operation is additionally contended only on the reference count of the Arc inside ‒ so, in general, while Mutex rapidly loses performance when in active use by multiple threads at once and RwLock is slow to start with, ArcSwapAny mostly keeps its performance even when read by many threads in parallel.
Write operations are considered expensive. A write operation is more expensive than access to an uncontended Mutex and on some architectures even slower than an uncontended RwLock. However, it is faster than either when contended.
There are some (very unscientific) benchmarks within the source code of the library.
The exact numbers are highly dependent on the machine used (both the absolute numbers and the relative ones between different data structures). Not only do architectures have a huge impact (eg. x86 vs. ARM), but so do AMD vs. Intel, or even two different Intel processors. Therefore, if what matters to you is the speed more than the wait-free guarantees, you’re advised to do your own measurements.
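Such a do-it-yourself measurement can start as small as this std-only sketch (the function name is invented; a serious comparison should use a real benchmarking harness and also cover the contended case):

```rust
use std::sync::{Arc, Mutex};
use std::time::{Duration, Instant};

// Time a batch of uncontended reads through a Mutex<Arc<String>> as a
// crude baseline number to compare other structures against.
fn time_reads(iterations: u32) -> Duration {
    let slot = Mutex::new(Arc::new(String::from("config")));
    let start = Instant::now();
    let mut total = 0usize;
    for _ in 0..iterations {
        total += slot.lock().unwrap().len();
    }
    // Use the result so the loop isn't optimized away ("config" is 6 bytes).
    assert_eq!(total as u64, iterations as u64 * 6);
    start.elapsed()
}

fn main() {
    let elapsed = time_reads(100_000);
    println!("100000 uncontended Mutex reads took {:?}", elapsed);
}
```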
Choosing the right reading operation
Performance is a world of trade-offs. Therefore, the library offers several very similar methods to read the pointer. The default choice should nevertheless probably be lease.
Only one of them is functionally different ‒ peek_signal_safe. See below for signals; in general, it is the only thing you want to use inside a signal handler and you don’t want to use it anywhere else.
- load creates a full-blown Arc. It’s the most heavyweight of the read operations and, while wait-free, it suffers from contention on the reference count in the Arc, so when used from too many threads at once it’ll become slow. On the other hand, there’s no restriction on how long you can hold onto the result or how many of them you keep around, so this is appropriate when creating handles for long-term storage. It also provides a bridge to other algorithms which only take the Arc.
- lease returns a proxy object that works as a pointer to the stored data and can be upgraded to a full Arc later on if needed. There’s no limit on how long it can live around. However, it internally comes in two flavors, one cheap and one containing a full Arc. Each thread is entitled to only a limited total number of the cheap ones at a given time (currently 8); if more are constructed, the others fall back on the full version (which then uses load internally). Therefore, lease can be fast (almost as fast as peek), but only as long as the calling thread doesn’t have too many leases around at the time.
- peek is the cheapest, with the most predictable performance characteristics. However, as long as the returned guard object is alive, the internal generation lock is held, which prevents write operations from completing ‒ they spin-wait for the unlock. By default, all the pointer instances share the same generation lock (so a guard prevents write operations even on other pointers from completing). However, IndependentArcSwap uses a private generation lock for each instance. In general, this is suitable for very fast things ‒ like reading a single scalar value out of a configuration ‒ but not for keeping the value around or doing expensive lookups in the data.
Additionally, it is possible to use cache handles to get further speed improvements at the cost of a less comfortable API and possibly keeping older values alive longer than necessary.
RCU
This also offers an RCU implementation, for read-heavy situations. Note that an RCU update is considered a relatively slow operation (slower than a simple write). In case there’s only one updating thread, using store is enough.
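The write side of the RCU pattern can be sketched with only the standard library (the `rcu_update` helper is invented for illustration; manual reclamation here stands in for what the crate’s reference counting does automatically):

```rust
use std::sync::atomic::{AtomicPtr, Ordering};

// RCU write side over a plain AtomicPtr: read the current value, build
// an updated copy, and compare-and-swap it in; if another writer won
// the race, retry against the fresh value. A real RCU also needs safe
// reclamation of old values ‒ this sketch reclaims manually, which is
// only sound without concurrent readers.
fn rcu_update<F: Fn(&u32) -> u32>(slot: &AtomicPtr<u32>, f: F) {
    let mut current = slot.load(Ordering::SeqCst);
    loop {
        let new = Box::into_raw(Box::new(f(unsafe { &*current })));
        match slot.compare_exchange(current, new, Ordering::SeqCst, Ordering::SeqCst) {
            Ok(old) => {
                // We won: free the value we replaced.
                unsafe { drop(Box::from_raw(old)) };
                return;
            }
            Err(actual) => {
                // Lost the race: discard our copy and retry.
                unsafe { drop(Box::from_raw(new)) };
                current = actual;
            }
        }
    }
}

fn main() {
    let slot = AtomicPtr::new(Box::into_raw(Box::new(1u32)));
    rcu_update(&slot, |v| v + 1);
    rcu_update(&slot, |v| v * 10);
    assert_eq!(unsafe { *slot.load(Ordering::SeqCst) }, 20);
    unsafe { drop(Box::from_raw(slot.load(Ordering::SeqCst))) };
}
```

Because the closure may run more than once when writers race, it should be side-effect free ‒ the same caveat applies to the crate’s RCU.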
Atomic orderings
It is guaranteed that each operation performs at least one SeqCst atomic read-write operation; therefore, even operations on different instances have a defined global order of operations.
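In standard-library terms, SeqCst operations on different atomics still participate in one agreed-upon global order. A small std-only illustration (the function name is invented): once the reader observes the flag, the earlier write to the other atomic is guaranteed to be visible as well:

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

// Publish `data`, then set `flag`, both with SeqCst. A reader that sees
// the flag set is guaranteed to also see the data written before it,
// even though the two stores hit different atomics.
fn publish_and_read() -> usize {
    let data = Arc::new(AtomicUsize::new(0));
    let flag = Arc::new(AtomicBool::new(false));

    let writer = {
        let (data, flag) = (Arc::clone(&data), Arc::clone(&flag));
        thread::spawn(move || {
            data.store(42, Ordering::SeqCst);
            flag.store(true, Ordering::SeqCst);
        })
    };

    // Spin until the flag is published.
    while !flag.load(Ordering::SeqCst) {
        std::hint::spin_loop();
    }
    let seen = data.load(Ordering::SeqCst);
    writer.join().unwrap();
    seen
}

fn main() {
    assert_eq!(publish_and_read(), 42);
}
```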
Unix signal handlers
Unix signals are hard to use correctly, partly because there is a very restricted set of functions one might use inside them. Specifically, it is not allowed to use mutexes inside them (because that could cause a deadlock).
On the other hand, it is possible to use peek_signal_safe (but not the others). Note that a signal handler is not allowed to allocate or deallocate memory; therefore it is not recommended to upgrade the returned guard (strictly speaking, it is possible to do that safely, but it is hard and brings no benefit).
Customization
While the default ArcSwap and lease are probably good enough for most needs, the library allows a wide range of customizations:
- It allows storing nullable (Option<Arc<_>>) and non-nullable pointers.
- It is possible to store other reference-counted pointers (eg. if you want to use it with a hypothetical Arc that doesn’t have weak counts), by implementing the RefCnt trait.
- It allows choosing the internal locking strategy through the LockStorage trait.
Examples
```rust
extern crate arc_swap;
extern crate crossbeam_utils;

use std::sync::Arc;

use arc_swap::ArcSwap;
use crossbeam_utils::thread;

fn main() {
    let config = ArcSwap::from(Arc::new(String::default()));
    thread::scope(|scope| {
        scope.spawn(|_| {
            let new_conf = Arc::new("New configuration".to_owned());
            config.store(new_conf);
        });
        for _ in 0..10 {
            scope.spawn(|_| {
                loop {
                    let cfg = config.lease();
                    if !cfg.is_empty() {
                        assert_eq!(*cfg, "New configuration");
                        return;
                    }
                }
            });
        }
    }).unwrap();
}
```
Alternatives
There are other means to get similar functionality you might want to consider:
Mutex<Arc<_>> and RwLock<Arc<_>>
They have significantly worse performance in the contended scenario but are comparable in uncontended cases. They are directly in the standard library, which means better testing and fewer dependencies.
The same, but with parking_lot
Parking lot contains alternative implementations of Mutex and RwLock that are faster than the standard library primitives. They still suffer from contention.
crossbeam::atomic::ArcCell
This internally contains a spin-lock equivalent and is very close to the characteristics of parking_lot::Mutex<Arc<_>>. It is unofficially deprecated; see the relevant issue.
crossbeam-arccell
It is mentioned here because of the name. Despite the name, this does something very different (which might possibly solve similar problems). Its API is not centered on Arc or any kind of pointer; rather, it has snapshots of its internal value that can be exchanged very fast.
AtomicArc
This one is probably the closest thing to ArcSwap on the API level. Both read and write operations are lock-free, but neither is wait-free, and the performance of reads and writes is more balanced ‒ while ArcSwap is optimized for reading, AtomicArc is „balanced“. The biggest current downside is that it is in a prototype stage and not released yet.
Modules
Caching handle into the ArcSwapAny.
Customization of where and how the generation lock works.
Structs
An atomic storage for a smart pointer like Arc or Option<Arc>.
A temporary storage of the pointer.
Traits
A trait describing things that can be turned into a raw pointer.
A trait describing smart pointers that can’t hold NULL.
A trait describing smart reference counted pointers.
Functions
Comparison of two pointer-like things.
Type Definitions
An atomic storage for Arc.
An atomic storage for Option<Arc>.
An atomic storage that doesn’t share the internal generation locks with others.