
Concurrency in C++

  • task: a computation that can be executed concurrently with other tasks.
  • thread: the system-level representation of a task in a program (not strictly one-to-one: a thread may run multiple tasks, as in a thread pool). Threads in a process share the same memory space and therefore require synchronization.
  • process: an instance of a program in execution, with its own memory space, resources, and execution environment; it may contain multiple threads.
  • parallelism: the ability to execute multiple tasks simultaneously.
  • concurrency: the ability to manage multiple tasks at the same time, regardless of whether they actually execute simultaneously.

Threads


  • struct F is a functor (callable object) with operator(). When you do thread t1 {f};, it creates a thread that runs f.operator()() (which does nothing in the example—it's empty).
  • t1.join() waits for t1 to finish. No results are transferred here because F::operator() returns void.

thread::join() vs thread::detach()

  • join(): waits for the thread to finish and allows you to retrieve results if needed. It is a blocking call.
  • detach(): allows the thread to run independently from the main thread. The thread will continue to run even if the main thread finishes, and you cannot retrieve results from it.
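A minimal sketch contrasting the two (the worker functions are illustrative, not from the original):

```cpp
#include <atomic>
#include <thread>

std::atomic<bool> done{false};

void worker() { done = true; }

void run_joined()
{
    std::thread t {worker};
    t.join();               // blocking: returns only after worker has finished
}

void run_detached()
{
    std::thread t {[]{ /* fire-and-forget work */ }};
    t.detach();             // t runs on its own; we can no longer join it or get results
}
```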

full usage:

  1. use thread t1 {f, params} to pass arguments to a function by value (the thread constructor copies its arguments)

  2. use thread t1 {f, ref(params)} (or cref(params) for const access) to pass by reference; add a pointer argument such as &res to get a result back

  3. use thread t2 {F{params}} to pass a function object

  4. to set up a function object that places the result back:

class F {
public:
    F(const vector<double>& vv, double* p) : v{vv}, res{p} { }
    void operator()();          // place result in *res
private:
    const vector<double>& v;    // source of input
    double* res;                // target for output
};

then call: thread t2 {F{vec2,&res2}};

Locks and Mutexes

  • mutex: a synchronization object providing mutual exclusion; a thread that tries to acquire an already-owned mutex blocks until the mutex is released.
  • unique_lock: a mutex ownership wrapper that provides more flexibility than std::lock_guard.
  • lock_guard: a mutex ownership wrapper that provides a convenient RAII-style mechanism for owning a mutex for the duration of a scoped block.
  • defer_lock: a tag constant (std::defer_lock) used to indicate that a mutex should not be locked upon construction of a lock object.
  • condition_variable: a synchronization primitive that can be used to block a thread until a particular condition is met.
  • future: a mechanism to access the result of an asynchronous operation.
  • promise: a mechanism to set the value of a future from another thread.
  • async: a function template that runs a function asynchronously (potentially in a separate thread) and returns a future that will hold the result of the function.
  • packaged_task: a class template that wraps a callable object and allows its result to be retrieved asynchronously via a future.

RAII (Resource Acquisition Is Initialization)

  • RAII is a programming idiom used in C++ to manage resources such as memory, file handles, and locks. It ensures that resources are properly released when they are no longer needed, typically by tying the resource's lifetime to the lifetime of an object. When an object goes out of scope, its destructor is called, which can release the resource it manages.
  • RAII for Locks: std::lock_guard and std::unique_lock are examples of RAII wrappers for mutexes. if the mutex is already owned, the thread blocks until it's free. You don't "fail" immediately; it waits (that's the point of synchronization).
  • condition variables: used to avoid busy waiting (where a thread continuously checks for a condition to become true, consuming CPU). Instead, a thread can wait on a condition variable, which blocks it until another thread signals that the condition has been met. When cond.wait(lck) is called, the mutex is released and the thread is blocked until notified by cond.notify_one() or cond.notify_all().
mutex m; // controlling mutex
int sh;  // shared data

void f()
{
    unique_lock<mutex> lck {m}; // acquire mutex
    sh += 7;
} // release mutex implicitly

void f2()
{
    // ...
    unique_lock<mutex> lck1 {m1,defer_lock}; // defer_lock: don't yet try to acquire the mutex
    unique_lock<mutex> lck2 {m2,defer_lock};
    unique_lock<mutex> lck3 {m3,defer_lock};
    // ...
    lock(lck1,lck2,lck3); // acquire all three locks without risking deadlock
    // ... manipulate shared data ...
} // implicitly release all mutexes

Conditional Variables

A condition_variable is a mechanism allowing one thread to wait for another: it allows a thread to wait for some condition (or event) to occur as the result of another thread's execution.


Semaphore

A semaphore does not behave like a mutex: there is no notion of "ownership" or exclusive access (#include <semaphore>, C++20). A semaphore is a very simple synchronization primitive that maintains an internal counter (a non-negative integer).

  • acquire(): decrements the counter by 1; if the counter is zero, the thread blocks (waits) until count becomes available.
  • release() (or release(n)): increments the counter by 1 (or n), which can wake up one (or more) waiting threads.
  • No mutex is automatically tied to it: it's purely for counting/signaling.

Futures and Promises

std::future<T> and std::promise<T> allow tasks to communicate results (values or exceptions) without shared variables or explicit locks. It's like a one-way channel: the "producer" task puts a result into a promise, and the "consumer" task reads it from a linked future. The standard library handles the transfer efficiently (often using hidden synchronization).

  • Producer: set_value() or set_exception() to set the result or an exception.
  • Consumer: get() to retrieve the result; it blocks until the result is ready and rethrows any stored exception.
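A minimal sketch of the one-way channel (compute() and run() are illustrative names):

```cpp
#include <future>
#include <thread>

int compute() { return 7; }

int run()
{
    std::promise<int> p;
    std::future<int> f = p.get_future();  // link the future to the promise

    std::thread t {[&p]{
        try {
            p.set_value(compute());       // producer: send the result
        } catch (...) {
            p.set_exception(std::current_exception()); // or send an exception
        }
    }};

    int result = f.get(); // consumer: blocks until the value (or exception) arrives
    t.join();
    return result;
}
```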

packaged_task

std::packaged_task<FuncSig> is basically a smart wrapper that connects:

  • a callable (function, lambda, functor…)
  • a hidden promise (created automatically inside)
  • a public future (you get it with .get_future())

double accum(double* beg, double* end, double init)
// compute the sum of [beg:end) starting with the initial value init
{
    return accumulate(beg,end,init);
}
double comp2(vector<double>& v)
{
    using Task_type = double(double*,double*,double); // type of task
    packaged_task<Task_type> pt0 {accum}; // package the task (i.e., accum)
    packaged_task<Task_type> pt1 {accum}; // package the task (i.e., accum)
    future<double> f0 {pt0.get_future()};  // get hold of pt0's future
    future<double> f1 {pt1.get_future()};  // get hold of pt1's future
    double* first = &v[0]; // pointer to the first element of the vector
    thread t1 {move(pt0),first,first+v.size()/2,0.0}; // run pt0 on the first half of the vector
    thread t2 {move(pt1),first+v.size()/2,first+v.size(),0.0}; // run pt1 on the second half
    t1.join();
    t2.join();
    return f0.get()+f1.get();
}
  • std::move(pt0) is required because the thread constructor takes its callable by value (it wants to own it), but packaged_task forbids copying → you must move it. Without move(), the code simply wouldn't compile.
  • move(pt0) transfers ownership of the packaged_task object (and especially the internal promise it owns) into the thread constructor.
  • After this line, pt0 is in a valid but unspecified (moved-from) state — you should not use it anymore.

async

std::async is a function template that runs a function asynchronously (potentially in a separate thread) and returns a std::future that will hold the result of the function. It provides a convenient way to execute tasks concurrently without having to manage threads directly.

EASY! but with Limitations:

Don’t even think of using async() for tasks that share resources needing locking – with async() you don’t even know how many threads will be used because that’s up to async() to decide based on what it knows about the system resources available at the time of a call. For example, async() may check whether any idle cores (processors) are available before deciding how many threads to use.

double comp4(vector<double>& v)
// spawn many tasks if v is large enough
{
    if (v.size()<10000) return accum(&v[0],&v[0]+v.size(),0.0); // is it worth using concurrency?

    auto v0 = &v[0];
    auto sz = v.size();
    auto f0 = async(accum,v0,v0+sz/4,0.0);        // first quarter
    auto f1 = async(accum,v0+sz/4,v0+sz/2,0.0);   // second quarter
    auto f2 = async(accum,v0+sz/2,v0+sz*3/4,0.0); // third quarter
    auto f3 = async(accum,v0+sz*3/4,v0+sz,0.0);   // fourth quarter
    return f0.get()+f1.get()+f2.get()+f3.get();   // collect and combine the results
}

questions

Why do spurious wakeups happen? Who/what causes them?

Spurious wakeups are a well-known POSIX threads / operating system behavior. A thread waiting on a condition variable can wake up even though no other thread called notify_one() or notify_all(). Possible causes (implementation-defined, you can't control them):

  • The kernel was doing some internal housekeeping and decided to wake some waiting threads "just in case"
  • The condition variable implementation uses a broadcast primitive internally and wakes more threads than intended
  • Old race conditions in very old kernel implementations
  • Power management events, scheduler decisions, etc.

Important facts:

  • Spurious wakeups are rare in practice (you might never see one in years of running)
  • But they are allowed by the C++ standard (and POSIX)
  • That's why you must always write:

while (!predicate()) {          // ← must be a loop
    cond.wait(lock);
}
or the safer modern pattern:
cond.wait(lock, [this]{ return !mqueue.empty(); });
The second form (predicate overload, C++11+) is strongly preferred today.

Difference between std::thread and std::async — when to prefer which?

Prefer async when:

  • Tasks are independent (no or minimal shared state)
  • You mainly want to parallelize CPU-bound work
  • You want clean code with automatic result/exception handling
  • Task size is reasonably large (≥ a few milliseconds)

Prefer std::thread (or thread pools) when:

  • You need to share state between tasks (and therefore need locks, condition variables, atomics…)
  • You want precise control over number of threads / thread affinity / priorities
  • You are implementing a long-lived thread (worker thread, game loop, server acceptor…)
  • You are building your own thread pool / task scheduler