Atomic Operations in C#.Net

Introduction

An atomic operation is what academics call Linearizability. Atomicity is a guarantee of isolation from concurrent processes; it can be enforced at the hardware level, built on cache-coherence protocols, or at the software level with exclusive locks. In this blog post, I am going to explore a number of mechanisms for achieving atomic operations in .Net.

What are Atomic operations and what are not?

The C# Specification states the following about atomic operations:

“Reads and writes of the following data types shall be atomic: bool, char, byte, sbyte, short, ushort, uint, int, float, and reference types.” Also: “…there is no guarantee of atomic read-modify-write, such as in the case of increment or decrement.”.

Threading in C# by Joseph Albahari describes it this way:

"Read/write on a field of 32-bit or less is always atomic, operations on 64-bit are guaranteed to be atomic only in 64-bit OS, statements that combine more than one read/write operation are never atomic."

For example, the following operations are guaranteed to be atomic:

 int i = 3;               // Always atomic
long l = Int64.MaxValue;  // Atomic in a 64-bit process, non-atomic in a 32-bit process
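
As an illustration (a rough sketch of my own, assuming the process is built/run as 32-bit), the following console program has two writer threads hammer a shared long with two different values; a 32-bit process performs each 64-bit write as two 32-bit writes, so the reader can observe "torn" values that were never actually written:

 using System;
using System.Runtime.CompilerServices;
using System.Threading;

class TornReadSketch
{
    const long A = 0L;
    const long B = -1L;                            // all 64 bits set
    static long shared;

    [MethodImpl(MethodImplOptions.NoInlining)]
    static long ReadShared() { return shared; }    // keeps the JIT from hoisting the read out of the loop

    static void Main()
    {
        new Thread(() => { while (true) shared = A; }) { IsBackground = true }.Start();
        new Thread(() => { while (true) shared = B; }) { IsBackground = true }.Start();

        for (int i = 0; i < 100000000; i++)
        {
            long observed = ReadShared();
            if (observed != A && observed != B)    // neither written value: the read was torn
            {
                Console.WriteLine("Torn read: 0x{0:X16}", observed);
                return;
            }
        }
        Console.WriteLine("No torn read observed (expected in a 64-bit process).");
    }
}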

Code like the following is never atomic:

 int i = 0;
int j = 0;
j += i;      // Non-atomic: a read followed by a write
i++;         // Non-atomic: read-modify-write
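
Conceptually, i++ expands into three separate steps - read, compute, write back - and another thread can interleave between any two of them, which is exactly what makes it non-atomic:

 // Roughly what the runtime does for "i++":
int temp = i;        // 1. read the current value
temp = temp + 1;     // 2. compute the new value in a register/local
i = temp;            // 3. write the new value back to the field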

In the sections below I try to document atomic operations at a lower level.

Essence

Consider two threads (or two processes), T1 and T2, running simultaneously, and a field stored in memory. T1 reads the field's value, does some calculation on it, and finally writes the new value back to memory. During that period T2 is doing exactly the same task - read, calculate, write back - so one operation on the field can override the other. In other words, the later thread (T2) may overwrite the earlier one (T1): T2 read the field while T1 was still manipulating it, so after T1 writes its new value back to memory, T2 writes back a value computed from stale data, and T1's update is lost.

A simple example is the increment/decrement operation. As shown above, it is NOT atomic: it requires both a read and a write, so if many threads are simultaneously incrementing one field, a race condition is very likely (the more threads and the more increments performed, the more likely the race).

Atomicity example

I wrote a WinForms application that does some simple work:

It creates 10 threads at runtime that simultaneously operate on a private integer. There is also a volatile counter with an initial value of 0, and every thread that finishes its work will:

  1. Print information on the UI.
  2. Increment the counter.
  3. Check whether the counter has reached 10; if so, call the CalculatingFinished method, which prints the final result.

I also want the UI not to block during the calculation; otherwise I could simply join every created thread. My code skeleton is shown below:

 private const int MaxThreadCount = 10;
private Thread[] m_Workers = new Thread[MaxThreadCount];
private volatile int m_Counter = 0;
private Int32 x = 0;
private Stopwatch m_Watcher = new Stopwatch(); // used for the timing output below; assumed to be started when the work begins (not shown)

protected void btn_DoWork_Click(object sender, EventArgs e)
{
    ShowStatus("Starting...");

    for (int i = 0; i < MaxThreadCount; i++)
    {
        m_Workers[i] = new Thread(IncreaseNumber) { Name = "Thread " + (i + 1) };
        m_Workers[i].Start();
    }
}

void IncreaseNumber()
{
    try
    {
        for (int i = 0; i < 10000; i++)
        {
            // Different strategy to increment x
        }

        ShowStatus(String.Format("{0} finished at {1}", Thread.CurrentThread.Name, m_Watcher.Elapsed.TotalMilliseconds));
       
        // Increases Counter and decides whether or not sets the finish signal
        m_Counter++;
        if (m_Counter == MaxThreadCount)
        {
            // Print finish information on UI
            CalculatingFinished();
            m_Counter = 0;
        }
    }
    catch (Exception ex)
    {
        throw;
    }
}

public void ShowStatus(string info)
{
    // InvokeAction is a small helper (assumed) that marshals the call onto the UI thread
    this.InvokeAction(() => listInfo.Items.Add(info));
}

private void CalculatingFinished()
{
    ShowStatus("\r\nAll Done at: " + m_Watcher.Elapsed.TotalMilliseconds);
    ShowStatus("The result: " + x.ToString());
}
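
A side note on the skeleton: m_Counter++ is itself a read-modify-write, so even though the field is volatile, two workers could in principle interleave and the == MaxThreadCount check could be missed. A more robust completion check (a sketch, not the exact code behind the screenshots below) relies on the value returned by Interlocked.Increment; the compiler warns about passing a volatile field by ref, but Interlocked handles it correctly:

 // Sketch: Interlocked.Increment returns the incremented value atomically,
// so exactly one worker sees MaxThreadCount and raises the finish signal.
if (Interlocked.Increment(ref m_Counter) == MaxThreadCount)
{
    CalculatingFinished();
    m_Counter = 0;
}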

I highlighted "// Different strategy to increment x", and will try a few number of ways to achieve atomicity using FCL libriries.

Let's first look at the non-atomic routine - simply doing x++ in each thread: 

 for (int i = 0; i < 10000; i++)
{
    x++;
}

Since x++ is NOT atomic, and since each thread additionally runs a large loop of ten thousand iterations, in my experience I NEVER get the correct result. The screenshot is below:

Non-Atomic Operation

Analysis and solutions

To solve the issue and ensure atomicity, after more than two weeks of on-and-off study I found the following feasible approaches (in theory, without considering performance):

  1. Interlocked.Increment.
  2. Apply an exclusive lock (or Monitor.Enter) outside the for loop.
  3. AutoResetEvent to make the threads do their work one by one.
  4. Create a temp integer in each thread and, once finished, add the temp onto x under an exclusive lock.
  5. ReaderWriterLockSlim.
  6. Parallel.For combined with Interlocked.Increment.

All of the above achieve atomic increments of x's value and give the expected result:

Atomic operation

In fact I also tried other approaches such as MemoryBarrier and Thread.VolatileRead/VolatileWrite (StackOverflow question link), but failed: memory barriers and volatile reads/writes only control visibility and ordering between threads, they do not make a read-modify-write such as x++ atomic. If any readers know of a way to use them to achieve the goal, please kindly guide me.

Demonstration code

In this section I will list the key code for implementing the six solutions above.

Solution #1: Interlocked.Increment

 for (int i = 0; i < 10000; i++)
    Interlocked.Increment(ref x);
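
Interlocked also exposes CompareExchange, the general compare-and-swap primitive; as a sketch (my own helper, not part of the FCL), an atomic increment can be written as a CompareExchange retry loop:

 // Sketch: atomic increment built from a compare-and-swap retry loop.
static void AtomicIncrement(ref int target)
{
    int seen;
    do
    {
        seen = target;   // read the current value
    }
    while (Interlocked.CompareExchange(ref target, seen + 1, seen) != seen);
    // CompareExchange writes (seen + 1) only if target still equals seen and
    // returns the value it actually found; if another thread won the race, retry.
}

// Usage: AtomicIncrement(ref x);

Interlocked.Increment is essentially this loop done for you, typically as a single interlocked hardware instruction, which is why it is both atomic and lock-free.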

Solution #2: Apply an exclusive lock (Monitor) outside the for loop.

 private readonly object m_Locker = new object(); // lock on a private object rather than an interned string

Monitor.Enter(m_Locker);
for (int i = 0; i < 10000; i++)
    x++;
Monitor.Exit(m_Locker);
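
The C# lock statement is the usual shorthand for this pattern; it effectively expands to Monitor.Enter plus a try/finally that guarantees Monitor.Exit runs even if the body throws, so the snippet above can also be written as:

 lock (m_Locker)
{
    for (int i = 0; i < 10000; i++)
        x++;
}

(Solution #4 below uses the lock keyword in exactly this way.)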

Solution #3: AutoResetEvent to make the threads do their work one by one.

 private static AutoResetEvent m_AutoReset = new AutoResetEvent(false);

protected void btn_DoWork_Click(object sender, EventArgs e)
{
    ShowStatus("Starting...");

    for (int i = 0; i < MaxThreadCount; i++)
    {
        m_Workers[i] = new Thread(IncreaseNumber) { Name = "Thread " + (i + 1) };
        m_Workers[i].Start();
    }
   
    m_AutoReset.Set();
}

void IncreaseNumber()
{
    m_AutoReset.WaitOne();
    for (int i = 0; i < 10000; i++)
        x++;
    m_AutoReset.Set();
}

One notable point: in this case (non-blocking UI) it is not easy to replace the AutoResetEvent with a Monitor.Enter/Monitor.Pulse pair to implement the "one by one" logic, because Monitor.Pulse does not maintain state. Below is the description from MSDN:

Important

The Monitor class does not maintain state indicating that the Pulse method has been called. Thus, if you call Pulse when no threads are waiting, the next thread that calls Wait blocks as if Pulse had never been called. If two threads are using Pulse and Wait to interact, this could result in a deadlock. Contrast this with the behavior of the AutoResetEvent class: If you signal an AutoResetEvent by calling its Set method, and there are no threads waiting, the AutoResetEvent remains in a signaled state until a thread calls WaitOne, WaitAny, or WaitAll. The AutoResetEvent releases that thread and returns to the unsignaled state.


In my WinForms application, if I call Monitor.Pulse() in the button click event, many threads will never receive the signal (whereas an AutoResetEvent would remain in the signaled state)! I wrote a simple console routine to demonstrate this:

 private static readonly object _locker = new object(); // lock on a private object rather than an interned string
public static void Main()
{
    for(int i = 0;i<5;i++)
    {
        Thread t = new Thread(DoWork);
        t.Name = "T" + (i + 1);
        t.IsBackground = true;
        t.Start();
    }
   
    //Thread.Sleep(500);
   
    Monitor.Enter(_locker);
    Console.WriteLine("Main thread");
    Monitor.Pulse(_locker);
    Monitor.Exit(_locker);
}

private static void DoWork()
{
    Monitor.Enter(_locker);
    Monitor.Wait(_locker);
    Monitor.Pulse(_locker);
    Monitor.Exit(_locker);
    Console.WriteLine(Thread.CurrentThread.Name + " finished and exited");
}
Removing "Thread.Sleep(500)" will *VERY POSSIBLY* lead less than 5 thread working, because creating 5 threads requires not short time (kenel object, TEB, kenel/user stack), during the period one just created thread (T2) might get the signal or might not (much more possible), because when the previously created thread (T1) calling "Monitor.Pulse(_locker)" T2 had NOT been setup, T2 and the thread created later will have no chance to get signal! They will be waiting... So the 0.5 seconds is used to give time to create 5 thread, otherwise main thread will quit immediately and background thread will be collected.

Solution #4: Create a temp integer in each thread and, once finished, add the temp onto x under an exclusive lock.

 private readonly object m_Locker = new object(); // lock on a private object rather than an interned string

void IncreaseNumber(object objThreadName)
{
    int tmp = 0;
    for (int i = 0; i < 10000; i++)
        tmp++;

    lock (m_Locker)
        x += tmp;
}

Solution #5: ReaderWriterLockSlim.

 private readonly ReaderWriterLockSlim m_ReaderWriterLocker = new ReaderWriterLockSlim();

void IncreaseNumber(object objThreadName)
{
    // Or we can use ReaderWriterLock.AcquireWriterLock(500) but it has more performance overhead and is not recommended
    m_ReaderWriterLocker.EnterWriteLock();
    for (int i = 0; i < 10000; i++)
        x++;
    m_ReaderWriterLocker.ExitWriteLock();  // Or ReaderWriterLock.ReleaseWriterLock();
}

Please note that the older ReaderWriterLock class is not recommended; it "takes about five times longer to execute than a call to Monitor's Enter method." Please refer to Reader/Writer Locks and the ResourceLock Library by Jeffrey Richter.

Solution #6: Parallel.For combined with Interlocked.Increment.

 Parallel.For(0, 100000, (i) => Interlocked.Increment(ref x)); // 100,000 increments in total, matching 10 threads x 10,000 each
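
For completeness, Parallel.For also has an overload with thread-local state, which lets each partition keep a private subtotal (much like solution #4) and touch the shared x only once per partition; a sketch:

 // Sketch: each partition accumulates privately and publishes its subtotal once.
Parallel.For(0, 100000,
    () => 0,                                       // localInit: per-partition subtotal
    (i, loopState, subtotal) => subtotal + 1,      // body: bump the private subtotal
    subtotal => Interlocked.Add(ref x, subtotal)); // localFinally: add it to x atomically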

Conclusion

In this post I took a simple and straightforward example - 10 threads simultaneously operating on one field - to experiment with atomic operations in C#.Net, using synchronization techniques including exclusive locking, signaling, and non-blocking synchronization. I think it is a good example for mastering basic FCL threading libraries/concepts such as Interlocked, Monitor, MemoryBarrier, volatile, AutoResetEvent, ReaderWriterLockSlim, etc.

Multi-threaded programming is indeed very complex. During my investigation I happened to see that even Jon Skeet admitted he had "his eyes opened to the precise meaning of volatile", which isn't "always read from main memory, always write directly to main memory" (link), so as a rookie in this field I should invest more effort in it :)

Further Reading

Threading In C# (Strongly recommended!!!)

Linearizability Wikipedia

Introducing the new ReaderWriterLockSlim in Orcas

Asynchronous Code Blocks

Atomic Operations

Boosting Performance with Atomic Operations in .NET 4
