We often take multi-threading for granted, given its prevalence in modern systems. Data races are ridiculously tricky, and designing language specs around them even more so.
I was recently reading a paper (its title is the same as this post's [1]) that goes into depth about why optimizing compilers find multi-threading exceptionally hard to deal with - in particular, it argues that implementing thread functionality as a library (i.e. with no involvement from the compiler) is impossible.
Multithreading as a feature must be baked into the language, and the earlier in its life-cycle the better (which is why pre-2011 C/C++, whose standards did not even acknowledge threads, simply left data races undefined, why Java's data-race semantics are still ridiculously complex, and why data races are impossible to create within safe Rust).
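As a concrete illustration of what "baked into the language" means (my own sketch, not an example from the paper): since C++11, std::atomic and std::thread are part of the language and its memory model, so the compiler knows a variable is shared and must not invent, duplicate, or speculate stores to it.

#include <atomic>
#include <thread>

std::atomic<int> hits{0};   // declared shared: the compiler must preserve its accesses

void worker() {
    // A well-defined concurrent update. Pre-2011, with threads living in a
    // library the compiler knew nothing about, no such guarantee existed.
    hits.fetch_add(1, std::memory_order_seq_cst);
}

int main() {
    std::thread a(worker), b(worker);
    a.join();
    b.join();
    return hits.load() == 2 ? 0 : 1;   // guaranteed to return 0
}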
Any program that is free of data races is guaranteed sequentially consistent behaviour, i.e. every execution behaves as if the threads' instructions were interleaved into a single total order, with each thread's instructions appearing in program order. Consider the following:
y = 0, x = 0

# Thread 1
if (y == 1)
    x += 1

# Thread 2
if (x == 1)
    y += 1
Under sequential consistency this code does nothing: neither x nor y can ever become 1, so neither branch fires and both variables stay 0.
If we assume no compiler involvement in the threading, i.e. the compiler sees each thread's code in isolation and optimizes it as ordinary sequential code, a perfectly valid transformation is the following speculative one -
y = 0, x = 0

# Thread 1
x += 1
if (y != 1)
    x -= 1

# Thread 2
y += 1
if (x != 1)
    y -= 1
Each thread, viewed in isolation, still behaves identically: if the condition turns out false, the speculative increment is undone. Now, however, interleaving the two threads' instructions can cause both x and y to end up as 1 -
y = 0, x = 0

# Thread 1
x += 1        # x == 1

# Thread 2
y += 1        # y == 1
if (x != 1)
    y -= 1    # condition is false, y does not get decremented

# Thread 1 contd.
if (y != 1)
    x -= 1    # condition is false, x does not get decremented either
This is invalid. The original program was data-race-free (no sequentially consistent execution ever writes to x or y), so the only permitted outcome is x == y == 0. By speculating the writes, the thread-oblivious compiler introduced both a data race and a result that could never be derived from the source. This is exactly why threads cannot be bolted on as a library: the compiler must know they exist.
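To make the guarantee concrete, here is the same program as real C++ (a sketch of my own, not code from the paper), assuming a C++11-or-later toolchain. Under the post-2011 memory model the program is data-race-free by the argument above, so a conforming compiler is no longer allowed to perform the speculative transformation:

#include <cassert>
#include <thread>

int x = 0, y = 0;   // deliberately plain ints, not atomics

void thread1() {
    if (y == 1)
        x += 1;
}

void thread2() {
    if (x == 1)
        y += 1;
}

int main() {
    std::thread t1(thread1), t2(thread2);
    t1.join();
    t2.join();
    // DRF-SC: no sequentially consistent execution takes either branch, so
    // there are no conflicting accesses, the program is race-free, and the
    // only permitted outcome is x == y == 0. A pre-2011 compiler, treating
    // threads as a library it knew nothing about, could legally break this.
    assert(x == 0 && y == 0);
}

Compile with a threads-aware toolchain (e.g. g++ -std=c++11 -pthread); on a conforming implementation the assertion can never fire.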