I said "due to the timing of A, B and C joining the server" -- that's important and I didn't explain it. And my example might have been wrong above.
Why does this happen? Look at it a different way... bear with this...
Code:
T: ....,....|....,....|....,....|....,....|....,....|
A: a0......a1......a2......a3......
B: b0......b1......b2......b3......
C: c0......c1......c2......c3......
D: d0......d1......d2......d3......
Here, "T" is real time passing, and "a", "b", "c" and "d" mark A, B, C and D hearing "1" on the metronome. It doesn't really show network latency well, though... C and D are playing at about the same time in real time. Network latency means they each play against their own "1", but what they play gets delayed to the other's next "1". Worse, network latency also means D's "1" for "d0" doesn't get to A in time for their "a1" and gets pushed back to "a2", and so on -- but B isn't affected, hearing C and D "together".
So you end up with something similar to what I was trying to convey before:
Code:
A a0/b-/c-/d- | a1/b0/c-/d- | a2/b1/c1/d0 | a3/b2/c2/d1 |
B a-/b0/c-/d- | a0/b1/c0/d0 | a1/b2/c1/d1 | a2/b3/c2/d2 |
C a-/b0/c0/d- | a0/b1/c1/d- | a1/b2/c2/d1 | a2/b3/c3/d2 |
D a-/b0/c-/d0 | a0/b1/c-/d1 | a1/b2/c1/d2 | a2/b3/c2/d3 |
And, if B, C and D do wait until A starts, to show who starts hearing who "when" (in terms of the interval):
Code:
A a0/b-/c-/d- | a1/b-/c-/d- | a2/b1/c1/d- | a3/b2/c2/d1 |
B a-/b-/c-/d- | a0/b1/c-/d- | a1/b2/c1/d1 | a2/b3/c2/d2 |
C a-/b-/c-/d- | a0/b1/c1/d- | a1/b2/c2/d1 | a2/b3/c3/d2 |
D a-/b-/c-/d- | a0/b1/c-/d1 | a1/b2/c1/d2 | a2/b3/c2/d3 |
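If it helps, here's a rough Python sketch of the "delayed until the listener's next 1" idea. It is not how any particular server does it, just the arithmetic. The join offsets and the latency are numbers I made up (in the same "ticks" as the T line, 8 ticks per interval), and heard_on is a hypothetical helper name.

```python
def heard_on(sender_offset, k, latency, listener_offset, interval):
    """Index of the listener's "1" on which the sender's k-th "1" is heard.

    All times are in the same arbitrary ticks. Assumes streaming: the note
    leaves as it is played and is held at the listener until their next "1".
    """
    played_at = sender_offset + k * interval
    arrives_at = played_at + latency
    # Ceiling division: the first listener "1" at or after the arrival time.
    n = -(-(arrives_at - listener_offset) // interval)
    return max(n, 0)


INTERVAL = 8                          # ticks between "1"s, as in the diagram
OFFSETS = {"A": 0, "B": 2, "D": 6}    # hypothetical join times, in ticks
LATENCY = 3                           # hypothetical one-way delay, in ticks

for listener in ("A", "B"):
    n = heard_on(OFFSETS["D"], 0, LATENCY, OFFSETS[listener], INTERVAL)
    print(f"d0 is heard by {listener} on {listener.lower()}{n}")
```

With those made-up numbers, "d0" misses A's "a1" and lands on "a2", while B still catches it on "b1". Change the offsets and you get a different picture, which is the point about the timing of people joining.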
This is all assuming what's played gets streamed in real time from the client to the server and then out to the other clients. If it's recorded at the client, then you can't hear "1" of an interval until after that interval has completed being played, then compressed, transmitted to the server, sent to the other clients, decompressed and played (which obviously takes some time even on modern hardware and networks). That would change the picture but the idea is the same.
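As a rough sketch of how the recorded-at-the-client case shifts things: the only change is that nothing leaves the client until the whole interval has been played. Again, the numbers are made up, heard_on_recorded is a hypothetical name, and pipeline_delay is my lump-sum assumption for compress + send + relay + decompress.

```python
def heard_on_recorded(sender_offset, k, pipeline_delay, listener_offset, interval):
    """Index of the listener's "1" on which the sender's k-th interval starts.

    Assumes the sender records a full interval before anything is sent.
    All times are in the same arbitrary ticks.
    """
    # The k-th interval can only be sent once it has been played in full.
    interval_done_at = sender_offset + (k + 1) * interval
    ready_at = interval_done_at + pipeline_delay
    # Ceiling division: the first listener "1" at or after the ready time.
    n = -(-(ready_at - listener_offset) // interval)
    return max(n, 0)


# Hypothetical numbers: D joined at tick 6, A at tick 0, 8 ticks per interval,
# 3 ticks of total processing and network delay.
n = heard_on_recorded(6, 0, 3, 0, 8)
print(f"d0 is heard by A on a{n}")
```

With the same made-up numbers, a "d0" that would reach A on "a2" when streamed is now heard on "a3": everything is at least one interval later, but it still lands on a "1".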
You don't really know when the others in the room will hear you, except that what you play on "1" will get delayed until some future time and heard on "1", whilst you're hearing something that's from the past and that's been delayed until you heard "1"...
It's important to understand that to keep jams musical, if that's intended.