But that's not necessarily a good thing. It's a better approach for one NINJAM user to have sorted out their mix than for every other user to have to do it for them. A single stereo feed into NINJAM is all that's really needed.
I can only guess at the reason (Brendan/Justin's thoughts would be interesting), but I think it is so that offline editing/mixing remains possible. If clients send pre-mixed audio, there are fewer editing possibilities afterwards if you want to make a nice MP3 from the jam. At least the cliplogcvt utility that comes in the Ninjam source tree is geared towards importing your Ninjam session into a DAW so you can mix it afterwards.

In practice I've never edited/mixed jams after playing, though.

How are you checking the latency of these things? I think PulseAudio would stop anyone using the system seriously other than to play back pre-recorded music. It would stop any form of live music creation.
I haven't checked the latency on PulseAudio. If you want to use software instruments/effects then JACK is probably required.

In the case of a single physical instrument plugged into the PC it's not an issue, though. My setup is based around a guitar effects processor with USB audio. It has built-in monitoring, so I don't worry about the latency of my own signal. Ninjam plays the remote users' audio into the guitar effects processor, which mixes it together with the monitor signal. Both the instrument and the headphones are plugged into the guitar effects processor.
