I'm looking for any / all ideas on how to get my HW buffer low and still stable.
Cutting to the chase: Does anyone know if running faster ram has any effect on this?
===========================
Now the long version, if you have the time:
I use a PC for my live music mixer. Obviously, I run a lot of plugins and VIs (Synthogy Ivory, several instances of Kontakt, etc.), and I also need a super-low buffer.
My current rig is a 5960X OC'ed to 4.2 GHz, with 16 GB of basic 2133 ram. Lynx PCIe card.
I can "get away with" a HW buffer of 64.
My whole live session runs full-out with less than 15% total CPU usage.
However, I cannot play heavy keyboard parts, with the pedal down, without getting massive distortion. Obviously it is a buffer problem.
At first I thought it was a streaming issue, even though my Ivory library is on its own dedicated Samsung 840 Pro SSD. However, if I remove all VI's EXCEPT for Ivory, then I can play anything I want with no problems, so clearly it's not a drive or southbridge issue.
So I'm wondering about the ram. Every serious test says the same thing: Faster ram is almost always a waste of money. Plus, of course, overclocking ram can occasionally cause boot problems, which I don't need in a live rig. Then there's the extra heat, which is a problem in a small, rack-mounted PC. So, I don't want to just try this without knowing if it has a real chance of helping.
We know that faster ram typically DOES give a slight edge in benchmark tests, so I'm wondering if my extreme HW buffer needs would qualify as such a situation.
Does anyone know?
Last edited by Cableaddict; 01-17-2018 at 04:14 AM.
Just in case the answer to my above query is "Yes," does anyone know whether I should use 2 X 8 GB of DDR4, or 4 X 4 GB? Mushkin offers their redline series in both options, at 3000 speed.
One can find experts who are 100% sure that 2 sticks are faster, and experts who are 100% sure that 4 sticks are faster. Ughh. There's also the issue of the X99 chipset supporting quad-channel memory, but does that mean 4 sticks are automatically better?
Being that this is a very specific performance issue, does anyone know for sure?
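On the 2 vs 4 sticks part, at least the theoretical numbers are simple arithmetic: each DDR4 channel is 64 bits (8 bytes) wide, so peak bandwidth = transfer rate x 8 bytes x populated channels. The sketch below assumes one stick per channel on the quad-channel X99 board (so 2 sticks run dual channel, 4 run quad channel) and that the existing 2133 kit fills all four channels, which the post doesn't say. These are on-paper peaks, not measured throughput, and whether bandwidth matters at all for a 64-sample buffer is exactly the open question here.

```python
def peak_bandwidth_gb_s(mega_transfers: int, channels: int) -> float:
    """Theoretical peak DDR4 bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = 8  # one 64-bit DDR4 channel moves 8 bytes per transfer
    return mega_transfers * 1_000_000 * bytes_per_transfer * channels / 1e9


# Assumed configurations: one stick per channel on a quad-channel X99 board.
configs = [
    ("DDR4-2133, 4 sticks (quad channel, assumed)", 2133, 4),
    ("DDR4-3000, 2 x 8 GB (dual channel)", 3000, 2),
    ("DDR4-3000, 4 x 4 GB (quad channel)", 3000, 4),
]

for label, speed, channels in configs:
    print(f"{label}: {peak_bandwidth_gb_s(speed, channels):.1f} GB/s peak")
```

On paper, 2 x 8 GB at 3000 has less peak bandwidth than a 2133 kit that populates all four channels; 4 x 4 GB at 3000 has the most.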
Last edited by Cableaddict; 08-28-2015 at 09:27 PM.
I would guess it will make a slight difference, but probably not a significant difference. The buffer delay is there because of multitasking, so if faster RAM makes those background tasks run faster and the operating system gets back to reading your buffer sooner, you can get away with a smaller buffer (and the associated shorter delay).
But some of those tasks are going to take a fixed number of CPU cycles (or a fixed amount of time) and faster RAM isn't going to make your CPU run faster. It's not always CPU utilization... You can be using only a small percentage of the CPU, but if some task takes a few milliseconds too long, the recording/input buffer overflows (or the playback/output buffer underflows) and you get a nasty glitch.
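To put numbers on this: a buffer of N samples at sample rate fs gives the audio path N / fs seconds to deliver each block, so 64 samples at 44.1 kHz is about 1.45 ms. The sketch below checks a list of per-buffer processing times against that budget. The timing values are made up for illustration; the point is that one slow buffer glitches even when the average load is low.

```python
def buffer_period_ms(buffer_samples: int, sample_rate_hz: int) -> float:
    """Time available to produce one buffer, in milliseconds."""
    return 1000.0 * buffer_samples / sample_rate_hz


def missed_deadlines(work_ms, buffer_samples, sample_rate_hz):
    """Indices of buffers whose processing time exceeded the buffer period."""
    budget = buffer_period_ms(buffer_samples, sample_rate_hz)
    return [i for i, t in enumerate(work_ms) if t > budget]


# Hypothetical per-buffer processing times (ms); buffer 3 hits some slow path.
work_ms = [0.3, 0.3, 0.4, 1.6, 0.3]
budget = buffer_period_ms(64, 44100)
average_load = sum(work_ms) / len(work_ms) / budget

print(f"budget per buffer: {budget:.2f} ms")
print(f"average load: {average_load:.0%}")
print("glitches at buffers:", missed_deadlines(work_ms, 64, 44100))
```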
Quote:
However, I cannot play heavy keyboard parts, with the pedal down, without getting massive distortion. Obviously it is a buffer problem.
What does "pedal down" do? Does a bigger buffer make the distortion go away?
You have a good sound card and plenty of CPU, so if you still have problems, it's usually down to excessive DPC latency.
Disabling all power saving features in bios can make a big difference to DPC latency stability.
Other hardware/drivers can have an effect too. Nvidia graphics cards seem to result in larger DPC latency time than AMD or Intel.
On my main audio workstation I reduced the DPC latency from over 400 us down to 67-70 us by removing the Nvidia GTX 570 that was in it and replacing it with an AMD 260X.
With SpeedStep, Turbo Boost, core parking... all that stuff enabled, the latency goes back up to 200-ish.
Wireless network adapters are also a common culprit.
It's also possible that it's a problem with the design of the virtual instrument that causes it to fail to stream/process fast enough at low latency settings, when the hardware is more than capable of getting the work done fast enough.
Quote:
Disabling all power saving features in bios can make a big difference to DPC latency stability.
I can do anything except disable C1E, as that makes my CPU 15-20C hotter, just as if it were fully loaded all the time.
About the GPU issue, are you still keeping the old card? I suspect the culprit is the card's HDMI audio driver, but I am using a Radeon so I can't experiment with it. Did you try uninstalling those audio drivers?
Quote:
My whole live session runs full-out with less than 15% total CPU usage.
However, I cannot play heavy keyboard parts, with the pedal down, without getting massive distortion. Obviously it is a buffer problem.
It's not necessarily a buffer size issue (although the lower the buffer size, the harder your CPU has to work to fetch all the necessary data and spit out the buffers in time, and with a lot of voices streamed from your sampler(s), that ALSO taxes the CPU, so...). You should just limit the polyphony so that there aren't TOO many voices playing at once.
In any case, more/faster RAM won't help you one bit. It's a CPU related thing.
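Limiting polyphony means capping how many voices can sound at once; when a new note arrives at the cap, an existing voice gets cut ("stolen"). The class below is a hypothetical sketch of the simplest policy, stealing the oldest voice. Samplers such as Kontakt expose their own voice-limit settings and stealing rules, which this does not reproduce.

```python
from collections import deque


class VoicePool:
    """Hypothetical fixed-polyphony allocator that steals the oldest voice."""

    def __init__(self, max_voices: int):
        self.max_voices = max_voices
        self.active = deque()  # sounding note numbers, oldest on the left
        self.stolen = 0

    def note_on(self, note: int) -> None:
        if len(self.active) >= self.max_voices:
            self.active.popleft()  # cut the oldest voice to make room
            self.stolen += 1
        self.active.append(note)

    def note_off(self, note: int) -> None:
        if note in self.active:
            self.active.remove(note)  # voice finished or was released


pool = VoicePool(max_voices=4)
for note in [60, 64, 67, 72, 76, 79]:  # six notes held with the pedal down
    pool.note_on(note)
print(list(pool.active), "stolen:", pool.stolen)
```

The cap bounds the worst-case work per buffer, at the price of notes being cut early when it is reached.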
Hi, I don't think faster ram actually goes faster. Faster ram means it can be clocked faster, so unless you up your clock speed on the motherboard, it will perform the same as the original ram. Plus what Evil Dragon said. Dave
__________________
'Retired technician - not a musician' and registered Reaper user since July 2008
'Excellence is not a skill, It is an attitude' Ralph Marston quotes.
Music at http://soundcloud.com/fixerdave
Quote:
I can do anything except disable C1E, as that makes my CPU 15-20C hotter, just as if it were fully loaded all the time.
You DO have an aftermarket tower cooler for your CPU, and a well-ventilated, silently built PC case using only super-silent 12 cm fans, don't you?
If not, you're doing it wrong (don't take this as bashing, just a joke)
Powerful computers these days can be almost completely silent if built right (below 30 dB at 100% utilization). I'm 10 meters from the kitchen, with one wall in between, and I can hear the fridge humming before I hear this computer that's right next to me.
C1E is very important regarding DPC latency. With C1E enabled, the CPU drops into a low power state whenever there's no work to do, and "waking up" from that low power state to a high power state isn't instantaneous. That short moment can make the difference between buffer underruns under load or not.
And any good tower cooler (e.g. Noctua) can keep your highly overclocked CPU within acceptable temperatures even running 24/7 at 100%, and still be silent.
I've read some posts regarding C1E on Gearslutz as well; the opinions are mixed. For me, C1E makes no significant difference in LatencyMon and DPC Latency Checker, or in real projects like DAW sessions loaded with FX and instruments. The one that makes a significant difference in my system is HPET.
Also, I made a mistake in my previous post. The setting that makes the 15-20C difference is PreSonus' power plan in the link below, not C1E; C1E only makes a 2-3C difference.
The PreSonus profile is at the bottom of the page.
Doesn't really matter.
I'm running with "as much performance as possible" too, and it's not affecting my temps that much. Overclocked Core 2 Xeon with 4 cores. I'm still running below 60C under heavy use, and I get much better DPC latency because the profile prevents the power state switching that takes time.
You really should do that too since it will help a lot in 99% of cases.
OK, it's been a while and I've done some tests & more thinking.
First, thanks for all the great responses.
A few notes:
As I should have mentioned at the start, my DPC latency is incredibly low, even with all apps running. I get a sustained average of about 58 us, with the peaks at only 69 us. This is the lowest latency I've seen through maybe 10 - 15 computer builds.
One very odd thing, and maybe it points to something: I had DPC Latency Checker open, with just Reaper running. When I QUIT Reaper, my DPC reading suddenly shot right through the roof! I mean FLOORED. Solid red, left to right. I re-opened Reaper and the latency bottomed out again to around 56. I've never seen that before, and it makes me think there might be a glitch in the audio driver. Could this point to anything else?
---------------
Re: SpeedStep, Turbo Boost, C1E states, etc.: I have experimented with these settings on several past builds, but was never able to get any noticeable performance increase of any kind. With the new PC I just didn't bother, and I'm still getting incredibly low DPC activity. I may try it anyway, but I don't have high hopes.
===================
Say, MAYBE IT’S NOT A BUFFER OR DPC PROBLEM AT ALL:
Thinking on this further: The buffer sees whatever stereo audio Reaper sends to it, correct? OK, so it makes sense that DPC latency would be a factor, but why would the buffer be affected by my holding the sustain pedal down? Or by my making extra VIs active? Neither of these things changes the amount of data going to the buffer (it's still a 44.1K 16-bit stereo stream), nor the number of interrupts.
Is it possible that Reaper itself can somehow get overloaded, and not be able to respond in real-time?
## Exactly what part of the system combines all the various tracks, midi data, and CC# data, into the final stereo output? - And what parts of the computer affect how efficiently this happens?
- And you see, I’m back to thinking about over-clocked ram, since applications actually run in ram. Could it be that Reaper can't get the data to & from ram fast enough?
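The output side of that argument is easy to check: a 44.1 kHz, 16-bit stereo stream is a fixed number of bytes per second and per buffer, no matter how many voices are sounding. The sketch below is plain PCM arithmetic, nothing Reaper-specific.

```python
def pcm_bytes(frames: int, bits: int, channels: int) -> int:
    """Bytes in `frames` sample frames of uncompressed PCM audio."""
    return frames * channels * bits // 8


rate = 44100
per_second = pcm_bytes(rate, bits=16, channels=2)
per_buffer = pcm_bytes(64, bits=16, channels=2)  # one 64-sample HW buffer
print(per_second, "bytes/s,", per_buffer, "bytes per 64-sample buffer")
```

The amount of data reaching the driver is constant; what can change with the pedal or extra instruments is the work required to compute those bytes before each buffer's deadline.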
Last edited by Cableaddict; 09-15-2015 at 08:11 PM.
Quote:
but why would the buffer be affected by my holding the sustain pedal down? ... Neither of these things changes the amount of data going to the buffer, (it’s still a 44.1K 16 bit stereo stream) nor the amount of interrupts.
If that is an actual sustain pedal, sure it does. What do you think has to come to a complete halt in order for the machine to read that data coming in from that 'hardware' sustain pedal - the buffer? Guess what makes reading that hardware in near real time possible? DPCs and interrupts... Hardware takes precedence over everything else in that regard, it's the same reason your mouse doesn't typically freeze when doing intensive CPU operations.
Did you watch the video I posted?
__________________ Music is what feelings sound like.
You can isolate your system from plugins (just for a benchmark) by running a loopback test both with no plugins inserted and then with what you want to run.
Different plugins have different minimum system block sizes they run stable with.
You can determine what your "latency hog" plugins are with the above in mind. You'll also be able to benchmark what your interface is capable of.
Ex.
Your system runs perfectly stable with a 32-sample buffer at 96k and comes in at 4 ms total system latency. But insert such-and-such plugin and the only way you can come in under 11 ms and still be stable is to lower the sample rate to 48k and raise the buffer to 128 samples.
Try 48k if you haven't BTW. It's usually the most efficient. Same block size equals a shorter time for not much more of a CPU hit. 96k is even shorter of course but now it crosses the line of too much more of a CPU hit.
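The "same block size, shorter time" point is easy to tabulate, since one buffer lasts block_size / sample_rate. The sketch below assumes a round trip of roughly one input plus one output buffer and ignores converter and driver safety offsets, which real interfaces add on top, so a loopback measurement will read higher than these numbers.

```python
def block_ms(block_size: int, sample_rate_hz: int) -> float:
    """Duration of one buffer in milliseconds."""
    return 1000.0 * block_size / sample_rate_hz


for rate in (44100, 48000, 88200, 96000):
    row = []
    for block in (32, 64, 128, 256):
        one_way = block_ms(block, rate)
        # Assumption: round trip ~ input buffer + output buffer, no converter/driver offsets.
        row.append(f"{block:>3}: {one_way:.2f}/{2 * one_way:.2f} ms")
    print(f"{rate} Hz  " + "  ".join(row))
```

For example, 32 samples at 96 kHz is about 0.33 ms per buffer, while 128 samples at 48 kHz is about 2.67 ms.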
Quote:
If that is an actual sustain pedal, sure it does. What do you think has to come to a complete halt in order for the machine to read that data coming in from that 'hardware' sustain pedal - the buffer? Guess what makes reading that hardware in near real time possible? DPCs and interrupts... Hardware takes precedence over everything else in that regard, it's the same reason your mouse doesn't typically freeze when doing intensive CPU operations.
Did you watch the video I posted?
I don't understand this. (Not sure I agree.) A mouse controls the PC, so that makes sense. The Roland keyboard does NOT control the computer. It controls only Reaper. Are you saying the PC has to handshake every bit of CC data coming from an external midi device?
Also, the basic Windows mouse driver is fairly low-level, but the Roland driver is not. Well, OK, I guess since the audio driver is ALSO high-level, maybe that's a moot point.
Are you actually saying that the Roland sustain pedal creates interrupts that stop the audio buffer from filling? If so, why don't I see a change in DPC activity? If this is so, and you are SURE (I mean, you understand what's actually happening under the hood,) then it would be extremely helpful if you could elaborate.
And IF you are correct, then it's a big problem, as with pitch bend wheels, because even if we thin the CC data with a Reaper plugin, the pedal's damage will already have been done.
So.... then what can I do?
------------------------
BTW, while I definitely appreciate your help, the guy in the video you posted is leaving out a lot of stuff, and was wrong about at least 1 important issue, (possibly 2) so I'm not taking his word on anything.
Last edited by Cableaddict; 09-15-2015 at 11:42 PM.
Quote:
Try 48k if you haven't BTW. It's usually the most efficient. Same block size equals a shorter time for not much more of a CPU hit. 96k is even shorter of course but now it crosses the line of too much more of a CPU hit.
That's quite interesting. I'll give this a try, since I have plenty of cycles to spare, and since the Reaper-Lynx combo can auto-switch sample rates on a per-session basis.
I tried 48K, but had to double my HW buffer (to 128) in order to not have massive distortion. That theoretically means MORE round-trip latency.
I tried 88.2K just for kicks. My CPU usage was still only at 25%, and even across all 8 cores (Go, Reaper !) but I had to go to a 256 buffer, and still got minor clicks.
Well, it was worth a try, but no love there.
Last edited by Cableaddict; 09-15-2015 at 11:55 PM.
USB Mouse, keyboard, USB sound card, USB printer, USB control surface etc. At the hardware level, it has to be read and processed. Every single one of those, because they are hardware, typically get priority over software because the hardware can't wait around. This is the entire idea around DPCs, interrupts and audio glitches.
Quote:
Hardware interrupts are used by devices to communicate that they require attention from the operating system.[2] Internally, hardware interrupts are implemented using electronic alerting signals that are sent to the processor from an external device, which is either a part of the computer itself, such as a disk controller, or an external peripheral. For example, pressing a key on the keyboard or moving the mouse triggers hardware interrupts that cause the processor to read the keystroke or mouse position.
Quote:
Are you actually saying that the Roland sustain pedal creates interrupts that stop the audio buffer from filling?
Once again, see above and... Did you watch the video I posted? It has every one of these answers.
Last edited by karbomusic; 09-16-2015 at 06:09 AM.
Quote:
I tried 48K, but had to double my HW buffer (to 128) in order to not have massive distortion. That theoretically means MORE round-trip latency.
I tried 88.2K just for kicks. My CPU usage was still only at 25%, and even across all 8 cores (Go, Reaper !) but I had to go to a 256 buffer, and still got minor clicks.
Well, it was worth a try, but no love there.
Well, that's unusual and unexpected.
You might have a ringer for an audio interface. For example, I've seen a device out there that only operates at 48k and actually converts everything else (and with the kind of awful performance you'd expect from such a thing). All kinds of interfaces out there - everything from "you've got to be kidding" level design flaws to "happiness and light and just works".
It sounds like you know how to operate the settings from what you've written so far. Just to make sure though, can you verify that you are:
- checking the box and setting the project sample rate in Project Settings
- checking the boxes and setting the sample rate and block size in Preferences/audio/device page
- these are all set consistently so there is no sample rate conversion on the fly going on (which is very processor intensive)
What's the baseline on your hardware?
That is, what is the minimum block size you can run stably at each sample rate with your audio interface and no plugins inserted? What is your total system latency for each case? (Not just the Reaper-added part displayed at the top of the screen, but the total system time as measured from a loopback test.)
If you haven't done loopback tests yet and noted these benchmarks, you really need to in order to see what you are working with at the hardware level before you introduce the variable of plugins.
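For the loopback test itself, the idea is to play a known signal out of the interface, record it back through a cable into an input, and find the offset between the two. If you can't read the offset off the waveforms by eye, cross-correlation finds it. The sketch below is a minimal illustration that uses a synthetic "recording" (the played signal delayed by a made-up 237 samples, attenuated, plus a little noise) in place of real audio; with real files you would first load both tracks as lists of floats.

```python
import random


def estimate_delay(played, recorded, max_lag):
    """Lag in samples (0..max_lag) where `recorded` best matches `played` (cross-correlation)."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        score = sum(
            p * recorded[i + lag]
            for i, p in enumerate(played)
            if i + lag < len(recorded)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


rng = random.Random(1)
sample_rate = 48000
played = [rng.uniform(-1.0, 1.0) for _ in range(1000)]  # noise-burst test signal
true_delay = 237  # hypothetical round-trip delay, in samples
recorded = [0.0] * true_delay + [0.5 * s + rng.gauss(0.0, 0.01) for s in played]

lag = estimate_delay(played, recorded, max_lag=400)
print(f"{lag} samples = {1000 * lag / sample_rate:.2f} ms round trip")
```

A noise burst is used because it correlates strongly with itself only at the true offset, so the peak is unambiguous.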
Note: If your interface has its own control panel app, you have two choices for setup. Control from Reaper or control from the interface's control panel. You check the boxes (and then enter the values) in the Preference page to give control to Reaper. You uncheck the boxes (the entered value is now ignored) to give control to other apps like a proprietary control panel.
Try it both ways. It may be a case of a stubborn control panel app that doesn't want to release control and in that case you would want to set Reaper (by unchecking those boxes) to accommodate.
Any of the above not set intentionally could cause on-the-fly sample rate conversion and absolutely kill any chance of low latency and efficient CPU use.
PS. What Karbomusic is telling you about interrupts is absolutely correct! You need to have the headroom in your system to accommodate.
I did imply/hide something in my last reply. If the sound card is USB (tl;dr), be aware of any other hardware using the same mobo USB controller (and/or any hubs and how they are powered). He could very well have contention before it ever even gets to the OS.
Edit: Never mind, looks like a PCIe card. I'd still be aware of such similar things though. I wouldn't be opposed to trying a different PCIe slot etc.
Last edited by karbomusic; 09-16-2015 at 10:05 AM.
Reg. RAM speed: IMHO faster RAM will not make much or any difference here. Also, to make use of the added speed, you would need your MB to support it. You can look up what memory the MB supports and, in case you are not maxed out already, consider replacing. But then again, if the difference between current and supported speed is not significant, chances are you will pay much more than you will gain. Lower-latency RAM might in fact have some effect, but I will leave that to benchmarks.
Reg. why you see buffer underruns when you use your pedal:
The pedal itself is not what stresses the CPU. It's what this pedal triggers. Which is your virtual instrument. With the pedal pressed, this instrument has a lot more voices to process. For instance, if you play 1 note every second and that note is set to die off after 2 seconds, then with the pedal OFF the calculation is for 2 notes simultaneously. However if with the pedal ON the note gets a sustain of 6 additional seconds, then at any given moment 8 notes are being processed. That taxes the CPU. When the CPU is working hard, it might not get the opportunity to fill that short buffer as often as required to prevent underruns.
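The voice arithmetic in that example can be checked by simulation. In steady state the number of sounding voices is notes per second x note length (Little's law), and the sketch below just counts the notes still sounding at one instant, using the example's numbers (one note per second, 2 s notes, 6 s of extra sustain with the pedal down).

```python
def voices_at(t: float, note_interval: float, note_length: float, num_notes: int) -> int:
    """Count notes still sounding at time t; notes start every `note_interval` seconds."""
    starts = [k * note_interval for k in range(num_notes)]
    return sum(1 for s in starts if s <= t < s + note_length)


# One note per second; check at t = 20.5 s, well into steady state.
pedal_up = voices_at(20.5, note_interval=1.0, note_length=2.0, num_notes=21)
pedal_down = voices_at(20.5, note_interval=1.0, note_length=2.0 + 6.0, num_notes=21)
print(pedal_up, pedal_down)  # 2 8
```

Four times the voices means roughly four times the per-buffer work for that instrument, while the output stream stays the same size.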
So it doesn't look like you have many CPU cycles to spare. You might see 15% of CPU use, but what's interesting is how much of *real-time* CPU capacity is utilized.
What might in fact help for this matter is a CPU with a larger cache if there is one that would fit your MB.
Other than that, to squeeze every bit of performance, you can try to tweak your system:
- Disable unnecessary services and drivers in Windows (but be careful and create a restore point beforehand)
- Disable every unnecessary device in BIOS. For instance, if you don't use FireWire, LAN, built-in sound card, built-in graphics card, disable them.
- disable all networking when working with audio.
- disable some visual effects in Windows (opacity/Aero etc)
- Tune your virtual instruments, incl. the less CPU-hungry of them, for realtime operation
- use a different instrument consuming less resources
Reviving this thread for an important, related update:
While I still have no definitive answer to my main question, (there has STILL never been a careful test done to find out, by any reputable website) I did recently find one very important factor in the "low HW buffer" stew:
Hard pagefaults. (Hit that google button.)
We all know about low DPC latency, but do you know about pagefaults? They are related, and extremely important.
A common cause of bad pagefaults is a buggy driver. It turns out that some NVIDIA drivers of recent past have been horribly buggy. Despite having super-low DPC latency in my current rig, I have been getting obscenely large hard pagefaults. I mean right off the scale.
I tried newer NVIDIA drivers, even older NVIDIA drivers, but no love. Still suspecting this as the cause, I decided to replace my 5 year old video card with a new, GDDR5 based one. I figured that NVIDIA might be more concerned about this problem with their newest drivers, so fingers crossed.
Sure enough, WITH THE NEW CARD, I NOW HAVE ZERO HARD PAGEFAULTS !
-----------------------
I don't know if this is affecting my HW buffer at the moment (I seem to be in a safe zone, size-wise) but theoretically this should make a huge difference with systems that are on the edge.
Are you sure that you aren't just overloading the RT CPU core? Because you have many CPU cores, it's quite possible to have low overall CPU usage and still be overloading one core.
edit: Sorry, missed that:
Quote:
and even across all 8 cores
Also, average CPU usage isn't the only consideration. There may be spikes in usage that you don't see in the Windows Task Manager, because its CPU usage display is just an average over a fixed interval.
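A rough illustration of how a one-second average hides a single bad buffer: at 64 samples and 44.1 kHz there are about 689 callbacks per second. The sketch assumes a one-second averaging window and uses made-up per-callback loads (each expressed as a fraction of its buffer period, where anything above 1.0 misses the deadline).

```python
def summarize(loads):
    """loads: per-callback time used, as a fraction of the buffer period (1.0 = deadline)."""
    average = sum(loads) / len(loads)
    misses = sum(1 for x in loads if x > 1.0)
    return average, max(loads), misses


callbacks = round(44100 / 64)  # about 689 callbacks in an assumed one-second window
loads = [0.15] * callbacks     # hypothetical steady 15% load on the audio thread
loads[300] = 1.3               # one hypothetical spike that overruns its buffer

average, peak, misses = summarize(loads)
print(f"1-second average: {average:.1%}, worst callback: {peak:.0%}, missed deadlines: {misses}")
```

The average barely moves, but the one overrun is an audible click.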
Interesting. I never heard of this before.
Still, since some rigs run OK with even 80% average usage, I have to believe that my rig, which never goes above 18%, can't be CONTINUOUSLY running out of HW buffer headroom because of CPU usage spikes. The buffer is a BUFFER, after all.
I've seen DAW systems go all horrible sounding after a CPU spike, or a disk access spike, and not come back even after the spike was gone, sometimes for a little while, sometimes not at all, but that certainly doesn't prove that's what's happening in your case.
VI's can be painful things to get working properly.
I remember having problems with BFD at low latency settings. I was sure that an SSD would solve it as I wasn't running out of CPU. So, SSD goes in, same problem. On paper it seemed to me that the SSD should be able to keep up easily.
In the end it turned out that I just didn't have enough ram to set the preloaded sample buffer high enough. More ram, twice the preloaded sample data, no more problems.
Out of curiosity, what sort of ram usage are you seeing when you have problems? Do your VI's have buffers that can be increased to ease any disk I/O problems?
Again, what I'm saying may not be of any use to you, but maybe it can help you get a bead on potential issues.
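A rough way to see why the preload buffer size matters: with disk streaming, the preloaded start of each sample sits in RAM and has to cover playback until the disk delivers the rest. The sketch converts a preload size into milliseconds of audio covered; the preload sizes and the 24-bit stereo format are assumptions for illustration, not BFD's, Kontakt's or Ivory's actual settings.

```python
def preload_seconds(preload_bytes: int, sample_rate_hz: int, bits: int, channels: int) -> float:
    """Seconds of uncompressed audio held in a preload buffer of the given size."""
    bytes_per_second = sample_rate_hz * channels * bits // 8
    return preload_bytes / bytes_per_second


for kb in (30, 60, 120):  # hypothetical preload sizes per sample
    secs = preload_seconds(kb * 1024, 44100, bits=24, channels=2)
    print(f"{kb} KB preload covers {secs * 1000:.0f} ms of 44.1 kHz/24-bit stereo")
```

Doubling the preload doubles the time the disk has to catch up, but the RAM cost is multiplied by the number of samples in the instrument, which is why the fix in that case was more RAM.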
While important stuff, I no longer think we should be looking at total cpu usage, nor the amount of ram. The more information I find on the subject, the more it supports what's in that video Karbo linked. Whether this will ultimately point back to ram speed as a solution is not clear:
It comes down to what EXACTLY the HW buffer is. Without that knowledge, you can't envision ways to optimize it. Some of you may already have known this, but I sure didn't (& I hope I'm explaining this right).
Evidently, all the audio coming out of a DAW goes through a single thread. What some people call the "real time CPU thread." Makes sense, really, since everything becomes a single stereo mix. This single thread hands off to the audio driver, and the HW buffer is basically in-between them. What's important is that since it's a single thread, it can only run on a single CPU core.
This is why, as the video Karbo linked explains, a super-fast single core gives better real-time performance than a 16-core monster cpu with slightly slower overall speed. - And why you can have only 10% total cpu activity, and still hit a wall with your HW buffer size.
(One side observation here is that, despite everything that's been written about DAW optimization, for low-latency performance it could well be best to leave turbo-boost ON.)
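If the audio path really does bottleneck on one thread, as described above, the total CPU figure can look harmless while that thread's core is pegged. The sketch assumes the overall figure is a simple mean across cores, which is roughly how a whole-system meter like Task Manager's reports it; the per-core loads are hypothetical. With Hyper-Threading on a 5960X the mean is taken over 16 logical processors, which dilutes a pegged thread even further.

```python
def total_cpu_percent(per_core_percent):
    """Overall figure as a plain mean across cores (assumption about how the meter averages)."""
    return sum(per_core_percent) / len(per_core_percent)


# Hypothetical 8-core load: one core pinned by the real-time audio thread, the rest mostly idle.
per_core = [100.0] + [5.0] * 7
print(f"total: {total_cpu_percent(per_core):.1f}%  busiest core: {max(per_core):.0f}%")
```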
------------------------------------
BUT I DIGRESS:
We may not be able to do anything about HARDWARE interrupts, but we can certainly still optimize a system for a small HW buffer.
The pertinent question for this particular thread is:
Could faster ram help that single core run that single process any faster?
I don't know, but it may have a lot to do with "ram speed vs cas latency" and how that relates to raw speed vs multi-tasking, in relation with whatever CPU memory manager is to be found in this year's tick or tock.
My head hurts now. It really, really hurts........
Last edited by Cableaddict; 01-17-2018 at 04:29 AM.
Quote:
Reg. why you see buffer underruns when you use your pedal:
The pedal itself is not what stresses the CPU. It's what this pedal triggers. Which is your virtual instrument. With the pedal pressed, this instrument has a lot more voices to process.
This info is not wrong, of course, (and thanks) but I now believe it's not the issue that was plaguing ME.
I think what's to blame is what Karbomusic outlined, above. Windows gives super-high CPU priority to all USB devices (which includes that foot pedal). So, when I put my foot down, it's a giant, honking DPC interrupt.
I recently found out about changing reaper's CPU priority to "real time" from within the task manager. That has helped an AMAZING amount with the HW buffer settings. However, I still see big issues with all usb devices, if my buffer / sample rate settings are "right on the edge."
I'm currently trying to find a way to LOWER priority for the USB devices & also for the video card, though so far I've had no luck. This might be hard-coded into the OS. Karbo seems to be saying this, but IMO, all hardware communicates via a driver, and THAT is still software. So, maybe there's a solution yet.....
- I've started a separate thread on this particular issue. (Setting CPU priorities.)
Last edited by Cableaddict; 01-17-2018 at 04:31 AM.
In general, the lower the buffer size, the further below 100% CPU the sound starts breaking up. So at 256 samples I might get to 97% before the sound breaks up, but at 64 or 32 I might only get as high as 85% before it fails.
I'm not sure that there is necessarily a tweaking solution for that, but I'll certainly be interested to see what results you get with the stuff you're trying. I saw your other thread too, and I'll keep following your progress.
In the end it may be that your expectations of being able to use all of your CPU power at very low latency settings are a bit too optimistic, but there is only one way to find out.
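One simple toy model of why smaller buffers top out at a lower CPU percentage: assume each callback loses a roughly fixed slice of time to things that don't shrink with the buffer (driver overhead, interrupt/DPC jitter, wake-up delays). That slice is a bigger fraction of a short buffer. The 0.2 ms figure below is an arbitrary assumption, not a measurement, and this is not a claim about how REAPER meters CPU.

```python
def usable_fraction(buffer_samples: int, sample_rate_hz: int, fixed_overhead_ms: float) -> float:
    """Share of each buffer period left for DSP after a fixed per-callback overhead."""
    period_ms = 1000.0 * buffer_samples / sample_rate_hz
    return max(0.0, 1.0 - fixed_overhead_ms / period_ms)


OVERHEAD_MS = 0.2  # hypothetical fixed cost per callback
for size in (256, 128, 64, 32):
    print(f"{size:>3} samples: {usable_fraction(size, 44100, OVERHEAD_MS):.0%} usable")
```

Under this model, halving the buffer doubles the share eaten by the fixed cost, so the ceiling drops faster and faster as the buffer shrinks.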