I think a pretty standard thing is to track a volume window or envelope, and if the incoming signal gets much higher than the envelope, that means the signal is increasing quickly and you have a transient. You then need to go back in time to the beginning of the transient, so you need some latency.
|