Something like a first step in this direction:
Add stretch markers on every spectrum rise
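
For anyone wondering what "spectrum rise" could mean in practice, here is a rough Python sketch of one common way to measure it (spectral flux: sum up how much each frequency bin grew since the previous frame). This is only an illustration, not the actual script. The function names, the frame size, the hop and the threshold are all made-up assumptions, and the naive DFT is there only so the example needs no libraries.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (slow, but dependency-free)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]

def spectral_rise_times(samples, sample_rate, frame_size=64, hop=32,
                        threshold=1.0):
    """Return start times (seconds) of frames whose spectrum rose sharply.

    frame_size, hop and threshold are assumed values for illustration.
    """
    times = []
    prev = None
    for start in range(0, len(samples) - frame_size + 1, hop):
        mags = magnitude_spectrum(samples[start:start + frame_size])
        if prev is not None:
            # Spectral flux: count only the bins whose magnitude increased.
            flux = sum(max(m - p, 0.0) for m, p in zip(mags, prev))
            if flux > threshold:
                times.append(start / sample_rate)
        prev = mags
    return times
```

Each returned time would be a candidate position for a stretch marker. Consecutive frames can both fire on the same attack, so a real version would probably need some peak picking on top.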
So, further steps, from my point of view:
1) check the RMS rise
2) check the transients
3) compare all of these and keep only the stretch markers that are 70-90% likely to be a real beat start
4) look for any roughly periodic (loop-like) coincidence between the markers, to get an approximate tempo
5) find the markers that fall close to a linear marker grid at that tempo, and keep only those matching markers from the file
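
Steps 3 to 5 could look roughly like the Python sketch below. It is only a sketch under assumptions, not a finished method: the "70-90%" confidence is replaced by a simple vote between the three detectors, the tempo is brute-forced over whole BPM values in an assumed 60-180 range, and all names and the 20 ms tolerance are hypothetical.

```python
def confirmed_markers(spectrum, rms, transient, tol=0.02, min_votes=2):
    """Step 3: keep spectrum-rise times that enough detectors agree on."""
    kept = []
    for t in spectrum:
        votes = 1  # the spectrum detector itself
        votes += int(any(abs(t - r) <= tol for r in rms))
        votes += int(any(abs(t - x) <= tol for x in transient))
        if votes >= min_votes:
            kept.append(t)
    return kept

def grid_hits(markers, period, phase, tol):
    """Return (marker, deviation) pairs for markers within tol of the grid
    phase + n * period."""
    hits = []
    for t in markers:
        offset = (t - phase) % period
        deviation = min(offset, period - offset)
        if deviation <= tol:
            hits.append((t, deviation))
    return hits

def fit_tempo(markers, bpm_min=60, bpm_max=180, tol=0.02):
    """Steps 4-5: find the BPM and phase whose grid hits the most markers.

    Ties are broken by the smallest total deviation from the grid.
    Returns (bpm, markers that lie on the grid).
    """
    best_key, best_bpm, best_kept = (0, 0.0), None, []
    for bpm in range(bpm_min, bpm_max + 1):
        period = 60.0 / bpm
        for phase in markers:  # try anchoring the grid on each marker
            hits = grid_hits(markers, period, phase, tol)
            key = (len(hits), -sum(d for _, d in hits))
            if key > best_key:
                best_key, best_bpm = key, bpm
                best_kept = [t for t, _ in hits]
    return best_bpm, best_kept
```

The idea would be to pass the output of `confirmed_markers` into `fit_tempo` and then delete every stretch marker that is not in the returned list. A constant tempo is assumed here; material with tempo drift would need the grid fitted in shorter sections.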