Old 10-01-2024, 04:39 PM   #1
tadave
ReaSpeech: Speech Recognition for REAPER

Introducing ReaSpeech: Speech Recognition for REAPER using Whisper

Hello, REAPER community!

I'm excited to announce the release of ReaSpeech, a new open-source REAPER script that integrates Whisper, a powerful speech recognition system, directly into REAPER. Whether you're looking to transcribe interviews, podcasts, or audio from your projects, ReaSpeech can help streamline your workflow.



What is ReaSpeech?

ReaSpeech provides an easy-to-use REAPER-based UI that allows you to leverage OpenAI's Whisper for speech-to-text transcription directly within your REAPER projects. Here’s what it can do:
  • Transcribe audio from selected media items in a project.
  • Search transcriptions for specific phrases or words.
  • Navigate your project by jumping to the position of specific words or phrases.
  • Write the transcriptions as regions, markers, or notes within your REAPER project.

Requirements

To use ReaSpeech, you’ll need some basic experience with a programming tool called Docker, as the current setup involves running the Whisper model through a Docker container. Full setup instructions are provided in the GitHub repository.

Why ReaSpeech?

Whether you're a sound designer, editor, or podcaster, manually transcribing speech can be tedious. ReaSpeech automates the transcription process and provides a direct integration between speech recognition and REAPER's powerful editing tools, saving you time and effort.

How to Get Started

ReaSpeech is licensed under GPL 3.0, and you can find the full source code, along with setup instructions, on our GitHub repository:

TeamAudio/reaspeech

We’re still in the early stages of development, and your feedback is invaluable. We’d love to hear your thoughts, suggestions, and any issues you encounter. Better yet, feel free to contribute! We welcome collaboration from anyone interested in open source development.

Get In Touch!

If you have comments, questions, ideas, issues, or just want to chat, we have a Discord server you can join here:

Join the Tech Audio Discord Server!

What's Next?

This is just the beginning. Our goal is to make ReaSpeech a powerful, fully-integrated tool within the REAPER environment. If you're a developer with experience in audio processing, REAPER extensions, or speech recognition, we’d love to collaborate with you to improve and expand ReaSpeech.

Looking forward to your feedback!
Old 10-01-2024, 05:26 PM   #2
tonalstates

Insane! Thank you for even working on something like this. I may not use it much soon, but who knows - still, I appreciate that you're contributing to this awesome software and community with something so useful. Looks great too.
Old 10-01-2024, 05:49 PM   #3
tadave

Quote:
Originally Posted by tonalstates View Post
Insane! Thank you for even working on something like this. I may not use it much soon, but who knows - still, I appreciate that you're contributing to this awesome software and community with something so useful. Looks great too.
Thank you so much! That is great to hear.

Dave
Old 10-02-2024, 02:18 AM   #4
X-Raym

Looks quite advanced! Thanks for sharing!

Could you make a video about it? That would help share it :P

Side question: does it work with languages other than English?

Cheers!

edit: Why is Docker needed here? ReaWhisper, for example, doesn't use it, so it is more straightforward to set up (though it is less advanced).
Old 10-02-2024, 10:38 AM   #5
mikeylove

Quote:
Originally Posted by X-Raym View Post
Looks quite advanced! Thanks for sharing!

Could you make a video about it? That would help share it :P
we're working on it!

Quote:
Originally Posted by X-Raym View Post
Side question: does it work with languages other than English?
it does, but with a minor caveat in the case of the extended unicode characters that are required to display some languages properly (asian languages, for example).

these other languages will be correctly recognized+transcribed, and project decorators (markers, regions, notes) will render just fine. inside of the ReaSpeech interface, these characters will all show up as strings of '?'. this is unfortunately a current limitation of the ImGui library, but it's something that's been on our list to address when feasible.

here's an example showing a transcription of portuguese-language content:


Quote:
Originally Posted by X-Raym View Post
edit: Why is Docker needed here? ReaWhisper, for example, doesn't use it, so it is more straightforward to set up (though it is less advanced).
i can't speak to the full set of original motivations behind docker. having said that, as a backend option it's nice to be able to pin the supporting software that provides the transcription services to specific versions and configurations.

it's an added bonus in this architecture that we can extend backend services in a consistent manner, providing clear plugin and extension points that we and others can use to expand the functionality that's available. there are tasks and workflows that are downstream of basic transcription that will be built on top of these kinds of extension points.

thank you for the feedback and insightful questions!

-michael @ techaud.io
Old 10-02-2024, 11:00 AM   #6
tadave

Quote:
Originally Posted by X-Raym View Post
Looks quite advanced! Thanks for sharing!

Could you make a video about it? That would help share it :P
Thanks for taking a look!

We have a video here, which we made before the project was released. We will work on some updated content, but this should give a good basic impression of what ReaSpeech does:

https://www.youtube.com/watch?v=4_mNTnwYHLg

Quote:
Side question: does it work with languages other than English?
It does, to some extent. By default, it uses language detection, which can be overridden in the Advanced options. It also supports Whisper's "translate" feature, which translates to English only.

Due to an issue with font support in ReaImGui, only characters in the Latin character set can be displayed in the transcript table. Non-Latin characters will be displayed as '?'. However, REAPER does not have this limitation, and if you write your transcript to the REAPER project, the original characters are preserved and readable in REAPER.

Quote:
Why is Docker needed here? ReaWhisper, for example, doesn't use it, so it is more straightforward to set up (though it is less advanced).
We tried using command-line Whisper in an earlier version of this project, but we moved to Docker for a few reasons:
  • Setting up Python, Whisper, and FFmpeg was complicated to explain and inconsistent across platforms
  • Running a web service allows models to stay loaded in memory, resulting in better performance
  • Using PyTorch directly from Python enables the use of Whisper features such as Voice Activity Detection (VAD)
  • Using a task scheduler (Celery) enables background processing and tracking the progress and status of transcription tasks

There are a variety of tradeoffs with either approach. ReaSpeech could support command-line Whisper as an alternative to Docker, with some minor restructuring.

It's not necessary to use Docker if you're comfortable working with Python directly: Running Outside of Docker
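
If you're curious what that looks like under the hood, here's a minimal sketch of calling faster-whisper directly from Python (just an illustration - the model name, file path, and options are placeholders, not our actual backend code):

Code:
# Minimal sketch (not ReaSpeech's backend): transcribe one file with faster-whisper,
# using the VAD filter mentioned above. Model, path and options are placeholders.
from faster_whisper import WhisperModel

model = WhisperModel("small", device="cpu", compute_type="int8")  # model stays loaded for reuse

segments, info = model.transcribe(
    "interview.wav",        # placeholder path
    language=None,          # None = auto-detect; set e.g. "ru" to force a language
    vad_filter=True,        # Voice Activity Detection
    word_timestamps=True,
)

print(f"Detected language: {info.language} ({info.language_probability:.2f})")
for seg in segments:
    print(f"[{seg.start:7.2f} -> {seg.end:7.2f}] {seg.text}")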

Thank you for your feedback!
Old 10-02-2024, 11:33 AM   #7
McSound

Oh, my! I can't believe it! I often use Subtitle Edit for this, but now we've got it right in Reaper! Can't thank you enough, man! Gonna use it a lot)
Old 10-02-2024, 01:05 PM   #8
X-Raym

Thanks for the detailed explanation :P
I'll come back with further feedback when I have a transcript to do :P
Old 10-02-2024, 01:16 PM   #9
mikeylove

Quote:
Originally Posted by McSound View Post
Oh, my! I can't believe it! I often use Subtitle Edit for this, but now we've got it right in Reaper! Can't thank you enough, man! Gonna use it a lot)
any feedback or questions you might have from the experience would be more than welcome on our part! we're excited to get this into the hands of people who need it and can let us know of any unexpected hurdles that we might be able to clear to improve those experiences.
Old 10-02-2024, 02:59 PM   #10
Jae.Thomas

WTF THIS IS INSANE thank you

EDIT:

this might be an annoying, more general issue, but it won't connect to localhost:9000

forget it, had to enable the optional host setting (as per the actual instructions)

I used this to export an SRT (without X/Y coordinates - did I need this?) for this:

and it worked amazingly!

Old 10-03-2024, 12:17 PM   #11
mikeylove

Quote:
Originally Posted by Jae.Thomas View Post
WTF THIS IS INSANE thank you

EDIT:

this might be an annoying, more general issue, but it won't connect to localhost:9000

forget it, had to enable the optional host setting (as per the actual instructions)

I used this to export an SRT (without X/Y coordinates - did I need this?) for this:

(youtube embed)

and it worked amazingly!
wow. i was really moved by the words in your eulogy. it really resonated with me more than i feel able to articulate. on top of that i feel an intense sense of pride in knowing our tool helped you out - especially in this context. thanks so much for sharing your story and words!

all the condolences for you and the others who carry on without your wonderful mother in physical form. i'd guess that her spirit will be rippling in and around all of you (and the rest of us now!) for quite some time. i hope you find some consolation in that.

(editing from here) x/y coords are for subtitle positioning (like in the case where a subtitle in a default location might obscure something visually relevant or where the lines might be coming from somewhere off-screen). they're use-case specific and not even technically 100% supported across all players.

Old 10-03-2024, 04:31 PM   #12
BPBaker

I installed ReaSpeech over the weekend and as an audio documentary/podcast editor, it’s seriously impressive and I can’t wait to see how y’all develop it further! THANK YOU for your great work.

I chatted with mikeylove and KVS on the Discord server this weekend with some initial reactions. I’ve been using it over the course of the week and have a bunch of thoughts. Apologies if this is too much, I’m just super excited about what you’ve made! :-D

Feature Requests
- Adding Item/Take markers would be a killer feature. Then transcription could persist inside the items directly as items are moved/copied/pasted/trimmed!
- Also adding Item Notes (though presumably item notes would have to transcribe only within the bounds of the item?)
- Would it be possible to save the transcribed data similar to, say, how Reaper saves peaks data? Transcription can take a while for large files, and it would be nice to not have to regenerate the transcription if you hit clear.
- Add “filter” boxes to filter/sort the results for relevant columns, and would be nice to show/hide columns as needed
- Add a “track” column to sort by track name
- Could you drag/drop words from the ReaSpeech box (groups of words, or multiple phrases) out of the word list into Reaper to make an item with that specific source start timecode
- It might be helpful to have separate columns for the timecode in reference to the source file (as you currently do) and ALSO in reference to the item start position?

Feedback on usage:
- As of the current version, ReaSpeech works best for me when I’m starting out with an unedited file/item. It gets harder to use in a project with many edited items (which is where I want to use it most). ;-)
- I'm a little confused about how "Process Selected Items" and "Process All Items" work. I would have imagined that this option would transcribe and create timecode markers within the bounds of those selected items only. But when I select this option, it seems to transcribe the entire underlying source file, generating searchable IDs for the whole file, which takes a lot more time to transcribe.
- And while the IDs are in reference to the file's source start, the markers ReaSpeech creates seem linked to the original item's start position in some way that isn't entirely clear to me. What happens if you copy/paste a transcribed item?
- If I search for words in that item and then select "create markers," it seems to make a marker in the correct location. And then if I clear the markers, move that particular item, and click "create marker" again, it now adds a new/additional marker for that word/phrase in reference to the new item position. Cool, that's what I'd expect! BUT if I don't search for those specific words or phrases and select "create markers," it generates markers for moments outside the bounds of the selected item(s). So it would be great to have an option that only places markers within the selected items' bounds. (Item markers and item notes might solve this.)
- Consider moving the "Clear" button to a location that makes you less likely to accidentally click it. (Also, it's unclear whether clear refers to clearing the search field or the transcript -- it's the latter, and I had to re-transcribe everything when I accidentally pressed it!)
- It would be nice to speed up the transcription times somehow, if at all possible! (I'm running the CPU version on an M2 Max, and it really does push the CPU! Is Apple Silicon GPU support in the works?) (EDIT: I'm working in podcasts with multiple multi-hour source files, so if there were an option to only transcribe audio from the selected items as opposed to the entire underlying source file, perhaps that might speed things up for me considerably?)

BTW, here's what Pro Tools recently previewed: https://www.youtube.com/watch?v=gfkZAQO7bg4 and here's how SoundFlow is separately implementing something like this into PT with track markers: https://www.youtube.com/watch?v=fKsNgFABAZE
There are some things I like and some I dislike about these approaches as compared to ReaSpeech, but it might be interesting to take a look. I’m confident ReaSpeech stands to be way more powerful! :-)

Old 10-04-2024, 01:45 AM   #13
McSound

I gave ReaSpeech a try today. English voice recognition works nicely, except the edges of item notes land slightly earlier than the voice phrases in the audio, about 0.4 sec earlier. A 2-minute audio file was recognized in 5 seconds using the GPU version in Docker. Impressive speed! When I set Russian, it tries to recognize English anyway. Should I use another Whisper model? Is there any way to specify it manually, other than Small, Medium, Large?
Old 10-04-2024, 07:14 AM   #14
Jae.Thomas

Quote:
Originally Posted by mikeylove View Post
wow. i was really moved by the words in your eulogy. it really resonated with me more than i feel able to articulate. on top of that i feel an intense sense of pride in knowing our tool helped you out - especially in this context. thanks so much for sharing your story and words!

all the condolences for you and the others who carry on without your wonderful mother in physical form. i'd guess that her spirit will be rippling in and around all of you (and the rest of us now!) for quite some time. i hope you find some consolation in that.
aww thanks - I've been wanting to subtitle it for some time because it was requested by a couple of her friends who have hearing difficulties, and I just couldn't get through it to do it. So this was helpful on a few levels. It made it easy for me without having to listen to it piece by piece.

Quote:
Originally Posted by mikeylove View Post
(editing from here) x/y coords are for subtitle positioning (like in the case where a subtitle in a default location might obscure something visually relevant or where the lines might be coming from somewhere off-screen). they're use-case specific and not even technically 100% supported across all players.
so it seems this is more of a corner case situation, I wouldn't need these parameters defined many times. But good to have if it is covering something important.
Old 10-04-2024, 08:53 AM   #15
tadave

Quote:
Originally Posted by McSound View Post
I gave ReaSpeech a try today. English voice recognition works nicely, except the edges of item notes land slightly earlier than the voice phrases in the audio, about 0.4 sec earlier. A 2-minute audio file was recognized in 5 seconds using the GPU version in Docker. Impressive speed! When I set Russian, it tries to recognize English anyway. Should I use another Whisper model? Is there any way to specify it manually, other than Small, Medium, Large?
Thanks for trying it out! Timing inaccuracies like that are not uncommon with Whisper. We are using the Faster-Whisper variant, which has improved performance over the original OpenAI Whisper.

If you switch to the Advanced tab, you should see options for model and language. By default, the language setting is set to "Detect", which means Whisper will try to guess the language. You can explicitly set it to Russian. You can also use different models by typing in their name. We're working on a better interface for this, but for now, if you type in an invalid model name, you'll get an error message that lists the supported models.
Old 10-04-2024, 12:31 PM   #16
McSound

Thanks tadave! Typing the name of the model! That's the trick! Yes, I typed "large-v3" and it successfully recognized Russian. It's a pity that it doesn't remember the settings, so I have to set everything up from scratch in a new session. But it's OK, everything is fully workable! Thanks again for the great contribution to Reaper functionality!
Old 10-09-2024, 10:45 AM   #17
saxmand

This looks very cool.

I can't figure out how to get the Lua scripts installed, though. Am I supposed to copy them from GitHub manually, or is there a way to do it with ReaPack?

Meanwhile, I wanted to share this script I did. It basically creates a Video Processor containing subtitles based on an SRT file. Maybe implementing something like that in the app would be cool as well, adding the transcription as subtitles to a selected video file:

https://forum.cockos.com/showthread.php?t=295034
Old 10-10-2024, 03:31 AM   #18
Lunar Ladder

Thank you very much! This is very polished and useful.

Quote:
Originally Posted by tadave View Post
as the current setup involves running the Whisper model through a Docker container
Since it's worded this way, there might be hope that you can provide a non-Docker alternative, for use on systems where the user can just point to the whisper executable and the model, without Docker installed?
Old 10-10-2024, 05:12 AM   #19
nofish

Thanks for this project.
What are the (minimum) GPU specs for GPU processing?
Old 10-10-2024, 06:14 AM   #20
saxmand

I figured out that I could just download the GitHub project and put it in my scripts folder.

But HOLY SMOKES, what an amazing and well-made project!
It even transcribed better than some of those online services.

Just tested it with my "srt to subtitle plugin": transcribing Dutch, translating to English, exporting the SRT, and then creating the subtitle plugin - and it worked great.
Old 10-12-2024, 06:53 PM   #21
smrl

Quote:
Originally Posted by Lunar Ladder View Post
there might be hope that you can provide a non-Docker alternative, for use on systems where the user can just point to the whisper executable and the model, without Docker installed?
You can absolutely use ReaSpeech without Docker installed! We chose to distribute using Docker because that way it's OS/platform independent and you don't need to worry about Python versions, installing the database, etc.

Please give it a shot if you're interested! Details are here:

https://github.com/TeamAudio/reaspee...s/no-docker.md
Old 10-12-2024, 10:01 PM   #22
smrl

Quote:
Originally Posted by nofish View Post
What are the (minimum) GPU specs for GPU processing?
We haven't tested extensively how much VRAM you need to run each model, but at a minimum you need an NVIDIA card. The amount of required VRAM changes depending on which model you choose.

I took a look at the model sizes, and this is what it looks like to me:
small - requires ~2GB VRAM
medium - requires ~6GB VRAM
large (distil-large-v3) - requires ~4GB VRAM

Quote:
Originally Posted by saxmand View Post
I can't figure out how to get the Lua scripts installed, though. Am I supposed to copy them from GitHub manually, or is there a way to do it with ReaPack?
https://forum.cockos.com/showthread.php?t=295034
After you run the Docker image, you can use your browser to navigate to localhost:9000; it serves a webpage with instructions and a link to the Lua script. Just copy it to your preferred ReaScript directory!
Old 10-21-2024, 07:48 AM   #23
soniccustard

Ah, been looking for something like this, thank you so much for making it! This is such a great script for anybody working with dialogue, especially game dialogue!

It would be really amazing if you were able to set the transcription to do 42 characters/2 lines per text note for those of us that want to then use HeDa Notes Reader or the HeDa/X-Raym SRT Export.

In my use case it would allow me to export an SRT file which is then fed into a video generator without needing to go in and edit all the note clips.

Is this something that's possible?

Another idea that might be a bit outside of the scope of this script, but maybe someone will see this and get inspired, haha.

Something that you have to deal with in game dialogue a lot is making sure that the filenames are correct. If there were some way of feeding an Excel sheet/CSV into this with all of the filenames and lines as written, then comparing the transcription to the written lines and giving a percentage of how close the recording is to the written script, literally every dialogue engineer and editor would use it.
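
Just to sketch that comparison step (totally hypothetical - not ReaSpeech code, and it assumes a CSV with "filename" and "line" columns):

Code:
# Hypothetical sketch of the script-vs-transcription check described above (not ReaSpeech code).
import csv
import difflib

def match_percent(expected, transcribed):
    # 0-100 similarity between the scripted line and what was actually recorded
    return difflib.SequenceMatcher(None, expected.lower(), transcribed.lower()).ratio() * 100

# Stand-in for per-file transcriptions (e.g. parsed from an exported transcript)
transcriptions = {
    "vo_guard_001.wav": "Halt! Who goes there?",
    "vo_guard_002.wav": "You can't park that horse here.",
}

with open("dialogue_script.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):  # expects "filename" and "line" columns
        recorded = transcriptions.get(row["filename"], "")
        print(f'{row["filename"]}: {match_percent(row["line"], recorded):.1f}% match')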

This is fantastic work, thanks so much for sharing this!
Old 10-21-2024, 10:44 PM   #24
p07a

I tried this out today and it works fine. It can recognize and transcribe Vietnamese reasonably well.

ReaImGui's limitation is the limited Unicode range. Some characters show up as "?". However, when I export Markers and Regions, the characters show up fine.

I would add a feature to remove Regions/Markers that were generated by the plugin.

Also, I selected a media item / take which was trimmed down a lot from the original audio source. However, when I clicked "Process selected item", it went ahead and transcribed the entire source. The markers / regions generated seem to assume the media item was unedited too.

The installation process was fine. I guess if I got over the hurdle of installing SWS / ReaPack / etc., installing and running Docker images isn't that far off :P

Thank you for developing this! Love that there's a tool like this out there.
Old 10-25-2024, 02:08 AM   #25
BogdanS

Please fix the bug with fonts in the interface: Cyrillic is displayed as solid question marks (???? ???? ????). When exporting to markers, everything is fine.
Old 10-25-2024, 07:27 AM   #26
cdmstudios

this sounds awesome and I can't wait to try it.
From my perspective, I know what I would love to use something like this for:

I don't need a transcription, but what I would like is the ability to search a track of speech for certain words/phrases and have it create a marker list with all the instances of them.

I'm excited to try this!

Charles
Old 11-07-2024, 08:11 AM   #27
80icio

Hey, thanks for your script!

My girlfriend is a PhD in sociology, and she has a bunch of interviews she needs to either translate or just write down on paper.
I asked her for one of the interviews, a rough one with some noise and music, for a test run of your script.

Everything went smoothly, no bugs on my end.

A couple of notes:

I noticed that the Whisper project mentions a TURBO model, but it's not present in the script - is it the Distil model?

For this type of use, I wish there were a simple TXT export with no metadata, just the text content line by line.

I tried to export JSON, CSV, and SRT, and none of the files was saved with its extension (.json, .csv, or .srt) - I had to add it myself.
Old 11-10-2024, 09:19 PM   #28
BPBaker

Hey hey! I just grabbed the recent v0.4.0 build and see you've added take markers:



The transcriptions now move along with items as you edit. :-D THANK YOU, ReaSpeech team!

Now we just need to get the Reaper devs to add actions to show/hide/toggle take marker visibility... Here's the feature request for that, BTW.

Old 11-13-2024, 10:55 PM   #29
smrl

Quote:
Originally Posted by 80icio View Post
I noticed that the Whisper project mentions a TURBO model, but it's not present in the script - is it the Distil model?
They are different; the Turbo model is very recent and based on findings from distil-whisper. I tested the Turbo model tonight, and it works but requires some changes to the codebase, so there's no Docker image with it included just yet. It is roughly equivalent to the Distil models as far as I can tell.

Quote:
Originally Posted by 80icio View Post
For this type of use, I wish there were a simple TXT export with no metadata, just the text content line by line.
I wrote a simple Python script for this task, filtering out blank lines, lines that contain -->, and lines that contain only a number. I attached it, but you need to rename it to .py (the forum won't let me upload that extension) and then just run:
python srt-to-txt.py myfile.srt -- it will output myfile_only_text.txt
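
For reference, here's a rough sketch of what the attached script does (the actual attachment may differ slightly):

Code:
# Rough sketch of the srt-to-txt idea described above (the attached script may differ):
# keep only the spoken text from an SRT, dropping blank lines, "-->" timecode lines,
# and lines that are just a sequence number.
import sys
from pathlib import Path

src = Path(sys.argv[1])                           # e.g. myfile.srt
out = src.with_name(src.stem + "_only_text.txt")  # e.g. myfile_only_text.txt

with src.open(encoding="utf-8") as f_in, out.open("w", encoding="utf-8") as f_out:
    for line in f_in:
        line = line.strip()
        if not line or "-->" in line or line.isdigit():
            continue
        f_out.write(line + "\n")

print(f"Wrote {out}")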

Quote:
Originally Posted by 80icio View Post
I tried to export JSON, CSV, and SRT, and none of the files was saved with its extension (.json, .csv, or .srt) - I had to add it myself.
Added a GitHub feature request here:
https://github.com/TeamAudio/reaspeech/issues/104

We'll see how that gets on, but feel free to comment if you feel strongly enough about it.
Attached Files
File Type: txt srt-to-txt.txt (829 Bytes, 591 views)

Old 01-15-2025, 03:24 AM   #30
tzzsmk

THANK YOU, it works great!

So far the only "problem" is that the ReaSpeech GUI doesn't reflect timestamp changes on edited audio - but the solution is to use Reaper's Region/Marker Manager for navigation instead.

Quote:
Originally Posted by BogdanS View Post
Please fix the bug with fonts in the interface: Cyrillic is displayed as solid question marks (???? ???? ????). When exporting to markers, everything is fine.
I can confirm this problem: characters like ěščřžýáíéúů aren't displayed correctly in the ReaSpeech GUI, but are indeed correct in the source JSON as well as the output Regions and Markers - so not a huge deal imo.
Old 01-17-2025, 11:41 AM   #31
tadave
ReaSpeech 0.5.0 released

Hello! A new version of ReaSpeech is available on Docker Hub.

New in version 0.5.0:
  • Configurable font size
  • Improved export interface
  • Added support for whisper.cpp ASR engine
  • Better process handling when running outside of Docker
  • Removed dependency on Redis
  • Usability and reliability improvements
Full list of changes: https://github.com/TeamAudio/reaspee...n/CHANGELOG.md

To upgrade, you can follow the instructions here: https://github.com/TeamAudio/reaspee...e-docker-image

Thanks for your valuable feedback!
ReaSpeech team
Old 01-17-2025, 11:52 AM   #32
tadave
Unicode support issues

Regarding issues with the display of various characters outside of the Latin character set, thanks for mentioning it. We're aware of it, and we are in search of a solution. We created an issue to track it: https://github.com/TeamAudio/reaspeech/issues/94

It sounds like there is some work underway to support dynamic font rasterization in ImGui: https://github.com/ocornut/imgui/pul...ent-2521021252

That would probably be the shortest path for us, but it doesn't exist currently, and other workarounds are complex in comparison. We'll keep thinking about it, and we welcome suggestions. Thanks!
Old 01-25-2025, 06:12 AM   #33
80icio

Quote:
Originally Posted by tadave View Post
Hello! A new version of ReaSpeech is available on Docker Hub.

New in version 0.5.0:
  • Configurable font size
  • Improved export interface
  • Added support for whisper.cpp ASR engine
  • Better process handling when running outside of Docker
  • Removed dependency on Redis
  • Usability and reliability improvements
Full list of changes: https://github.com/TeamAudio/reaspee...n/CHANGELOG.md

To upgrade, you can follow the instructions here: https://github.com/TeamAudio/reaspee...e-docker-image

Thanks for your valuable feedback!
ReaSpeech team
Thanks for the update, and thanks for adding the file extension option!

I just transcribed a few more files, and I just want to report that I tried it on an m4a file and it did not work. Not sure if m4a is supported.
There was no error message; the progress bar doesn't go through the "transcribing" process but jumps straight to "success".

I converted the file to regular wavs and it worked.
Old 01-28-2025, 10:40 AM   #34
tadave

Quote:
Originally Posted by 80icio View Post
I just transcribed a few more files, and I just want to report that I tried it on an m4a file and it did not work. Not sure if m4a is supported.
There was no error message; the progress bar doesn't go through the "transcribing" process but jumps straight to "success".
Thanks for your comments, and thank you for this report! I created an issue for it: https://github.com/TeamAudio/reaspeech/issues/160

It looks like it should be possible to support .m4a files, possibly with the help of a Python library ("qtfaststart").
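
In the meantime, converting to WAV (as you did) is probably the simplest workaround. Here's a minimal sketch using the ffmpeg-python package - file names are placeholders, it isn't part of ReaSpeech, and it assumes ffmpeg is installed:

Code:
# Workaround sketch (not ReaSpeech code): convert an .m4a to a 16 kHz mono .wav
# with ffmpeg-python before transcribing. Requires ffmpeg on the PATH.
import ffmpeg

ffmpeg.input("interview.m4a").output("interview.wav", ar=16000, ac=1).run(overwrite_output=True)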

Dave
Old 02-27-2025, 02:26 PM   #35
atmosfar

This is great! Amazing work. I was playing around a bit with getting whisper-cpp working on the command line before I discovered this.

I've managed to make it work with whisper.cpp and CoreML on my M1 Pro MacBook, which transcribes much faster than Metal. It took a fair bit of hacking around in the project to make all of the dependencies compatible, so I haven't got a well-documented process just yet. But here's what I remember.

You need the latest version of pywhispercpp, and you need to use Python 3.11. So change the TOML file to look like this:

Code:
[tool.poetry]
name = "reaspeech"
version = "1.0.0"
description = "Speech recognition for REAPER"
homepage  = "https://github.com/TeamAudio/reaspeech/"
license = "https://github.com/TeamAudio/reaspeech/blob/main/LICENSE"
authors = [
    "Dave Benjamin",
    "Mike DeFreitas",
    "Roel Sanchez",
]
readme = "README.md"
packages = [{ include = "app" }]

[[tool.poetry.source]]
name = "pytorch"
url = "https://download.pytorch.org/whl/cpu"
priority = "explicit"

[tool.poetry.dependencies]
python = "^3.11"
unidecode = "^1.3.8"
uvicorn = { extras = ["standard"], version = "^0.32.1" }
gunicorn = "^23.0.0"
tqdm = "^4.67.0"
python-multipart = "^0.0.17"
ffmpeg-python = "^0.2.0"
fastapi = "^0.115.5"
llvmlite = "^0.44.0"
numba = "^0.61.0"
openai-whisper = "20240930"
faster-whisper = "^1.1.0"
torch = "^2.6.0"
jinja2 = "^3.1.4"
celery = "^5.4.0"
ctranslate2 = "4.3.1"
aiofiles = "^24.1.0"
sqlalchemy = "^2.0.36"
pywhispercpp = { git = "https://github.com/absadiki/pywhispercpp" }

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"
You need to download the CoreML version of the "base" model and place it in the "~/.cache/whisper/" folder. If you don't, you'll see an error when you run Whisper saying that the .mlmodelc file is missing. There should be a ggml-base.en.bin in there too, after you first run ReaSpeech with whisper.cpp.

Here's the file: https://huggingface.co/ggerganov/whi...r.mlmodelc.zip

You need to set the WHISPER_COREML environment variable when running the poetry install:

Code:
WHISPER_COREML=1 poetry install
You need to update the app/whisper_cpp/core.py transcribe function to work with bytes instead of strings, like so:

Code:
def transcribe(audio, asr_options, output):
    options_dict = build_options(asr_options)
    logger.info(f"whisper.cpp options: {options_dict}")

    audio_duration = len(audio) / SAMPLE_RATE

    with model_lock:
        segments = []
        text = ""
        with tqdm.tqdm(total=audio_duration, unit='sec') as tqdm_pbar:
            def new_segment_callback(segment):
                segment_start = float(segment.t0) / 100.0
                segment_end = float(segment.t1) / 100.0
                tqdm_pbar.update(segment_end - segment_start)
            options_dict['new_segment_callback'] = new_segment_callback

            for segment in model.transcribe(audio, **options_dict):
                # Ensure segment.text is decoded to a string
                if isinstance(segment.text, bytes):
                    decoded_text = segment.text.decode("utf-8")
                    #logger.info(f"decoded_text: {decoded_text}")
                else:
                    decoded_text = segment.text

                segment_dict = {
                    "start": float(segment.t0) / 100.0,
                    "end": float(segment.t1) / 100.0,
                    "text": decoded_text,   # use the decoded text here
                    "words": []
                }
                for word in segment.words:
                    # Decode word.text if needed
                    if isinstance(word.text, bytes):
                        word_text = word.text.decode("utf-8")
                    else:
                        word_text = word.text
                    word_dict = {
                        "start": float(word.t0) / 100.0,
                        "end": float(word.t1) / 100.0,
                        "word": word_text,
                        "probability": word.p
                    }
                    segment_dict["words"].append(word_dict)

                segments.append(segment_dict)
                text += decoded_text + " "

        result = {
            "language": options_dict.get("language"),
            "segments": segments,
            "text": text.strip()
        }

    output_file = StringIO()
    write_result(result, output_file, output)
    output_file.seek(0)

    return output_file
I think that's everything. Then just run it like so (note it's python3.11):

Code:
ASR_ENGINE=whisper_cpp poetry run python3.11 app/run.py --build-reascripts
Old 03-27-2025, 07:50 AM   #36
atmosfar

For anyone who has added take markers to their items and then closed the ReaSpeech window, you can still search the "text" by going to View > Region/Marker Manager and ticking the Take markers box.

Also! If you're editing a podcast, I recommend doing the transcription before you make any cuts, because the script will add a lot of duplicate take markers if you have multiple items sourced from the same file. You can end up with a project file that's thousands of lines long.
Old 08-07-2025, 03:09 AM   #37
BogdanS

after updating ReaImGui it stopped working:
reaper.ImGui_CreateFont': expected 2 arguments maximum
Old 08-07-2025, 06:49 AM   #38
cfillion

Quote:
Originally Posted by BogdanS View Post
after updating ReaImGui it stopped working:
reaper.ImGui_CreateFont': expected 2 arguments maximum
This means the script does not enable backward compatibility. Insert this at the top to fix (with "0.9.3" being the ReaImGui API version targeted by the script):

Code:
package.path = reaper.ImGui_GetBuiltinPath() .. '/?.lua'
local ImGui = require 'imgui' '0.9.3'
Old 08-07-2025, 08:13 AM   #39
mikeylove

Quote:
Originally Posted by BogdanS View Post
after updating ReaImGui it stopped working:
reaper.ImGui_CreateFont': expected 2 arguments maximum
Thanks for the bug report; we're gonna look into implementing cfillion's compatibility layer that he just mentioned.

Having said that though, I'm _much_ more excited to get our code working with the new release generally. Been on the edges of our seats with excitement for these new features! 🤩🤩🤩🤩🤩🤩
Old 08-07-2025, 06:56 PM   #40
mikeylove

Quote:
Originally Posted by BogdanS View Post
after updating ReaImGui it stopped working:
reaper.ImGui_CreateFont': expected 2 arguments maximum
Fix is in! https://techaud.io/blog/20250807-rea...v070-released/

We decided to go all-in on 0.10, as the extra language support is a big and worthwhile milestone (in our opinions, anyway).