#1823 whisper transcription #2165

Merged
merged 32 commits into master from #1823_whisper_transcription
May 25, 2024

Conversation

lfcnassif
Copy link
Member

@lfcnassif lfcnassif commented Apr 12, 2024

When finished, this will close #1823.

Already tested on CPU. I still need to test on GPU, test the remote service and verify Wav2Vec2 backwards compatibility.

@lfcnassif lfcnassif marked this pull request as ready for review April 13, 2024 19:19
@lfcnassif
Copy link
Member Author

I think this is finished. @marcus6n, I would very much appreciate it if you could test this on Monday, thank you.

@marcus6n
Copy link
Contributor

@lfcnassif Yes, I can test it!

@marcus6n
Copy link
Contributor

@lfcnassif I've run the tests and everything is working properly.

@gfd2020
Copy link
Collaborator

gfd2020 commented Apr 15, 2024

I was waiting for this PR. Thank you. I will test this PR with GPU CUDA.
I also found some audios that had encoding problems in the transcription. I'll test them too.

@lfcnassif, a suggestion.
I had also suggested calculating the final score in Python with "numpy.average(probs)", but numpy.average is a weighted average; since no weights are passed as a parameter, it is the same as numpy.mean. Maybe numpy.mean is a little faster...
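For illustration, a minimal sketch (the probs values are made up) showing that numpy.average with no weights matches numpy.mean:

import numpy as np

# Hypothetical per-segment probabilities (illustrative values only)
probs = [0.91, 0.87, 0.95, 0.78]

# With no weights, numpy.average falls back to the plain arithmetic mean,
# so both calls produce the same final score.
assert np.average(probs) == np.mean(probs)
print(np.mean(probs))  # 0.8775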

Another thing, does this PR also close issue #1335?

@lfcnassif
Copy link
Member Author

lfcnassif commented Apr 15, 2024

I will test this PR with GPU CUDA.
I also found some audios that had encoding problems in the transcription. I'll test them too.

Hi @gfd2020! Additional tests will be very welcome!

@lfcnassif, a suggestion.
I had also suggested calculating the final score in Python with "numpy.average(probs)", but numpy.average is a weighted average; since no weights are passed as a parameter, it is the same as numpy.mean. Maybe numpy.mean is a little faster...

I took the final score computation from your previous code suggestion, thank you! Good to know; we can replace the function, but I think the time difference will not be noticeable.

Another thing, does this PR also close issue #1335?

No, I'll keep it open, since I haven't finished all my planned tests. I'm integrating this because some users asked for it. Besides Whisper.cpp, which improved a lot in recent months and added full CUDA support, I also found WhisperX (which uses Faster-Whisper under the hood) and Insanely-Fast-Whisper. Those last 2 libs break long audios into 30s parts and execute batch inference on the audio segments simultaneously, resulting in up to 10x speed up thanks to batching, at the cost of increased GPU memory usage. I did a quick test with them and they are really fast for long audios indeed! But their approach can decrease the final accuracy, since the default Whisper algorithm uses previously transcribed tokens to help transcribe the next ones. AFAIK, those libraries break the audio into parts and the transcription is done independently on the 30s audio segments. As I haven't measured WER for those libraries yet, I'm concerned about integrating them. If they could accept many different audios as input and transcribe them using batch inference, instead of breaking the audios apart, that would be a safer approach. But that would require more work on our side: grouping audios with similar duration before transcription, deciding whether or not to wait to group audios, signaling the last audio, etc.
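For reference, a minimal sketch of the WhisperX-style batched transcription described above (model name, audio path and batch_size are illustrative, not values from this PR):

import whisperx

device = "cuda"
model = whisperx.load_model("medium", device, compute_type="float16")

# WhisperX segments the audio (via VAD) into ~30s chunks and transcribes them
# in parallel batches, which is where the large speed-up on long audios comes from.
audio = whisperx.load_audio("long_audio.wav")
result = model.transcribe(audio, batch_size=16)

for segment in result["segments"]:
    print(segment["start"], segment["end"], segment["text"])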

@lfcnassif
Copy link
Member Author

Using float16 precision instead of int8 gave almost a 50% speed up on RTX 3090.

@gfd2020
Copy link
Collaborator

gfd2020 commented Apr 15, 2024

Using float16 precision instead of int8 gave almost a 50% speed up on RTX 3090.

On CPU too?

@lfcnassif
Copy link
Member Author

On CPU too?

Possibly not, I'll check and report back.

@lfcnassif
Copy link
Member Author

@gfd2020, thanks for asking about the effect of float16 on CPU. Actually it doesn't work on CPU at all; I just pushed a commit fixing it. About float32 vs. int8 speed on CPU, testing with ~160 audios on a 48-thread CPU and the medium Whisper model:

  • float32 took 1287s
  • int8 took 1134s
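For comparison, a minimal Faster-Whisper sketch with the compute type set explicitly (model size, thread count and audio path are illustrative):

from faster_whisper import WhisperModel

# compute_type can be "float32", "int8", etc.; int8 was the faster option on CPU above.
model = WhisperModel("medium", device="cpu", compute_type="int8", cpu_threads=48)

segments, info = model.transcribe("audio.wav")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")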

@lfcnassif
Copy link
Member Author

lfcnassif commented Apr 16, 2024

Speed numbers of other implementations over a single 442s audio using 1 RTX 3090, medium model, float16 precision (except Whisper.cpp, where the precision couldn't be set):

  • Faster-Whisper took ~36s
  • Whisper.cpp took ~31s
  • Insanely-Fast-Whisper took ~7s
  • WhisperX took ~5s

Running over the dataset of 160 real-world small audios mentioned above (total duration of 2758s):

  • Faster-Whisper took 220s
  • Whisper.cpp took 185s
  • Insanely-Fast-Whisper took 358s
  • WhisperX took 171s

PS: Whisper.cpp seems to parallelize better than the others when using multiple processes, so its last number could be improved.
PS2: For inference on CPU, Whisper.cpp is faster than Faster-Whisper by ~35%; I'm not sure if I will time all of them on CPU...
PS3: Using the large-v3 model with Whisper.cpp produced hallucinations (repeated texts and a few non-existing texts); this was also observed with Faster-Whisper, at a lower level.

@gfd2020
Copy link
Collaborator

gfd2020 commented Apr 16, 2024

Hi, @lfcnassif

I don't have a very powerful GPU, but it has tensor cores, and the following error occurred:
"Requested float16 compute type, but the target device or backend does not support efficient float16 computation."

So I changed it to float32 and it gave the following error:
"CUDA failed with error out of memory"

Finally, I changed it to int8 and it worked fine on the GPU.

So, I have two suggestions:

  1. Print an error message if it falls back to computing on the CPU.
  2. Leave int8 as the default and use the compute type as a parameter in audiotranscripttask.txt

I'm still doing other tests.

@lfcnassif
Copy link
Member Author

So, I have two suggestions:

  1. Print an error message if it falls back to computing on the CPU.
  2. Leave int8 as the default and use the compute type as a parameter in audiotranscripttask.txt

Thanks for testing @gfd2020! Both are good suggestions, and I was already planning to externalize the compute_type (precision) parameter, and also the batch_size if we switch to WhisperX. I'm running accuracy tests and should post the results soon. About float16 not being supported, what CUDA Toolkit version do you have installed?
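A minimal sketch of what such a fallback could look like with Faster-Whisper (the function name and messages are illustrative, not this PR's final implementation):

import sys
from faster_whisper import WhisperModel

def load_model_with_fallback(model_name, device, compute_type="float16"):
    # Try the configured precision first; if the device/backend rejects it,
    # warn the user and fall back to int8, which works on most hardware.
    try:
        return WhisperModel(model_name, device=device, compute_type=compute_type)
    except Exception as e:
        print(f"compute_type={compute_type} not supported ({e}), falling back to int8", file=sys.stderr)
        return WhisperModel(model_name, device=device, compute_type="int8")

model = load_model_with_fallback("medium", "cuda")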

@gfd2020
Copy link
Collaborator

gfd2020 commented Apr 16, 2024

Thanks for testing @gfd2020! Both are good suggestions, and I was already planning to externalize the compute_type (precision) parameter, and also the batch_size if we switch to WhisperX. I'm running accuracy tests and should post the results soon. About float16 not being supported, what CUDA Toolkit version do you have installed?

NVIDIA CUDA 11.7.99 driver on Quadro P620
torch==1.12.1+cu116 torchvision==0.13.1+cu116 torchaudio==0.12.1

This was the only version that I managed to make work on these weaker GPUs (Quadro P620 and T400)

@gfd2020
Copy link
Collaborator

gfd2020 commented Apr 30, 2024

That's sad news, I removed FFmpeg from PATH and just reproduced it. Yesterday I tested both faster-whisper and whisperX in a VM with a fresh Windows 10 install and, strangely, that error didn't happen; the only dependency needed was the MS Visual C++ Redistributable 2015-2019 package.

Couldn't you put ffmpeg.exe in the iped tools? Is the problem putting it in the path?

Those warnings also happen here. But all tests done on #1335 were with those warnings present.

Ok, just to let you know about them.

@lfcnassif
Copy link
Member Author

Couldn't you put ffmpeg.exe in the iped tools? Is the problem putting it in the path?

It's possible, but on #1267 @wladimirleite did a good job to remove ffmpeg as dependency, since we already use mplayer for video related stuff...

Ok, just to let you know about them.

Thanks!

@lfcnassif
Copy link
Member Author

Yesterday I tested both faster-whisper and whisperX in a VM with a fresh Windows 10 install and, strangely, that error didn't happen

My fault, I tested again in the VM and WhisperX returns an error without FFmpeg. I just added an explicit check and a better error message for the user if it is not found.

@gfd2020
Copy link
Collaborator

gfd2020 commented Apr 30, 2024

My fault, I tested again in the VM and WhisperX returns an error without FFmpeg. I just added an explicit check and a better error message for the user if it is not found.

Is there no way to modify the Python code to search for ffmpeg in a relative path within iped?

@lfcnassif
Copy link
Member Author

lfcnassif commented Apr 30, 2024

Is there no way to modify the Python code to search for ffmpeg in a relative path within iped?

We can set the PATH env var of the main IPED process from the start-up process and point it to an embedded ffmpeg. But I'm not sure if we should embed ffmpeg; actually, I'm thinking about offering both faster-whisper and whisperx, as suggested by @rafael844, because faster-whisper doesn't have the ffmpeg dependency and whisperx has many dependencies that may cause conflicts with other modules (now or in the future).
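A minimal sketch of the PATH approach on the Python side (the IPED_ROOT variable and the tools/ffmpeg location are hypothetical, not existing IPED paths):

import os

# Hypothetical embedded ffmpeg location; prepend it to PATH before whisperx is imported.
iped_root = os.environ.get("IPED_ROOT", os.getcwd())
ffmpeg_dir = os.path.join(iped_root, "tools", "ffmpeg")
if os.path.isdir(ffmpeg_dir):
    os.environ["PATH"] = ffmpeg_dir + os.pathsep + os.environ.get("PATH", "")

import whisperx  # the ffmpeg binary is now resolvable through PATH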

@gfd2020
Copy link
Collaborator

gfd2020 commented Apr 30, 2024

Can I write a small step-by-step guide to installing the requirements on the GPU?
I did some tests here and everything worked.

I had to make some modifications to the code to be able to use it in an environment without an internet connection and point to the local model.

So the modelName parameter accepts a model name, a relative path (inside the IPED folder), or an absolute path.

Examples:
whisperModel = medium
whisperModel = models/my_model
whisperModel = C:/my_model

import os

# The modelName parameter accepts a model name, a path relative to the IPED folder,
# or an absolute path to a local model directory.
localModel = False
localPath = os.path.join(os.getcwd(), modelName)
if os.path.exists(modelName) and os.path.isabs(modelName):
    localModel = True
    localPath = modelName
elif os.path.exists(localPath):
    localModel = True

if localModel:
    # Load the VAD model from the local folder so no download is attempted.
    import torch
    from whisperx.vad import load_vad_model

    model_fp = os.path.join(localPath, "whisperx-vad-segmentation.bin")
    vad_model = load_vad_model(torch.device(deviceNum), vad_onset=0.500, vad_offset=0.363,
                               use_auth_token=None, model_fp=model_fp)
    model = whisperx.load_model(localPath, device=deviceId, device_index=deviceNum, threads=threads,
                                compute_type=compute_type, language=language, vad_model=vad_model)
else:
    # Original behavior: load (and download, if needed) the model by name.
    model = whisperx.load_model(modelName, device=deviceId, device_index=deviceNum, threads=threads,
                                compute_type=compute_type, language=language)

@lfcnassif
Copy link
Member Author

Can I write a small step-by-step guide to installing the requirements on the GPU?

If it is independent of user environment or hardware, for sure! The wiki is publicly editable.

Maybe the above code won't work if IPED is executed from outside its folder. For that case, we use System.getProperty('iped.root') to get IPED's root folder.

@lfcnassif
Copy link
Member Author

Without the code above, does it need to be connected to the Internet at all times, or just on the first run to download the models?

@gfd2020
Copy link
Collaborator

gfd2020 commented Apr 30, 2024

Can I write a small step-by-step guide to installing the requirements on the GPU?

If it is independent of user environment or hardware, for sure! The wiki is publicly editable.

Windows only, any graphics card.

Maybe the above code won't work if IPED is executed from outside its folder. For that case, we use System.getProperty('iped.root') to get IPED's root folder.

Thanks. I'll try.

Without the code above, does it need to be connected to the Internet at all times, or just on the first run to download the models?

Just the first run. But my idea is to create a customized IPED package with the models. This way, you could just install this package without Internet access.

@lfcnassif
Copy link
Member Author

Windows only, any graphics card.

That would be totally enough, thank you @gfd2020 for trying to improve the manual!

@lfcnassif
Copy link
Member Author

lfcnassif commented May 1, 2024

@gfd2020, out of curiosity, have you played with the batchSize parameter? I know your GPUs are quite old and have a limited amount of memory, but I wonder if you got some speed up with it.

@gfd2020
Copy link
Collaborator

gfd2020 commented May 1, 2024

@gfd2020, out of curiosity, have you played with the batchSize parameter? I know your GPUs are quite old and have a limited amount of memory, but I wonder if you got some speed up with it.

Not yet. Thanks for reminding me.

@gfd2020
Copy link
Collaborator

gfd2020 commented May 2, 2024

Hi @lfcnassif, I did some tests with the batchSize values. Regarding the speedup, I didn't notice a big difference, but I have to test it with a larger case with several audios. I'll do those tests later.
Pytorch Version: 2.3.0+cu121
CUDA driver version: 12.4.89

Offboard Card - NVIDIA Quadro P620 - 2GB VRAM
int8: batchSize = 3 (does not work with values greater than this)
float16: batchSize = 1 (does not work at all)
float32: batchSize = 2 (does not work with values greater than this)

Offboard Card - NVIDIA T400 - 4GB VRAM
int8: batchSize = 25 (works up to this value, but I didn't go beyond that)
float16: batchSize = 25 (works up to this value, but I didn't go beyond that)
float32: batchSize = 20 (does not work with values greater than this)

@gfd2020
Copy link
Collaborator

gfd2020 commented May 2, 2024

Maybe the above code won't work if IPED is executed from outside its folder. For that case, we use System.getProperty('iped.root') to get IPED's root folder.

I just tried the code and it does not work: "No module named 'java'"

from java.lang import System
ipedRoot = System.getProperty('iped.root')

@lfcnassif, is there something I didn't do right?

@lfcnassif
Copy link
Member Author

Hi @lfcnassif, I did some tests with the batchSize values. Regarding the speedup, I didn't notice a big difference, but I have to test it with a larger case with several audios. I'll do those tests later.
Pytorch Version: 2.3.0+cu121
CUDA driver version: 12.4.89

It should only make a difference for audios longer than 30s; the longer, the better.

from java.lang import System
ipedRoot = System.getProperty('iped.root')

@lfcnassif, is there something I didn't do right?

Sorry, my mistake, that only works inside Python tasks; the current Python code runs as a separate, independent Python process, so it won't see Java classes or objects.

@gfd2020
Copy link
Collaborator

gfd2020 commented May 2, 2024

About the wiki part below:

cd IPED_ROOT/python
python get-pip.py
set PATH=%PATH%;IPED_ROOT_ABSOLUTE_PATH/python/Scripts
cd Scripts

I did it a little differently, so I didn't need to set the PATH or interfere with another installed Python.
Some warnings may appear saying that Python is not in the PATH, but it works normally.

Go to the standalone IPED Python folder and install the packages (example):
cd c:\iped-4.2\python
c:\iped-4.2\python>.\python.exe get-pip.py
c:\iped-4.2\python>.\Scripts\pip.exe install numpy
c:\iped-4.2\python>.\Scripts\pip.exe install whisperx
etc.

@lfcnassif , what do you think?

@lfcnassif
Copy link
Member Author

About the wiki part below:

cd IPED_ROOT/python
python get-pip.py
set PATH=%PATH%;IPED_ROOT_ABSOLUTE_PATH/python/Scripts
cd Scripts

I did it a little differently, so I didn't need to set the PATH or interfere with another installed Python.
Some warnings may appear saying that Python is not in the PATH, but it works normally.

Go to the standalone IPED Python folder and install the packages (example):
cd c:\iped-4.2\python
c:\iped-4.2\python>.\python.exe get-pip.py
c:\iped-4.2\python>.\Scripts\pip.exe install numpy
c:\iped-4.2\python>.\Scripts\pip.exe install whisperx
etc.

@lfcnassif , what do you think?

That's better! I also thought about changing it in the past, exactly to avoid mixing with a Python installed in the environment; those warnings never caused issues for me either.

@lfcnassif
Copy link
Member Author

@wladimirleite, what do you think about embedding ffmpeg? In the long run, we should stay with WhisperX, since we should be able to parallelize small audios transcription on the GPU with an improved version of it.

@wladimirleite
Copy link
Member

@wladimirleite, what do you think about embedding ffmpeg? In the long run, we should stay with WhisperX, since we should be able to parallelize small audios transcription on the GPU with an improved version of it.

I think it is perfectly fine!
As far as I remember, I suggested removing it because it was being used for something that could be achieved with MPlayer, which we already use, so it was possible to avoid another dependency. But in the present case FFmpeg is used directly by WhisperX, so it is better to include it than to require additional installation steps.

@lfcnassif
Copy link
Member Author

We were using it to split WAV audios at 60s boundaries; that was not possible with MPlayer, but you came up with a 100% Java solution for that use case.

@wladimirleite
Copy link
Member

We were using it to split WAV audios at 60s boundaries; that was not possible with MPlayer, but you came up with a 100% Java solution for that use case.

You are right, I completely forgot about that :-)

@lfcnassif
Copy link
Member Author

Just pushed changes to support both whisperx and faster_whisper, as @rafael844 suggested. Most users won't benefit from whisperx, since it needs a GPU with good VRAM to speed up transcribing long audios. For CPU users, faster_whisper is enough: it doesn't need FFmpeg and it is much smaller.
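For illustration, a rough sketch of how selecting one of the two backends could look (the library flag is hypothetical, not the actual configuration option):

library = "faster_whisper"  # or "whisperx"; hypothetical selection flag

if library == "whisperx":
    import whisperx
    model = whisperx.load_model("medium", "cuda", compute_type="float16")
else:
    from faster_whisper import WhisperModel
    model = WhisperModel("medium", device="cpu", compute_type="int8")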

Thanks @gfd2020 and @marcus6n for testing this! If you find any issues with my last commits, please let me know.

@lfcnassif lfcnassif merged commit 982622c into master May 25, 2024
2 checks passed
@lfcnassif lfcnassif deleted the #1823_whisper_transcription branch May 25, 2024 21:34