Pretrained Models Released on ModelScope
Model License
You are free to use, copy, modify, and share FunASR models under the conditions of this agreement. When doing so, you should indicate the model source and author information, and you should keep the relevant model names in [FunASR software]. For the full terms, see the model license.
Model Zoo
We provide several models pretrained on different datasets. Details of the models and datasets can be found on ModelScope.
Speech Recognition
Paraformer
Model Name | Language | Training Data | Vocab Size | Parameters | Offline/Online | Notes |
---|---|---|---|---|---|---|
Paraformer-large | CN & EN | Alibaba Speech Data (60000 hours) | 8404 | 220M | Offline | Duration of input wav <= 20s |
Paraformer-large-long | CN & EN | Alibaba Speech Data (60000 hours) | 8404 | 220M | Offline | Handles input wav of arbitrary length |
Paraformer-large-en-long | EN | Alibaba Speech Data (50000 hours) | 10020 | 220M | Offline | Handles input wav of arbitrary length |
Paraformer-large-Spk | CN & EN | Alibaba Speech Data (60000 hours) | 8404 | 220M | Offline | Adds speaker diarization to ASR results; based on Paraformer-large-long |
Paraformer-large-contextual | CN & EN | Alibaba Speech Data (60000 hours) | 8404 | 220M | Offline | Supports hotword customization based on incentive enhancement, improving hotword recall and precision |
Paraformer | CN & EN | Alibaba Speech Data (50000 hours) | 8358 | 68M | Offline | Duration of input wav <= 20s |
Paraformer-online | CN & EN | Alibaba Speech Data (50000 hours) | 8404 | 68M | Online | Handles streaming input |
Paraformer-large-online | CN & EN | Alibaba Speech Data (60000 hours) | 8404 | 220M | Online | Handles streaming input |
Paraformer-tiny | CN | Alibaba Speech Data (200 hours) | 544 | 5.2M | Offline | Lightweight Paraformer model for Mandarin command word recognition |
Paraformer-aishell | CN | AISHELL (178 hours) | 4234 | 43M | Offline | |
ParaformerBert-aishell | CN | AISHELL (178 hours) | 4234 | 43M | Offline | |
Paraformer-aishell2 | CN | AISHELL-2 (1000 hours) | 5212 | 64M | Offline | |
ParaformerBert-aishell2 | CN | AISHELL-2 (1000 hours) | 5212 | 64M | Offline | |
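
The 20-second limit in the Notes column suggests checking clip length before choosing a model. The following is a minimal sketch, assuming the input is a PCM WAV file readable by Python's standard `wave` module. The helper names `wav_duration_seconds` and `pick_paraformer` are hypothetical, and the returned strings are the table's display names, not ModelScope model IDs.

```python
import wave

MAX_SHORT_INPUT_SECONDS = 20.0  # limit taken from the Notes column of the table


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def pick_paraformer(path):
    """Hypothetical helper: pick a table entry based on input length."""
    if wav_duration_seconds(path) <= MAX_SHORT_INPUT_SECONDS:
        return "Paraformer-large"
    # Longer clips go to the variant listed as handling arbitrary length.
    return "Paraformer-large-long"
```

Calling `pick_paraformer("meeting.wav")` on a hypothetical long recording would return `"Paraformer-large-long"`; mapping that name to an actual ModelScope model ID is left to the reader.
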
UniASR [Unify Streaming and Non-streaming]
Model Name | Language | Training Data | Vocab Size | Parameters | Offline/Online | Notes |
---|---|---|---|---|---|---|
UniASR | CN & EN | Alibaba Speech Data (60000 hours) | 8358 | 100M | Online | Unified streaming and offline model |
UniASR-large | CN & EN | Alibaba Speech Data (60000 hours) | 8358 | 220M | Offline | Unified streaming and offline model |
UniASR English | EN | Alibaba Speech Data (10000 hours) | 1080 | 95M | Online | Unified streaming and offline model |
UniASR Russian | RU | Alibaba Speech Data (5000 hours) | 1664 | 95M | Online | Unified streaming and offline model |
UniASR Japanese | JA | Alibaba Speech Data (5000 hours) | 5977 | 95M | Online | Unified streaming and offline model |
UniASR Korean | KO | Alibaba Speech Data (2000 hours) | 6400 | 95M | Online | Unified streaming and offline model |
UniASR Cantonese (CHS) | Cantonese (CHS) | Alibaba Speech Data (5000 hours) | 1468 | 95M | Online | Unified streaming and offline model |
UniASR Indonesian | ID | Alibaba Speech Data (1000 hours) | 1067 | 95M | Online | Unified streaming and offline model |
UniASR Vietnamese | VI | Alibaba Speech Data (1000 hours) | 1001 | 95M | Online | Unified streaming and offline model |
UniASR Spanish | ES | Alibaba Speech Data (1000 hours) | 3445 | 95M | Online | Unified streaming and offline model |
UniASR Portuguese | PT | Alibaba Speech Data (1000 hours) | 1617 | 95M | Online | Unified streaming and offline model |
UniASR French | FR | Alibaba Speech Data (1000 hours) | 3472 | 95M | Online | Unified streaming and offline model |
UniASR German | GE | Alibaba Speech Data (1000 hours) | 3690 | 95M | Online | Unified streaming and offline model |
UniASR Persian | FA | Alibaba Speech Data (1000 hours) | 1257 | 95M | Online | Unified streaming and offline model |
UniASR Burmese | MY | Alibaba Speech Data (1000 hours) | 696 | 95M | Online | Unified streaming and offline model |
UniASR Hebrew | HE | Alibaba Speech Data (1000 hours) | 1085 | 95M | Online | Unified streaming and offline model |
UniASR Urdu | UR | Alibaba Speech Data (1000 hours) | 877 | 95M | Online | Unified streaming and offline model |
UniASR Turkish | TR | Alibaba Speech Data (1000 hours) | 1582 | 95M | Online | Unified streaming and offline model |
Conformer
Model Name | Language | Training Data | Vocab Size | Parameters | Offline/Online | Notes |
---|---|---|---|---|---|---|
Conformer | CN | AISHELL (178 hours) | 4234 | 44M | Offline | Duration of input wav <= 20s |
Conformer | CN | AISHELL-2 (1000 hours) | 5212 | 44M | Offline | Duration of input wav <= 20s |
Conformer | EN | Alibaba Speech Data (10000 hours) | 4199 | 220M | Offline | Duration of input wav <= 20s |
Multi-talker Speech Recognition
Model Name | Language | Training Data | Vocab Size | Parameters | Offline/Online | Notes |
---|---|---|---|---|---|---|
MFCCA | CN | AliMeeting, AISHELL-4, Simudata (917 hours) | 4950 | 45M | Offline | Duration of input wav <= 20s, number of input channels <= 8 |
Voice Activity Detection
Model Name | Training Data | Parameters | Sampling Rate (Hz) | Notes |
---|---|---|---|---|
FSMN-VAD | Alibaba Speech Data (5000 hours) | 0.4M | 16000 | |
FSMN-VAD | Alibaba Speech Data (5000 hours) | 0.4M | 8000 | |
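
Because the two FSMN-VAD entries differ only in sampling rate, it can help to read the rate from the audio header before choosing one. This is a minimal sketch, assuming PCM WAV input and Python's standard `wave` module. The function name `vad_rate_for` is hypothetical, and the suggestion to resample unsupported rates is an assumption rather than documented FunASR behavior.

```python
import wave

SUPPORTED_VAD_RATES = (16000, 8000)  # sampling rates listed in the table


def vad_rate_for(path):
    """Return the file's sampling rate if a listed FSMN-VAD variant matches it."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
    if rate not in SUPPORTED_VAD_RATES:
        # Assumption: audio at other rates would need resampling first.
        raise ValueError(f"{rate} Hz is not listed; resample to 16000 or 8000 Hz")
    return rate
```
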
Punctuation Restoration
Model Name | Language | Training Data | Parameters | Vocab Size | Offline/Online | Notes |
---|---|---|---|---|---|---|
CT-Transformer-Large | CN & EN | Alibaba Text Data (100M) | 1.1G | 471067 | Offline | Large offline punctuation model |
CT-Transformer | CN & EN | Alibaba Text Data (70M) | 291M | 272727 | Offline | Offline punctuation model |
CT-Transformer-Realtime | CN & EN | Alibaba Text Data (70M) | 288M | 272727 | Online | Online punctuation model |
Language Models
Model Name | Training Data | Parameters | Vocab Size | Notes |
---|---|---|---|---|
Transformer | Alibaba Speech Data (?hours) | 57M | 8404 | |
Speaker Verification
Model Name | Training Data | Parameters | Number of Speakers | Notes |
---|---|---|---|---|
Xvector | CNCeleb (1,200 hours) | 17.5M | 3465 | Xvector, speaker verification, Chinese |
Xvector | CallHome (60 hours) | 61M | 6135 | Xvector, speaker verification, English |
Speaker Diarization
Model Name | Training Data | Parameters | Notes |
---|---|---|---|
SOND | AliMeeting (120 hours) | 40.5M | Speaker diarization, profiles and records, Chinese |
SOND | CallHome (60 hours) | 12M | Speaker diarization, profiles and records, English |
Timestamp Prediction
Model Name | Language | Training Data | Parameters | Notes |
:---:|:---:|:---:|:---:|:---|
TP-Aligner | CN | Alibaba Speech Data (50000 hours) | 37.8M | Timestamp prediction, Mandarin, medium size |
Inverse Text Normalization (ITN)
Model Name | Language | Parameters | Notes |
---|---|---|---|
English | EN | 1.54M | ITN, ASR post-processing |
Russian | RU | 17.79M | ITN, ASR post-processing |
Japanese | JA | 6.8M | ITN, ASR post-processing |
Korean | KO | 1.28M | ITN, ASR post-processing |
Indonesian | ID | 2.06M | ITN, ASR post-processing |
Vietnamese | VI | 0.92M | ITN, ASR post-processing |
Tagalog | TL | 0.65M | ITN, ASR post-processing |
Spanish | ES | 1.32M | ITN, ASR post-processing |
Portuguese | PT | 1.28M | ITN, ASR post-processing |
French | FR | 4.39M | ITN, ASR post-processing |
German | GE | 3.95M | ITN, ASR post-processing |