# Speaker Verification
Note: The ModelScope pipeline supports inference and fine-tuning with all the models in the model zoo. Here we take the xvector_sv model as an example to demonstrate the usage.
## Inference with pipeline

### Quick start

#### Speaker verification
```python
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

inference_sv_pipline = pipeline(
    task=Tasks.speaker_verification,
    model='damo/speech_xvector_sv-zh-cn-cnceleb-16k-spk3465-pytorch'
)

# The same speaker
rec_result = inference_sv_pipline(audio_in=(
    'https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/sv_example_enroll.wav',
    'https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/sv_example_same.wav'))
print("Similarity:", rec_result["scores"])

# Different speakers
rec_result = inference_sv_pipline(audio_in=(
    'https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/sv_example_enroll.wav',
    'https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/sv_example_different.wav'))
print("Similarity:", rec_result["scores"])
```
#### Speaker embedding extraction
```python
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

# Define extraction pipeline
inference_sv_pipline = pipeline(
    task=Tasks.speaker_verification,
    model='damo/speech_xvector_sv-zh-cn-cnceleb-16k-spk3465-pytorch'
)

# Extract speaker embedding
rec_result = inference_sv_pipline(
    audio_in='https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/sv_example_enroll.wav')
speaker_embedding = rec_result["spk_embedding"]
```
For the full demo code, please refer to `infer.py`.
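If you want to keep the enrollment embedding around for later comparisons, it can be persisted to disk. This sketch assumes `spk_embedding` is returned as a numpy array (check the actual return type in your ModelScope version) and uses a random stand-in vector in place of the pipeline output:

```python
import os
import tempfile

import numpy as np

# Stand-in for rec_result["spk_embedding"]; the real pipeline returns an
# x-vector whose dimension depends on the checkpoint (512 is assumed here).
speaker_embedding = np.random.rand(512).astype(np.float32)

# Persist the embedding for later scoring and reload it.
path = os.path.join(tempfile.mkdtemp(), "enroll_embedding.npy")
np.save(path, speaker_embedding)
restored = np.load(path)
print(np.allclose(speaker_embedding, restored))  # True
```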
## API reference

### Define pipeline
- `task`: `Tasks.speaker_verification`
- `model`: model name in the model zoo, or model path on local disk
- `ngpu`: `1` (default), decode on GPU; if `ngpu=0`, decode on CPU
- `output_dir`: `None` (default), the output path for results, if set
- `batch_size`: `1` (default), batch size when decoding
- `sv_threshold`: `0.9465` (default), the similarity threshold to determine whether two utterances belong to the same speaker; it should be in (0, 1)
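To illustrate how `sv_threshold` is applied, the decision rule can be sketched in plain Python (`same_speaker` is a hypothetical helper for illustration, not part of the ModelScope API):

```python
def same_speaker(score: float, sv_threshold: float = 0.9465) -> bool:
    """Return True if a similarity score clears the verification threshold.

    `score` is the similarity returned by the pipeline; `sv_threshold`
    mirrors the pipeline's default and should be in (0, 1).
    """
    return score >= sv_threshold

print(same_speaker(0.97))  # True: likely the same speaker
print(same_speaker(0.51))  # False: likely different speakers
```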
### Infer pipeline for speaker embedding extraction
- `audio_in`: the input to process, which could be:
  - url (str), e.g. `https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/sv_example_enroll.wav`
  - local path, e.g. `path/to/a.wav`
  - wav.scp, e.g. `path/to/wav1.scp`, whose content looks like:

    ```
    test1 path/to/enroll1.wav
    test2 path/to/enroll2.wav
    ```

  - bytes, e.g. raw bytes data from a microphone
  - `"fbank1.scp,speech,kaldi_ark"`, e.g. 80-dimensional fbank features extracted with the Kaldi toolkit
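The wav.scp format above maps one utterance ID to one wav path per line. A minimal parser for it might look like this (`parse_wav_scp` is a hypothetical helper, not part of the ModelScope API):

```python
def parse_wav_scp(text: str) -> dict:
    """Parse kaldi-style wav.scp content: one '<utt_id> <wav_path>' per line."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        utt_id, path = line.split(maxsplit=1)
        entries[utt_id] = path
    return entries

scp = """test1 path/to/enroll1.wav
test2 path/to/enroll2.wav"""
print(parse_wav_scp(scp))
# {'test1': 'path/to/enroll1.wav', 'test2': 'path/to/enroll2.wav'}
```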
### Infer pipeline for speaker verification
- `audio_in`: the input to process, which could be:
  - Tuple(url1, url2), e.g. `(https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/sv_example_enroll.wav, https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/sv_example_different.wav)`
  - Tuple(local_path1, local_path2), e.g. `(path/to/a.wav, path/to/b.wav)`
  - Tuple(wav1.scp, wav2.scp), e.g. `(path/to/wav1.scp, path/to/wav2.scp)`, whose contents look like:

    wav1.scp:

    ```
    test1 path/to/enroll1.wav
    test2 path/to/enroll2.wav
    ```

    wav2.scp:

    ```
    test1 path/to/same1.wav
    test2 path/to/diff2.wav
    ```

  - Tuple(bytes, bytes), e.g. raw bytes data from a microphone
  - Tuple("fbank1.scp,speech,kaldi_ark", "fbank2.scp,speech,kaldi_ark"), e.g. 80-dimensional fbank features extracted with the Kaldi toolkit
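x-vector systems typically score a verification trial by comparing the two utterances' embeddings, and cosine similarity is a common choice. Whether the pipeline's internal `scores` use exactly this measure is an assumption here; the sketch below only illustrates the idea:

```python
import numpy as np

def cosine_similarity(emb1: np.ndarray, emb2: np.ndarray) -> float:
    """Cosine similarity between two speaker embeddings, in [-1, 1]."""
    return float(np.dot(emb1, emb2) / (np.linalg.norm(emb1) * np.linalg.norm(emb2)))

a = np.array([1.0, 0.0, 1.0])
b = np.array([0.0, 1.0, 0.0])
print(round(cosine_similarity(a, a), 4))  # identical embeddings -> 1.0
print(round(cosine_similarity(a, b), 4))  # orthogonal embeddings -> 0.0
```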
## Inference with your data

Use wav.scp or fbank.scp to organize your own data when extracting speaker embeddings or performing speaker verification. In this case, `output_dir` should be set so that all the embeddings or scores are saved.
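As an illustration, a wav.scp for a directory of your own recordings can be generated with a few lines of Python (`build_wav_scp` is a hypothetical helper; using file stems as utterance IDs is an assumption about your naming scheme):

```python
import tempfile
from pathlib import Path

def build_wav_scp(wav_dir: str) -> str:
    """Build kaldi-style wav.scp content, one '<utt_id> <wav_path>' per line."""
    return "\n".join(
        f"{wav.stem} {wav}" for wav in sorted(Path(wav_dir).glob("*.wav"))
    )

# Demo with placeholder wav files in a temporary directory.
demo_dir = tempfile.mkdtemp()
for name in ("enroll1.wav", "enroll2.wav"):
    Path(demo_dir, name).touch()
print(build_wav_scp(demo_dir))
```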
## Inference with multi-threads on CPU

You can run inference with multiple threads on CPU as follows:
1. Set `ngpu=0` when defining the pipeline in `infer.py`.
2. Split `wav.scp` into several files, e.g. 4 splits:

   ```shell
   split -l $((`wc -l < wav.scp`/4+1)) --numeric-suffixes wav.scp splits/wav.scp.
   ```
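If GNU `split` is not available, the same chunking can be sketched in Python (`split_scp` is a hypothetical helper mirroring the shell arithmetic above):

```python
def split_scp(lines, n_splits):
    """Split scp lines into n_splits near-equal chunks, like `split -l`."""
    per_chunk = -(-len(lines) // n_splits)  # ceiling division
    return [lines[i:i + per_chunk] for i in range(0, len(lines), per_chunk)]

lines = [f"utt{i} path/to/utt{i}.wav" for i in range(10)]
print([len(chunk) for chunk in split_scp(lines, 4)])  # [3, 3, 3, 1]
```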
3. Start to extract the embeddings:

   ```shell
   for wav_scp in `ls splits/wav.scp.*`; do
       python infer.py ${wav_scp} outputs/$(basename ${wav_scp}) &
   done
   wait
   ```
The embeddings will be saved in `outputs/*`.
## Inference with multiple GPUs

This is similar to inference on CPU; the differences are as follows:

In step 1, set `ngpu=1` when defining the pipeline in `infer.py`.
In step 3, specify the GPU device with `CUDA_VISIBLE_DEVICES`:

```shell
for wav_scp in `ls splits/wav.scp.*`; do
    CUDA_VISIBLE_DEVICES=1 python infer.py ${wav_scp} outputs/$(basename ${wav_scp}) &
done
wait
```
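With several GPUs you would normally give each split its own device rather than hardcoding `CUDA_VISIBLE_DEVICES=1`. A round-robin assignment can be sketched in Python (the command strings assume the same `infer.py` entry point as above; nothing is executed here):

```python
def assign_devices(scp_files, n_gpus):
    """Build per-split shell commands with round-robin GPU assignment."""
    return [
        f"CUDA_VISIBLE_DEVICES={i % n_gpus} python infer.py {scp} "
        f"outputs/{scp.rsplit('/', 1)[-1]}"
        for i, scp in enumerate(scp_files)
    ]

for cmd in assign_devices(["splits/wav.scp.00", "splits/wav.scp.01"], 2):
    print(cmd)
```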