ldc_bpcsad.decode
- ldc_bpcsad.decode.decode(x, sr, min_speech_dur=0.5, min_nonspeech_dur=0.3, min_chunk_dur=10, max_chunk_dur=3600, speech_scale_factor=1, silent=True)
Perform speech activity detection on an audio signal.
Because HTK’s HVite command sometimes fails for longer recordings, we first split x into chunks of at most max_chunk_dur seconds, segment each chunk separately, then merge the results. Each chunk is segmented using a recursive approach that calls HVite on progressively smaller chunks until a minimum chunk duration (min_chunk_dur) is reached.
- Parameters:
x (numpy.ndarray (n_samples)) – Audio samples.
sr (int) – Sample rate (Hz).
min_speech_dur (float, optional) – Minimum duration of speech segments in seconds. (Default: 0.500)
min_nonspeech_dur (float, optional) – Minimum duration of nonspeech segments in seconds. (Default: 0.300)
min_chunk_dur (float, optional) – Minimum duration (in seconds) of the chunks on which SAD is performed when splitting long recordings. (Default: 10.0)
max_chunk_dur (float, optional) – Maximum duration (in seconds) of the chunks on which SAD is performed when splitting long recordings. (Default: 3600.0)
speech_scale_factor (float, optional) – Factor by which speech model acoustic likelihoods are scaled prior to beam search. Larger values will bias the SAD engine in favour of more speech segments. (Default: 1)
silent (bool, optional) – If True, suppress all logging messages. (Default: True)
- Returns:
segs – Detected speech segments.
- Return type:
List[Segment]
- Raises:
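The split/segment/merge strategy described for long recordings can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not ldc_bpcsad's implementation: the helper names (split_into_chunks, segment_recursive, merge_segments, decode_long) and the run_decoder callback standing in for an HVite call are hypothetical, and three behaviours are assumed: a failing chunk is halved and each half retried; a chunk is given up on (reported as containing no speech) once halving would go below min_chunk_dur; and segments that touch at chunk boundaries are joined when merging.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical segment representation: (onset, offset) in seconds.
Segment = Tuple[float, float]
# Hypothetical stand-in for an HVite call: returns segments, or None on failure.
Decoder = Callable[[float, float], Optional[List[Segment]]]


def split_into_chunks(duration: float, max_chunk_dur: float) -> List[Tuple[float, float]]:
    """Split [0, duration) into consecutive chunks of at most max_chunk_dur seconds."""
    chunks = []
    start = 0.0
    while start < duration:
        end = min(start + max_chunk_dur, duration)
        chunks.append((start, end))
        start = end
    return chunks


def segment_recursive(start: float, end: float, run_decoder: Decoder,
                      min_chunk_dur: float) -> List[Segment]:
    """Decode [start, end); if decoding fails, retry on the two halves."""
    segs = run_decoder(start, end)
    if segs is not None:
        return segs
    if (end - start) / 2 < min_chunk_dur:
        # Assumption: once halving would go below min_chunk_dur, give up on
        # this chunk and report no speech in it.
        return []
    mid = (start + end) / 2
    return (segment_recursive(start, mid, run_decoder, min_chunk_dur)
            + segment_recursive(mid, end, run_decoder, min_chunk_dur))


def merge_segments(segs: List[Segment], tol: float = 1e-6) -> List[Segment]:
    """Join segments that touch or overlap, e.g. across chunk boundaries (assumption)."""
    merged: List[Segment] = []
    for onset, offset in sorted(segs):
        if merged and onset - merged[-1][1] <= tol:
            merged[-1] = (merged[-1][0], max(merged[-1][1], offset))
        else:
            merged.append((onset, offset))
    return merged


def decode_long(duration: float, run_decoder: Decoder,
                min_chunk_dur: float = 10.0, max_chunk_dur: float = 3600.0) -> List[Segment]:
    """Split into chunks, segment each chunk recursively, then merge the results."""
    segs: List[Segment] = []
    for start, end in split_into_chunks(duration, max_chunk_dur):
        segs.extend(segment_recursive(start, end, run_decoder, min_chunk_dur))
    return merge_segments(segs)
```

For example, with a fake decoder that fails on any chunk longer than 1000 s and otherwise labels the whole chunk as speech, decode_long(5000.0, fake_decoder) splits the input into a 3600 s and a 1400 s chunk, halves each failing chunk until decoding succeeds, and merges the pieces back into [(0.0, 5000.0)].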
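To illustrate what the min_speech_dur and min_nonspeech_dur constraints mean for the output, the sketch below enforces them as a post-processing step: first bridge nonspeech gaps shorter than min_nonspeech_dur, then drop speech segments still shorter than min_speech_dur. This is a swapped-in technique, labelled as such: the function enforce_min_durations and the (onset, offset) tuple representation are hypothetical, and the actual decoder may enforce these limits during HMM decoding rather than afterwards.

```python
from typing import List, Tuple

# Hypothetical segment representation: (onset, offset) in seconds.
Segment = Tuple[float, float]


def enforce_min_durations(segs: List[Segment], min_speech_dur: float = 0.5,
                          min_nonspeech_dur: float = 0.3) -> List[Segment]:
    """Post-process speech segments so kept segments and the gaps between
    them respect both minimum durations.

    Assumption: post-processing stand-in; the real decoder may instead
    enforce these limits inside the HMM.
    """
    # Step 1: bridge nonspeech gaps shorter than min_nonspeech_dur.
    bridged: List[Segment] = []
    for onset, offset in sorted(segs):
        if bridged and onset - bridged[-1][1] < min_nonspeech_dur:
            bridged[-1] = (bridged[-1][0], max(bridged[-1][1], offset))
        else:
            bridged.append((onset, offset))
    # Step 2: drop speech segments still shorter than min_speech_dur.
    # Dropping a segment only widens the surrounding gap, so gaps stay long enough.
    return [(onset, offset) for onset, offset in bridged
            if offset - onset >= min_speech_dur]
```

With the default limits, [(0.0, 1.0), (1.2, 1.5), (3.0, 3.25), (5.0, 6.0)] becomes [(0.0, 1.5), (5.0, 6.0)]: the 0.2 s gap is bridged, and the 0.25 s segment is removed. Bridging before dropping means short bursts of speech separated by brief pauses can survive as one longer segment.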
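To show why larger speech_scale_factor values favour speech, the sketch below assumes the factor multiplies the speech model's acoustic likelihood, which is the same as adding log(speech_scale_factor) to its log-likelihood. As a simplification, a per-frame comparison replaces the beam search; the function label_frames and its per-frame log-likelihood inputs are hypothetical, and ldc_bpcsad's HMM decoder does not label frames independently like this.

```python
import math
from typing import List


def label_frames(speech_ll: List[float], nonspeech_ll: List[float],
                 speech_scale_factor: float = 1.0) -> List[bool]:
    """Label each frame as speech (True) or nonspeech (False).

    Simplification: a per-frame comparison stands in for the HMM beam search.
    Assumption: scaling the speech likelihood by speech_scale_factor is
    applied as an additive log(speech_scale_factor) in the log domain.
    """
    if speech_scale_factor <= 0:
        raise ValueError("speech_scale_factor must be positive")
    offset = math.log(speech_scale_factor)
    return [s + offset > n for s, n in zip(speech_ll, nonspeech_ll)]
```

With speech log-likelihoods [-10.0, -12.0, -9.0] and nonspeech log-likelihoods [-11.0, -11.5, -8.0], a factor of 1 labels one frame as speech, a factor of 2 labels two, and a factor of 10 labels all three, matching the documented bias toward more speech as the factor grows.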