ldc_bpcsad.decode.decode

ldc_bpcsad.decode.decode(x, sr, min_speech_dur=0.5, min_nonspeech_dur=0.3, min_chunk_dur=10, max_chunk_dur=3600, speech_scale_factor=1, silent=True)

Perform speech activity detection on an audio signal.

Because HTK’s HVite command sometimes fails for longer recordings, we first split x into chunks of at most max_chunk_dur seconds, segment each chunk separately, then merge the results. The individual chunks are segmented using a recursive approach that calls HVite with progressively smaller chunks until a minimum chunk duration (min_chunk_dur) is reached.
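The chunking strategy described above can be sketched roughly as follows. This is an illustrative sketch only, not the library's implementation: segment_fn stands in for the call to HVite, ChunkError, segment_recursive, segment_long, and merge_adjacent are hypothetical names, and the choices to split failing chunks in half and to merge touching segments are assumptions.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for a speech segment: (onset, offset) in seconds.
Seg = Tuple[float, float]


class ChunkError(Exception):
    """Hypothetical error raised when the segmenter fails on a chunk."""


def segment_recursive(onset: float, offset: float,
                      segment_fn: Callable[[float, float], List[Seg]],
                      min_chunk_dur: float) -> List[Seg]:
    """Segment [onset, offset), halving the chunk whenever segment_fn fails."""
    try:
        return segment_fn(onset, offset)
    except ChunkError:
        half = (offset - onset) / 2
        # Assumption: give up once the halves would be shorter than min_chunk_dur.
        if half < min_chunk_dur:
            raise
        mid = onset + half
        return (segment_recursive(onset, mid, segment_fn, min_chunk_dur)
                + segment_recursive(mid, offset, segment_fn, min_chunk_dur))


def merge_adjacent(segs: List[Seg]) -> List[Seg]:
    """Merge segments that touch or overlap (assumed merge rule)."""
    merged: List[Seg] = []
    for onset, offset in sorted(segs):
        if merged and onset <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], offset))
        else:
            merged.append((onset, offset))
    return merged


def segment_long(total_dur: float,
                 segment_fn: Callable[[float, float], List[Seg]],
                 min_chunk_dur: float = 10.0,
                 max_chunk_dur: float = 3600.0) -> List[Seg]:
    """Split into chunks of at most max_chunk_dur, segment each, then merge."""
    segs: List[Seg] = []
    onset = 0.0
    while onset < total_dur:
        offset = min(onset + max_chunk_dur, total_dur)
        segs.extend(segment_recursive(onset, offset, segment_fn, min_chunk_dur))
        onset = offset
    return merge_adjacent(segs)
```

In this sketch a failure that persists down to the minimum chunk duration propagates to the caller, which loosely mirrors the DecodingError documented below.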

Parameters:
  • x (numpy.ndarray (n_samples)) – Audio samples.

  • sr (int) – Sample rate (Hz).

  • min_speech_dur (float, optional) – Minimum duration of speech segments in seconds. (Default: 0.500)

  • min_nonspeech_dur (float, optional) – Minimum duration of nonspeech segments in seconds. (Default: 0.300)

  • min_chunk_dur (float, optional) – Minimum duration in seconds of a chunk on which SAD may be performed when splitting long recordings. (Default: 10.0)

  • max_chunk_dur (float, optional) – Maximum duration in seconds of a chunk on which SAD may be performed when splitting long recordings. (Default: 3600.0)

  • speech_scale_factor (float, optional) – Factor by which speech model acoustic likelihoods are scaled prior to beam search. Larger values will bias the SAD engine in favour of more speech segments. (Default: 1)

  • silent (bool, optional) – If True, suppress all logging messages. (Default: True)

Returns:

segs – Detected speech segments.

Return type:

List[Segment]

Raises:

DecodingError
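
A minimal usage sketch, assuming ldc_bpcsad and HTK are installed. The keyword arguments are those of the signature above. The helpers to_mono and run_sad are hypothetical names introduced for illustration, and the (n_samples, n_channels) layout assumed for multi-channel input is a convention of common audio readers, not something this page specifies.

```python
import numpy as np


def to_mono(x):
    """Return a 1-D array of shape (n_samples,).

    Assumption: multi-channel input is laid out as (n_samples, n_channels)
    and is reduced to mono by averaging the channels.
    """
    x = np.asarray(x)
    return x if x.ndim == 1 else x.mean(axis=1)


def run_sad(x, sr):
    """Call decode with the documented defaults spelled out explicitly."""
    # Imported lazily; this line requires ldc_bpcsad (and HTK) to be installed.
    from ldc_bpcsad.decode import decode

    return decode(
        to_mono(x),
        sr,
        min_speech_dur=0.5,
        min_nonspeech_dur=0.3,
        min_chunk_dur=10,
        max_chunk_dur=3600,
        speech_scale_factor=1,
        silent=True,
    )
```

Calling run_sad(x, sr) on samples loaded from disk should return the list of detected speech segments. Raising speech_scale_factor above 1 biases the result toward more speech, as described in the parameter list.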