*****************
Command-line tool
*****************

.. contents:: Table of Contents
    :depth: 2


Basic usage
===========

The easiest way to perform speech activity detection (SAD) for a set of audio files is via the :ref:`ldc-bpcsad` command-line tool. To perform SAD for channel 1 of each of a set of audio files ``rec1.flac``, ``rec2.flac``, ``rec3.flac``, ... and output their segmentation as HTK label files under the directory ``label_dir``:

.. code-block:: console

    ldc-bpcsad --channel 1 --output-dir label_dir rec1.flac rec2.flac rec3.flac ...

This will result in one label file per input file (e.g., ``rec1.lab``, ``rec2.lab``, ...), each of the form:

.. code-block:: none

    0.00 1.05 nonspeech
    1.05 3.55 speech
    3.55 4.65 nonspeech
    .
    .
    .


Script files
============

The audio files and channels to be processed can also be listed in a script file passed via the ``--scp`` flag. Currently, two script file formats are supported:

- ``htk``  --  :ref:`HTK script file <htk_scp>` (**default**)
- ``json``  --  :ref:`JSON script file <json_scp>`


.. _htk_scp:

HTK script file
---------------

If ``--scp-fmt htk`` is specified, :ref:`ldc-bpcsad` will load the audio files to be segmented from an HTK script file. An HTK script file consists of a list of file paths, one path per line; e.g.:

.. code-block:: none

    /path/to/rec1.flac
    /path/to/rec2.flac
    /path/to/rec3.flac

For instance, if ``task.scp`` is the above HTK script file, then:

.. code-block:: console

    ldc-bpcsad --channel 1 --output-dir label_dir --scp-fmt htk --scp task.scp

is equivalent to:

.. code-block:: console

    ldc-bpcsad --channel 1 --output-dir label_dir /path/to/rec1.flac /path/to/rec2.flac /path/to/rec3.flac


.. _json_scp:

JSON script file
----------------

If ``--scp-fmt json`` is specified, :ref:`ldc-bpcsad` will load the audio files **AND** channels to be segmented from a JSON file.
The JSON file should consist of a JSON array of objects, each containing the following three key-value pairs:

- ``audio_path``  --  Path to the audio file to perform SAD on.
- ``channel``  --  Channel number of the audio file to perform SAD on (1-indexed).
- ``channel_id``  --  Basename for the output file containing the SAD result.

E.g.:

.. code-block:: json

    [{
        "audio_path": "/path/to/rec1.flac",
        "channel_id": "rec1_c1",
        "channel": 1
    }, {
        "audio_path": "/path/to/rec1.flac",
        "channel_id": "rec1_c2",
        "channel": 2
    }, {
        "audio_path": "/path/to/rec2.flac",
        "channel_id": "rec2_c1",
        "channel": 1
    }]

For instance, if ``task.json`` is the above JSON file, then:

.. code-block:: console

    ldc-bpcsad --output-dir label_dir --scp-fmt json --scp task.json

will output the following three HTK label files to ``label_dir``:

- ``rec1_c1.lab``  --  result of SAD for channel 1 of ``rec1.flac``
- ``rec1_c2.lab``  --  result of SAD for channel 2 of ``rec1.flac``
- ``rec2_c1.lab``  --  result of SAD for channel 1 of ``rec2.flac``

.. note::

    When using a JSON script file, the ``--channel`` flag has no effect.


Output formats
==============

The file format for SAD output can be specified via the ``--output-fmt`` flag. Currently, four options are available:

- ``htk``  --  :ref:`HTK label file <htk_lab>` (**default**)
- ``rttm``  --  :ref:`Rich Transcription Time Marked (RTTM) file`
- ``audacity``  --  :ref:`Audacity label file`
- ``textgrid``  --  :ref:`Praat TextGrid`


.. _htk_lab:

HTK label file
--------------

If ``--output-fmt htk`` is specified, SAD output will be stored as HTK label files. Each label file contains one segment per line, each line having the form:

.. code-block:: none

    \t\t
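
As an illustration of consuming this output, the following is a minimal Python sketch that parses label lines and sums the duration of the speech segments. It assumes each line holds three whitespace-separated fields in the order onset, offset, label (times in seconds), as in the example under *Basic usage*. The names ``Segment``, ``parse_htk_label_lines``, and ``total_speech_duration`` are hypothetical helpers written for this example; they are not part of ``ldc-bpcsad``.

.. code-block:: python

    from typing import Iterable, List, NamedTuple


    class Segment(NamedTuple):
        onset: float   # Segment start time in seconds (assumed field order).
        offset: float  # Segment end time in seconds.
        label: str     # E.g., "speech" or "nonspeech".


    def parse_htk_label_lines(lines: Iterable[str]) -> List[Segment]:
        """Parse lines of the assumed form ``onset offset label``."""
        segments = []
        for line in lines:
            line = line.strip()
            if not line:
                continue  # Skip blank lines.
            # Split on any whitespace, so tabs and spaces are both accepted.
            onset, offset, label = line.split(maxsplit=2)
            segments.append(Segment(float(onset), float(offset), label))
        return segments


    def total_speech_duration(segments: Iterable[Segment]) -> float:
        """Return the summed duration of all segments labeled "speech"."""
        return sum(s.offset - s.onset for s in segments if s.label == "speech")


    # Example input mirroring the label file shown under "Basic usage".
    example = [
        "0.00\t1.05\tnonspeech",
        "1.05\t3.55\tspeech",
        "3.55\t4.65\tnonspeech",
    ]
    segments = parse_htk_label_lines(example)
    print(total_speech_duration(segments))

To process a real label file, pass an open file handle instead of the list, e.g. ``parse_htk_label_lines(open("label_dir/rec1.lab"))``.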