Installation

Warning

Currently, this software will NOT run on computers using Apple Silicon. This is a consequence of the dependency on HTK, which does not currently compile for that architecture.
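
To check ahead of time whether a machine is affected, the following sketch compares the values reported by uname against the pair that identifies Apple Silicon. The function name is_apple_silicon is illustrative and not part of ldc-bpcsad. Note that a shell running under Rosetta reports x86_64, so this check will not flag it.

```shell
# Succeed if the OS/architecture pair corresponds to Apple Silicon.
# Arguments default to the values uname reports for the current machine.
is_apple_silicon() {
    ias_os="${1:-$(uname -s)}"
    ias_arch="${2:-$(uname -m)}"
    [ "$ias_os" = "Darwin" ] && [ "$ias_arch" = "arm64" ]
}

if is_apple_silicon; then
    echo "Apple Silicon detected: HTK does not currently compile for this architecture." >&2
fi
```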

Installation from source

Download HTK

The SAD engine depends on HTK for feature extraction and decoding. Unfortunately, the terms of the HTK license do not allow us to distribute the HTK source code, so the user must download it manually:

Install build dependencies

Run:

sudo apt-get install gcc-multilib make patch libsndfile1

Create a new virtual environment

We recommend installing ldc-bpcsad into a fresh Python virtual environment:

virtualenv sad-venv
source sad-venv/bin/activate

Or, if you have multiple versions of Python and wish to use a specific one – e.g., Python 3.8:

virtualenv --python=python3.8 sad-venv
source sad-venv/bin/activate

To learn more about virtual environments, please consult this tutorial.

Clone the repo

git clone https://github.com/Linguistic-Data-Consortium/ldc-bpcsad.git
cd ldc-bpcsad/

Build HTK

Once you have successfully downloaded HTK, run the included installation script to build and install the command line tools:

sudo ./tools/install_htk.sh /path/to/HTK-3.4.1.tar.gz

You will be prompted for your administrative password, following which the HTK command line tools will be compiled and installed to /usr/local/bin. If the installation is successful, you will see the following printed in your terminal at the bottom of the logging output:

./install_htk.sh: Successfully installed HTK. To use, make sure the following directory is on your PATH:
./install_htk.sh:
./install_htk.sh:     /usr/local/bin
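
Because the success message appears at the end of a long log, it can be easier to search a saved copy of the output than to scroll back through the terminal. The sketch below assumes the installer output was saved to a file; the function name htk_install_succeeded and the log file name used in the note that follows are illustrative only.

```shell
# Succeed only if the given log file contains the installer's success message.
htk_install_succeeded() {
    grep -q "Successfully installed HTK" "$1"
}
```

One way to use it is to save the output with "sudo ./tools/install_htk.sh /path/to/HTK-3.4.1.tar.gz 2>&1 | tee htk_install.log" and then run "htk_install_succeeded htk_install.log && echo OK". The 2>&1 redirect is there because it is not known here whether the script logs to standard output or standard error.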

If you wish to install the tools to a different location (e.g., because you do not have administrative privileges), specify the alternate location using the --prefix flag; e.g.:

./tools/install_htk.sh --prefix /opt /path/to/HTK-3.4.1.tar.gz

which would install the command line tools to /opt/bin. Then add this directory to your PATH:

echo 'export PATH=/opt/bin:${PATH}' >> ~/.bashrc
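
After updating PATH (and opening a new shell or sourcing ~/.bashrc), you may want to confirm that the HTK tools are actually reachable. The helper below is a minimal sketch: missing_commands is a hypothetical name, and HCopy and HVite are two of the standard HTK command line tools, chosen here as an assumption about which ones matter most.

```shell
# Print each named command that cannot be found on PATH.
# Returns 0 if every command was found and 1 otherwise.
missing_commands() {
    mc_status=0
    for mc_cmd in "$@"; do
        if ! command -v "$mc_cmd" >/dev/null 2>&1; then
            echo "$mc_cmd"
            mc_status=1
        fi
    done
    return "$mc_status"
}
```

Running "missing_commands HCopy HVite" prints nothing and succeeds when both tools are on PATH; otherwise it lists the ones that were not found.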

Install ldc-bpcsad

To install into the current virtual environment using pip:

pip install .

Installation via Docker

ldc-bpcsad can also be installed and run using Docker.

Install Docker

Install Docker according to the instructions for your platform:

Build image

Build a Docker image from which containers can be run:

  • Clone the ldc-bpcsad repo:

    git clone https://github.com/Linguistic-Data-Consortium/ldc-bpcsad.git
    
  • Download HTK following the instructions above and copy the tarball to ldc-bpcsad/src; e.g.,

    cp ~/Downloads/HTK-3.4.1.tar.gz ldc-bpcsad/src
    
  • Run docker build:

    cd ldc-bpcsad
    docker build -t ldc-bpcsad .
    

Run SAD in a container

To run ldc-bpcsad within a Docker container:

docker run --rm -v /opt/corpora/:/corpora ldc-bpcsad "ldc-bpcsad --output-dir /corpora/sad1 /corpora/LDC2020E12_Third_DIHARD_Challenge_Development_Data/data/flac/DH_DEV_*.flac"

NOTE that the above command runs ldc-bpcsad in a lightly virtualized environment (the container) with its own filesystem. This container does not have access to any of the files on your filesystem unless you explicitly give it access using the -v flag, as in the above example, which makes the directory /opt/corpora visible within the container as /corpora. For more details, consult the Docker volumes documentation.
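
Since every path passed to ldc-bpcsad must be written as the container sees it, it can help to translate host paths mechanically. The following sketch shows the mapping implied by a -v HOST:CONTAINER mount; to_container_path is a hypothetical helper, not something provided by Docker or ldc-bpcsad.

```shell
# Translate a host path that lies under a mounted directory into the
# corresponding path inside the container.
# Usage: to_container_path HOST_ROOT CONTAINER_ROOT HOST_PATH
to_container_path() {
    tcp_host_root="${1%/}"        # drop one trailing slash, if present
    tcp_container_root="${2%/}"
    tcp_path="$3"
    case "$tcp_path" in
        "$tcp_host_root"/*)
            tcp_rel=${tcp_path#"$tcp_host_root"}
            printf '%s\n' "${tcp_container_root}${tcp_rel}"
            ;;
        *)
            echo "error: $tcp_path is not under $tcp_host_root" >&2
            return 1
            ;;
    esac
}
```

With the mount from the example above, "to_container_path /opt/corpora/ /corpora /opt/corpora/sad1" prints /corpora/sad1, which is the value to pass to --output-dir. A path outside /opt/corpora is rejected, reflecting that the container cannot see it.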