BCRMatch - version 0.1-beta
===========================

INTRODUCTION
------------
BCRMatch is a tool that accepts sequences of CDR loops of antibodies, and uses the pre-trained machine learning models developed in this study to predict which antibodies recognize the same epitope.

Note: Please contact us at help@iedb.org if you wish to use antibody structure, in addition to sequence, for making predictions.

PREREQUISITES
------------
+ Docker (for running in containerized environment)
  * https://www.docker.com/
+ Python 3.9 or higher
  * http://www.python.org/
+ Required Python packages:
  * numpy
  * pandas
  * scikit-learn
  * xgboost
  * tensorflow
  * torch
+ Dependency tool:
  * TCRMatch
  * https://github.com/IEDB/TCRMatch

INSTALLATION
-----------
1. Prebuilt docker image (highly recommended)

   Pull the image from public registry and tag it locally as bcrmatch:
   > docker pull harbor.lji.org/iedb-public/bcrmatch:latest
   > docker tag harbor.lji.org/iedb-public/bcrmatch:latest bcrmatch

   Run basic example on BCRMatch:
   > docker run --rm bcrmatch python3 run_bcrmatch.py -i ./examples/example.tsv -tn abpairs_abligity

2. Local installation

   Install requirements:
   > pip install -r requirements.txt

   Set environment variable to TCRMatch path:
   > export TCRMATCH_PATH=/path/to/tcrmatch_dir

   Download pre-trained datasets (optional, but recommended):
   Run the script "dataset-download.sh" to download the most up-to-date pre-trained datasets from the IEDB servers:
   > sh dataset-download.sh

USAGE
-----
To perform a prediction, CDRLs and CDRHs are required along with a dataset name.

Use help flag to inspect available parameters:
> python run_bcrmatch.py --help

Running Locally
--------------
1. Run a simple prediction with a TSV file:
   > python run_bcrmatch.py -i examples/set-a/example.tsv -tn abpairs_abligity

2. Run with individual FASTA files:
   > python run_bcrmatch.py \
       -ch examples/set-a/cdrh1_input.fasta \
           examples/set-a/cdrh2_input.fasta \
           examples/set-a/cdrh3_input.fasta \
       -cl examples/set-a/cdrl1_input.fasta \
           examples/set-a/cdrl2_input.fasta \
           examples/set-a/cdrl3_input.fasta \
       -tn abpairs_abligity

3. Saving the output to a file:
   > python run_bcrmatch.py -i examples/set-a/example.tsv -tn abpairs_abligity -o output_file.csv

4. List available datasets:
   > python run_bcrmatch.py --list-datasets

Running with Docker
-----------------
1. Run a simple prediction with a TSV file:
   > docker run --rm bcrmatch bash -c "python3 run_bcrmatch.py -i /src/bcrmatch/examples/set-a/example.tsv -tn abpairs_abligity"

2. Run with individual FASTA files:
   > docker run --rm bcrmatch bash -c "python3 run_bcrmatch.py \
       -ch /src/bcrmatch/examples/set-a/cdrh1_input.fasta \
           /src/bcrmatch/examples/set-a/cdrh2_input.fasta \
           /src/bcrmatch/examples/set-a/cdrh3_input.fasta \
       -cl /src/bcrmatch/examples/set-a/cdrl1_input.fasta \
           /src/bcrmatch/examples/set-a/cdrl2_input.fasta \
           /src/bcrmatch/examples/set-a/cdrl3_input.fasta \
       -tn abpairs_abligity"

3. Saving the output to a file (output_file.csv will be in your current directory):
   > docker run --rm -v $(pwd):/src/bcrmatch bcrmatch bash -c "python3 run_bcrmatch.py -i /src/bcrmatch/examples/set-a/example.tsv -tn abpairs_abligity -o /src/bcrmatch/output_file.csv"

4. List available datasets:
   > docker run --rm bcrmatch bash -c "python3 run_bcrmatch.py --list-datasets"

TRAINING
--------
For detailed information about training custom models, please refer to the documentation in docs/training.md.

ANARCI
------
NOTE: ANARCI functionality is only available through Docker containers due to Python package incompatibility issues. Local installation is not supported.

To use ANARCI functionality for processing full heavy and light chain sequences:

1. Prebuilt docker image:
   > docker pull harbor.lji.org/iedb-public/bcrmatch-anarci:latest
   > docker tag harbor.lji.org/iedb-public/bcrmatch-anarci:latest bcrmatch-anarci

2. Inside the container, run BCRMatch with your full chain sequences:
   > docker run --rm bcrmatch-anarci python3 run_bcrmatch.py \
       -fh examples/set-c/updated_example_vh_seqs.fasta \
       -fl examples/set-c/updated_example_vl_seqs.fasta \
       -tn abpairs_abligity
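
If you need to launch the full-chain (ANARCI) run from a script rather than typing the command by hand, the following is a minimal sketch of one way to assemble it in Python. The helper name `build_anarci_command` is hypothetical and not part of BCRMatch. It uses only the flags shown in this README (`-fh`, `-fl`, `-tn`), and it assumes the image has been tagged locally as `bcrmatch-anarci` and that the FASTA paths resolve inside the container.

```python
import shlex
import subprocess  # used only by the commented-out execution line below


def build_anarci_command(vh_fasta, vl_fasta, dataset="abpairs_abligity",
                         image="bcrmatch-anarci"):
    """Return the docker argument list for a full-chain BCRMatch run.

    Hypothetical helper, not part of BCRMatch. The paths are interpreted
    inside the container, so they must exist in the image (or in a
    directory you mount yourself).
    """
    return [
        "docker", "run", "--rm", image,
        "python3", "run_bcrmatch.py",
        "-fh", vh_fasta,   # full heavy chain sequences (FASTA)
        "-fl", vl_fasta,   # full light chain sequences (FASTA)
        "-tn", dataset,    # dataset name
    ]


if __name__ == "__main__":
    cmd = build_anarci_command(
        "examples/set-c/updated_example_vh_seqs.fasta",
        "examples/set-c/updated_example_vl_seqs.fasta",
    )
    # Print the command in copy-pasteable form before running anything.
    print(shlex.join(cmd))
    # Uncomment to execute; requires Docker and the bcrmatch-anarci image.
    # subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting problems with unusual file names; `shlex.join` is used only to display it.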