IEDB Next-Generation Tools Peptide Variant Comparison - version 0.1 beta
========================================================================

Introduction
------------
This package is a wrapper around the T Cell Class I tool. It runs predictors
from that tool (binding, elution, immunogenicity) on a set of paired peptides
and compares the results.

Release Notes
-------------
v0.1 beta
  - Initial public beta release

Prerequisites
-------------
The following prerequisites must be met before installing the tools:

+ Linux 64-bit environment
  * http://www.ubuntu.com/
    - This distribution has been tested on a Linux/Ubuntu 64-bit system.

+ Python 3.8 or higher
  * http://www.python.org/

+ tcsh
  * http://www.tcsh.org/Welcome
    - Under Ubuntu: sudo apt-get install tcsh

+ gawk
  * http://www.gnu.org/software/gawk/
    - Under Ubuntu: sudo apt-get install gawk

+ T Cell Class I Tool
  * https://nextgen-tools-dev.iedb.org/download-all
    - The T Cell Class I Tool must be installed and configured in advance.

Optional:

+ Docker
  Docker Engine is required for running MHC-NP and MHCflurry.
  * https://docs.docker.com/engine/install/

Installation
------------
Below, we will use the example of installing to /opt/iedb_tools.

1. Extract the code and change directory:

  $ mkdir /opt/iedb_tools
  $ tar -xvzf IEDB_NG_PVC-VERSION.tar.gz -C /opt/iedb_tools
  $ cd /opt/iedb_tools/ng_pvc-VERSION

2. Activate the same virtual environment that was created for the T Cell
   Class I tool:

  $ source /path/to/tc1_virtualenv/bin/activate

3. Run the configure script, passing it the path where the IEDB NG TC1 tool
   is installed:

  $ ./configure /path/to/tc1_directory

Usage
-----
Run the following command to see usage examples:

  > python3 src/run_pvc.py -h

Input formats
-------------
Inputs may be specified in JSON format. See the JSON file in the 'examples'
directory. When multiple methods are selected, jobs will be run serially and
the output will be concatenated.
This can be avoided with the '--split' and '--aggregate' workflow, which is
described below.

Here is an example JSON that illustrates the basic format:

{
  "input_sequence_text": "RKLYCVLLFLSAAE\tRKLYCVLLFLSAFE\nCVLLLSAFFEATYM\tCVLLLSAFFEFTYM\nLSAFEFTAYMINFG\tLSAFEFTFYMINFG\nEFTYMAFFGRGQNA\tEFTYMNFFGRGQNA",
  "alleles": "HLA-A*02:01",
  "predictors": [
    {
      "type": "binding",
      "method": "netmhcpan_el"
    }
  ]
}

* input_sequence_text: List of peptide pairs. The two peptides in each pair
  are separated by a tab ("\t"). Each pair must be followed by a newline
  character ("\n").
* alleles: A comma-separated string of alleles.
* predictors: A list of individual predictors to run. See the file
  examples/input_sequence_text.json from the T Cell Class I tool for a list of
  all possible predictors and options. Multiple predictors may be specified.

Job splitting and aggregation
-----------------------------
NOTE: This is an experimental workflow, and this package does not contain the
code to automate job submission and aggregation. Although this workflow is
intended for IEDB internal usage, it is the only workflow currently supported
by this tool.

The workflow consists of:

1. Running PVC with the --split option to create a job_description.json file.
2. Running tcell_mhci.py for the peptide A and peptide B datasets separately.
3. Collating the results with the --aggregate option.

The 'job_description.json' file produced when the --split option is used
includes the commands needed to run each individual job, its dependencies, and
the expected outputs. Each job can be executed once its dependencies are
satisfied. The job description file also contains an aggregation job that
combines all of the individual outputs into one JSON file.

Caveats
-------
All IEDB next-generation standalones have been developed with the primary focus
of supporting the website. Some user-facing features may be lacking, but will
be improved as these tools mature.
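
Example: building the input JSON
--------------------------------
Writing the "input_sequence_text" value by hand is error-prone because the tab
and newline separators must be escaped inside a single JSON string. The sketch
below shows one way to assemble the structure described under "Input formats"
from ordinary Python lists. The helper name build_pvc_input is hypothetical and
is not part of this package. Pairs are joined with newlines and no trailing
newline is added, which mirrors the example JSON but is an assumption about what
the tool accepts.

```python
import json


def build_pvc_input(peptide_pairs, alleles, predictors):
    """Hypothetical helper (not part of the PVC package).

    Builds a dictionary with the three keys shown in the README example:
    input_sequence_text, alleles, and predictors.
    """
    rows = []
    for pair in peptide_pairs:
        if len(pair) != 2:
            raise ValueError(f"expected a pair of peptides, got: {pair!r}")
        peptide_a, peptide_b = pair
        # The two peptides of a pair are separated by a tab.
        rows.append(f"{peptide_a}\t{peptide_b}")
    return {
        # Pairs are separated by newlines (assumption: no trailing newline,
        # matching the README example).
        "input_sequence_text": "\n".join(rows),
        # Alleles are passed as one comma-separated string.
        "alleles": ",".join(alleles),
        "predictors": list(predictors),
    }


example = build_pvc_input(
    peptide_pairs=[
        ("RKLYCVLLFLSAAE", "RKLYCVLLFLSAFE"),
        ("CVLLLSAFFEATYM", "CVLLLSAFFEFTYM"),
    ],
    alleles=["HLA-A*02:01"],
    predictors=[{"type": "binding", "method": "netmhcpan_el"}],
)

# json.dumps escapes the tab and newline characters as \t and \n.
print(json.dumps(example, indent=2))
```

The printed text can be saved to a file and used as the JSON input. Run
'python3 src/run_pvc.py -h' to see how the input file is passed to the tool.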

Contact
-------
Please contact us with any issues encountered or questions about the software
through any of the channels listed below.

IEDB Help Desk: https://help.iedb.org/
Email: help@iedb.org