IEDB Next-Generation Tools Peptide Variant Comparison - version v0.4-beta
========================================================================

Introduction
------------
This package is a wrapper around the T Cell Class I and Class II tools and
will run predictors from those tools (binding, elution, immunogenicity) on a
set of paired peptides and compare the results. Additionally, it includes
ICERFIRE, which was specifically developed to quantify differences between
wild-type and mutant peptides in the context of cancer neoepitopes.

Release Notes
-------------
v0.1   - Initial public beta release
v0.2   - Add ICERFIRE support
v0.3   - Bug fix and requirements update
v0.3.1 - Input peptides other than 8-14mers were ignored
v0.4   - Add MHCII (T Cell Class II) support with TC2 integration

Prerequisites
-------------
The following prerequisites must be met before installing this tool:

+ Linux 64-bit environment. Most modern Linux distributions should work.
  * http://www.ubuntu.com/
+ Python 3.8 or higher
  * http://www.python.org/
+ IEDB T Cell Class I tool (for MHCI predictions)
  * https://nextgen-tools-dev.iedb.org/download-all
+ IEDB T Cell Class II tool (for MHCII predictions)
  * https://nextgen-tools-dev.iedb.org/download-all
+ SQLite3

Installation
------------
Below, we will use the example of installing to ~/iedb_tools.

1. Extract the code and change directory:

   $ mkdir ~/iedb_tools
   $ tar -xvzf IEDB_NG_PVC-v0.4-beta.tar.gz -C ~/iedb_tools
   $ cd ~/iedb_tools/ng_pvc-v0.4-beta

2. (Optional) If you plan to use ICERFIRE, there are a few additional steps.

   a. Create a Python virtual environment under which ICERFIRE will run:

      $ python3 -m venv /path/to/icerfire_virtualenv
      $ source /path/to/icerfire_virtualenv/bin/activate
      $ pip install -r environments/icerfire-requirements.txt

   b. Download a copy of the IEDB PepX database. Note that this database is
      large (>100GB), so ensure you have enough free space available.
      The database can be downloaded from:
      https://downloads.iedb.org/datasets/pepx/LATEST

3. Activate the T Cell Class I Python virtual environment if you have
   already set one up:

   $ source /path/to/tc1_virtualenv/bin/activate

   Otherwise, you can create one now (note that Python 3.8 or higher is
   required for the T Cell Class I virtual environment):

   $ python3 -m venv /path/to/mhci_virtualenv
   $ source /path/to/mhci_virtualenv/bin/activate
   $ pip install -r environments/pvc-requirements.txt

   Note: The same virtual environment can be used for both the T Cell
   Class I and Class II tools.

4. Using a text editor (e.g., nano), update the values in `paths.py` to
   match the layout of your system. You will need to set:

   - tcell_class_i_path: Path to the T Cell Class I tool directory
   - tcell_class_ii_path: Path to the T Cell Class II tool directory
     (required for MHCII predictions)
   - pepx_db_path: Path to the PepX database (required for ICERFIRE)
   - icerfire_python_path: Path to the Python binary for ICERFIRE
     (required for ICERFIRE)

5. Run the `configure` script:

   $ ./configure

Usage
-----
python3 src/run_pvc.py [-j INPUT_JSON] [-o OUTPUT_PREFIX]

The output format will be JSON. Example:

python3 src/run_pvc.py -j examples/input_sequence_text.json -o output

Run the following command, or see the 'example_commands.txt' file in the
'src' directory, for typical usage examples:

> python3 src/run_pvc.py -h

Input formats
-------------
Inputs may be specified in JSON format. See the JSON files in the
'examples' directory. When multiple methods are selected, jobs will be run
serially and the output will be concatenated. This can be avoided with the
'--split' and '--aggregate' workflow described below.
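As a concrete illustration of the pairing rules, the sketch below splits the
input_sequence_text format into (peptide A, peptide B) pairs and checks peptide
lengths. This is a hypothetical helper written for this README, not part of the
package:

```python
# Hypothetical helper (not part of this package): parse the
# input_sequence_text format into (peptide A, peptide B) pairs.
def parse_peptide_pairs(text, min_len=8, max_len=14):
    """Split pair-per-line text; the peptides in each pair are separated
    by a comma or a tab, and pairs are separated by newlines."""
    pairs = []
    for line in text.strip().split("\n"):
        sep = "\t" if "\t" in line else ","
        a, b = line.split(sep)
        for pep in (a, b):
            if not (min_len <= len(pep) <= max_len):
                raise ValueError(f"peptide {pep!r} is not a {min_len}-{max_len}mer")
        pairs.append((a, b))
    return pairs

pairs = parse_peptide_pairs("RKLYCVLLFLSAAE,RKLYCVLLFLSAFE\nCVLLLSAFFEATYM,CVLLLSAFFEFTYM")
print(pairs)
```

For MHCII input, the same sketch would be called with min_len=11 and max_len=30.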
Here is an example JSON that illustrates the basic format for MHCI
(mode: tc1):

{
  "mode": "tc1",
  "input_sequence_text": "RKLYCVLLFLSAAE,RKLYCVLLFLSAFE\nCVLLLSAFFEATYM,CVLLLSAFFEFTYM\nLSAFEFTAYMINFG,LSAFEFTFYMINFG\nEFTYMAFFGRGQNA,EFTYMNFFGRGQNA",
  "alleles": "HLA-A*02:01",
  "predictors": [
    {
      "type": "binding",
      "method": "netmhcpan_el"
    }
  ]
}

Here is an example JSON for MHCII (mode: tc2):

{
  "mode": "tc2",
  "input_sequence_text": "YYLEQQLAKPLLRIF,YYLEQQLAKPFLRIF\nYTCLKCGERFRQNSH,YTCLKCGERFKQNSH\nYLARSIDPLPRPPSP,YLARSIDPLPQPPSP\nYGGGFSSSSSSFGSGF,YGGGFSSSSSFGSGF\nWQHVSFEVDPTRLEP,WQHVSFEVDPPRLEP",
  "alleles": "HLA-DRB1*01:01",
  "predictors": [
    {
      "type": "binding",
      "method": "netmhciipan_el"
    },
    {
      "type": "immunogenicity",
      "method": "cd4episcore"
    }
  ]
}

* mode: Either "tc1" for MHCI predictions or "tc2" for MHCII predictions.
  Required.
* input_sequence_text: List of peptide pairs. The two peptides in a pair
  are separated by a comma (",") or tab ("\t"), and each pair must be
  followed by a newline character ("\n"). For MHCI, peptides should be
  8-14 amino acids long. For MHCII, peptides should be 11-30 amino acids
  long.
* alleles: A comma-separated string of alleles.
* predictors: A list of individual predictors to run. Multiple predictors
  may be specified. Each predictor should include:
  - type: "binding", "elution", or "immunogenicity"
  - method: The specific prediction method (e.g., "netmhcpan_el",
    "netmhciipan_el", etc.)
  - tools_group: (Optional) "mhci" for MHCI predictors or "mhcii" for
    MHCII predictors. If not specified, the tools_group will be inferred
    from the mode.

See the example files (examples/mhci.json and examples/mhcii.json) for
complete examples and a list of all possible predictors and options.

Job splitting and aggregation
-----------------------------
*NOTE* that this is an experimental workflow and this package does not
contain the code to automate job submission and aggregation. Although this
workflow is intended for IEDB internal usage, it is the only workflow
currently supported by this tool.
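Because job_description.json lists each job's command, its dependencies, and
its expected outputs, a small driver can run jobs in any order that satisfies
the dependencies. The sketch below is purely illustrative: the field names
'id' and 'depends_on' are assumptions made for this example, not the actual
job_description.json schema.

```python
# Illustrative driver (not part of this package).  The 'id' and
# 'depends_on' field names are assumptions, not the real schema of
# job_description.json.
def dependency_order(jobs):
    """Return job ids ordered so every job follows its dependencies."""
    done, order = set(), []
    pending = {job["id"]: set(job.get("depends_on", [])) for job in jobs}
    while pending:
        # A job is ready once all of its dependencies have completed.
        ready = [jid for jid, deps in pending.items() if deps <= done]
        if not ready:
            raise ValueError("circular or missing dependency")
        for jid in sorted(ready):
            order.append(jid)
            done.add(jid)
            del pending[jid]
    return order

jobs = [
    {"id": "aggregate", "depends_on": ["peptide_a", "peptide_b"]},
    {"id": "peptide_a"},
    {"id": "peptide_b"},
]
print(dependency_order(jobs))  # the two peptide jobs, then the aggregation job
```

In the real workflow, each ready job's command would be executed (e.g., via
subprocess) instead of merely being appended to a list.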
The workflow consists of:

1. Running PVC with the --split option to create a job_description.json
   file.
2. Running tcell_mhci.py (for MHCI) or tcell_mhcii.py (for MHCII) for the
   peptide A and peptide B datasets separately.
3. Collating the results with the --aggregate option.

The 'job_description.json' file produced with the --split option includes
the commands needed to run each individual job, its dependencies, and the
expected outputs. Each job can be executed as its dependencies are
satisfied. The job description file will also contain an aggregation job
that combines all of the individual outputs into one JSON file.

Caveats
-------
All IEDB next-generation standalones have been developed with the primary
focus of supporting the website. Some user-facing features may be lacking,
but they will be improved as these tools mature.

Contact
-------
Please contact us with any issues encountered or questions about the
software through any of the channels listed below.

IEDB Help Desk: https://help.iedb.org/
Email: help@iedb.org