# aa-counter

This example app will take list of sequences along with a chosen amino acid and count the number of occurrence in each sequence.

This tool wrapper was created with the NGArgumentParser Framework. This framework is intended to help standardize the workflow for running NG standalone tools and allow for more rapid and distributed development.

All tools created with this framework has three subcommands:
1. **predict**\
   Executes the prediction process for each unit based on the input provided in a JSON file.

2. **preprocess**\
   Splits the input JSON file into smaller, atomic units, each of which is suitable for independent processing by the "predict" command. Additionally, generates a file containing detailed descriptions of the commands to be executed.

3. **postprocess**\
   Aggregates the results from all predict commands into a consolidated output file.

The "src" directory contains several template files that may need to be filled in to incorporate your tool:
1. **run_aa_counter.py**\
   This is the primary script that users execute, responsible for managing and executing the various sub-commands.
2. **AACounterArgumentParser.py**\
   This is the primary argument parser class, where developers can define and customize flags and parameters for the "predict" sub-command.
3. **preprocess.py**\
   This script implements the logic for dividing the input JSON file into discrete, atomic job units for processing.
4. **postprocess.py**\
   This script consolidates the results from the atomic job units into a single, unified output file.
5. **validators.py**\
   This file contains all validation logic for (mainly) the arguments that can be used with Python's argparse.

Getting Started
---------------

## Predict
Inside the `aa-counter`, we can run a simple prediction example.
```bash
# to run JSON file as input
python src/run_aa_counter.py predict -j src/example.json

# to run TSV file as input
python src/run_aa_counter.py predict -t src/example.tsv -a P
```
This will output the results to the terminal in `JSON` format. To specify an output file (`-o`), including the desired path, or to change the file extension (`-f`), use the corresponding flags:
```bash
python src/run_aa_counter.py predict \
-j src/example.json \
-o ./result \
-f json
```
*__NOTE__*: By default, `-f` is set to `json` format.

These formats will not be used in the NG workflow, but it might be valuable for users who prefer to specify more options directly on the command line. 

## Preprocess
Ideally, any apps created with this framework should be able to split inputs into smaller input units. For `aa-counter`, sequences are divided by peptide length. It will create multiple files where each file contains sequences with the same length. 

Note that "output-directory" folder should be created in advance before running the below command.
```bash
python src/run_aa_counter.py preprocess -j src/example.json -o output-directory
```

Notice that the above code outputs some text about creating folders. This is the default output file structure that is provided by this framework. The file structure should look the following:
```
aa-counter
├── output-directory/
│   ├── predict-inputs/
│   │   ├── data/
│   │   └── params/
│   └── predict-outputs/
└── src/
    ├── ...
    ├── postprocess.py
    ├── preprocess.py
    ├── run_aa_counter.py
    └── validators.py
```

If using the default file structure is unfavorable, and would like to use custom folders instead, run the following command:
```bash
python src/run_aa_counter.py preprocess \
-j src/example.json \
-o /PATH/TO/OUTPUT/DIR \
--params-dir /PATH/TO/DIR/FOR/PARAMS \
--inputs-dir /PATH/TO/DIR/FOR/INPUTS
```
This app will split the inputs into smaller units and save them under the defined paths (`--params-dir` and `--inputs-dir`).

At this point, `job_descriptions.json` should have been created inside the `aa-counter` directory.

The `job_descriptions.json` should contain multiple independent job commands along with parameter descriptions. It contains details on how to run the individual jobs including the specific command lines, job dependencies and expected outputs.  

It should roughly look like this (Fuller version can be found under the `ngparser>templates>example-app>example_job_descriptions.json`):
```bash
[
    {
        "shell_cmd": "/ABSOLUTE/PATH/TO/aa-counter/src/run_aacounter.py predict -j /ABSOLUTE/PATH/TO/PARAMS/0-bkewmn69.json -o /ABSOLUTE/PATH/TO/PREDICTION/OUTPUT/result.0 -f json",
        "job_id": 0,
        "job_type": "prediction",
        "depends_on_job_ids": [],
        "expected_outputs": [
            "/ABSOLUTE/PATH/TO/PREDICTION/OUTPUT/result.0.json"
        ]
    },
    ...
    {
        "shell_cmd": "/ABSOLUTE/PATH/TO/aa-counter/src/run_aa_counter.py postprocess --job-desc-file=/ABSOLUTE/PATH/TO/aa-counter/job_descriptions.json --input-results-dir=/ABSOLUTE/PATH/TO/PREDICTION/OUTPUT --postprocessed-results-dir=PATH/TO/OUTPUT",
        "job_id": 3,
        "job_type": "postprocess",
        "depends_on_job_ids": [
            0,
            1,
            2
        ],
        "expected_outputs": [
            "PATH/TO/OUTPUT/final-result.json"
        ]
    }
]
```
Please note that the last command is the only one that uses `postprocess` as it will aggregate all the output files from the previous steps together.

## Postprocess
The last command of the `job_descriptions.json` will look like this, which aggregates all the results into one file.

```bash
python src/run_aa_counter.py postprocess \
--job-desc-file=output-directory/job_descriptions.json \
-o /ABSOLUTE/PATH/TO/OUTPUT/DIR/final-result \
-f json
```

Given the job description file, you can run the postprocess command with the following command:
```bash
python src/run_aa_counter.py postprocess \
--job-desc-file=output-directory/job_descriptions.json \
--postprocessed-results-dir=custom-output-dir
```


If postprocessing needs to be done without having to provide the `job_descriptions.json`, run the same command with `--input-results-dir`:

```bash
python src/run_aa_counter.py postprocess \
--input-results-dir=custom-output-dir/predict-outputs \
--postprocessed-results-dir=custom-output-dir
```


Contributing a standalone
-------------------------
The IEDB team welcomes collaboration in developing new features for our platform. External developers have several opportunities to contribute tools, as outlined below. Contributions from external developers will help accelerate the implementation of tools on the IEDB next-gen tools website. 
> **NOTE:**\
All tools must be pre-approved by the IEDB team before development is started. Submitting a standalone tool to IEDB does not guarantee its integration into the platform.

* 1. Hand off source code or binary (low effort, longest time to completion)
* 2. Create a package with NGArugmentParser and implement the "predict" method (medium effort, medium time to completion)
* 3. Create a package with NGArgumentParser and implement all methods (high effort, shortest time to completion)

> **NOTE:**\
> One of the requirements for the "predict" command is the option to generate output in JSON format that adheres to the NG tools output format specifications. Examples of this format can be found in the 'examples' directory and more details on can be found [here](https://nextgen-tools.iedb.org/docs/).

Once the CLI tool is able to run basic prediction given a JSON file, the next step is to implement preprocessing.