Using ARMADiLLO

ARMADiLLO has been bundled with Cloanalyst and Partis into a Singularity container. The container implements a pipeline that passes antibody sequences to Cloanalyst (or optionally Partis) to infer unmutated common ancestors (UCAs). It then runs ARMADiLLO simulations and calculates mutation probability estimates. The pipeline is designed to need as few input arguments as possible. ARMADiLLO, Cloanalyst and Partis can also each be run on their own from the container. This gives control over each step of the pipeline, for example skipping the UCA calculation and running ARMADiLLO directly.

Singularity Container

Usage:
singularity run armadillo.simg <arguments>
Description:
Singularity container for running the Cloanalyst and ARMADiLLO pipeline to perform a complete analysis. The antibody sequences are passed to Cloanalyst to calculate the UCA of each antibody. The UCA and antibody sequence are then passed to ARMADiLLO to calculate the probability of the amino acid mutations in the antibody sequence.
Options:

-f --fastafile <fastafile> : fasta file to process (required)

-o --outfilename <FILENAME> : output file name for the Partis results; should end in .csv or .yaml (optional)

-n --max_iter <NUMBER> : number of iterations of SHM simulations in ARMADiLLO (default: 10,000)

-c --chain <CHAIN> : chain type required by Partis to properly annotate the sequences {'heavy', 'kappa', 'lambda'} (default: heavy)

-s --species {'human', 'Homo sapiens', 'mouse', 'Mus musculus', 'monkey', 'rhesus', 'Macaca mulatta'} : species to process (default: Homo sapiens). NOTE: Cloanalyst can process human, mouse and monkey, while Partis can only handle human and monkey.

--cloanalyst : use Cloanalyst to infer the UCA

--partis : use Partis to infer the UCA

--nprocs <number> : number of processes over which to parallelize (default: 1)

-h : show this help menu
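The species NOTE above means that not every species/tool combination is valid. The following is a minimal sketch of a pre-flight check that could be run before launching the pipeline. The function name `species_supported` is hypothetical and not part of the container. The alias groupings simply mirror the `-s` values listed above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the container): returns 0 if the given
# UCA tool (cloanalyst|partis) supports the species, following the NOTE:
# Cloanalyst handles human, mouse and monkey; Partis handles human and monkey.
species_supported() {
  local tool="$1" species="$2" group
  # Collapse the accepted -s aliases into three species groups
  case "$species" in
    human|"Homo sapiens")           group=human ;;
    mouse|"Mus musculus")           group=mouse ;;
    monkey|rhesus|"Macaca mulatta") group=monkey ;;
    *) return 1 ;;                  # unknown species
  esac
  case "$tool" in
    cloanalyst) return 0 ;;              # all three groups
    partis) [ "$group" != mouse ] ;;     # Partis has no mouse support
    *) return 1 ;;                       # unknown tool
  esac
}
```

For example, `species_supported partis mouse || echo "use --cloanalyst for mouse"` would warn before a Partis run that cannot succeed.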

Advanced Options

--csv : sets Partis to output in CSV mode, an older Partis output mode. This is the default mode. It can also be set by the extension of the output file name.

--yaml : sets Partis to output in YAML mode. This can also be set by the extension of the output file name.

--quick <.amo FILE> : runs ARMADiLLO in "quick" mode, which uses a lookup table instead of running simulations to calculate results. The .amo file is a binary file containing the lookup table. If the file does not exist, it is created. Entries missing from the lookup table are simulated and added.
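Because the `--quick` lookup table grows as new data are simulated, one approach is to reuse the same .amo file across runs. The sketch below assumes this. The default file name `armadillo_quick.amo`, the `run_quick` function and the `DRY_RUN` preview switch are all hypothetical.

```shell
#!/usr/bin/env bash
# Print the command instead of running it when DRY_RUN=1 (hypothetical switch)
run_or_print() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical wrapper: run the pipeline in quick mode with a shared .amo
# file. The file is created on the first run and extended on later runs.
run_quick() {
  local fasta="$1" amo="${2:-armadillo_quick.amo}"
  run_or_print singularity run armadillo.simg -f "$fasta" --quick "$amo"
}
```

Running `DRY_RUN=1 run_quick antibody.seq.fasta` previews the command without starting the container.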

To directly run ARMADiLLO invoke:

singularity run --app armadillo  armadillo.simg <ARMADiLLO arguments>

To directly run Cloanalyst invoke:

singularity run --app cloanalyst  armadillo.simg <Cloanalyst arguments>

To directly run Partis invoke:

singularity run --app partis  armadillo.simg <Partis arguments>

Note

The Singularity container can run into problems when it is run across partitions. This is solved by passing the directory where the data are located to the --bind option:

singularity run --bind /path/to/data  armadillo.simg -f antibody.seq.fasta
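If the data directory changes from run to run, one option is to derive the bind path from the input file itself. The following is a minimal sketch under that assumption. The `run_bound` function and `DRY_RUN` switch are hypothetical. The sketch also passes the absolute path of the fasta file so that it resolves inside the bound directory.

```shell
#!/usr/bin/env bash
# Print the command instead of running it when DRY_RUN=1 (hypothetical switch)
run_or_print() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical wrapper: bind the directory that contains the fasta file
run_bound() {
  local fasta="$1" data_dir
  # Resolve the directory to an absolute, symlink-free path
  data_dir="$(cd "$(dirname "$fasta")" && pwd -P)" || return 1
  run_or_print singularity run --bind "$data_dir" armadillo.simg \
    -f "$data_dir/$(basename "$fasta")"
}
```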

ARMADiLLO

Usage:
ARMADiLLO [sequence file options] -m [S5F mutability model parameters file] -s [S5F substitution model parameters file] <opt arguments>
Description:
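Simulates somatic hypermutation (SHM) starting from the UCA, using the S5F mutability and substitution models, and estimates the probability of each amino acid mutation in the antibody sequence.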
required arguments:

-m [S5F mutability file]

-s [S5F substitution file]

Sequence file options (one of a Simple Marked UA file, a Partis file or a sequence file is required):

-SMUA [Simple Marked UA file] : Simple Marked UA file from Cloanalyst

-partis [Partis file] : file from Partis, either YAML or CSV

-seq [sequence fasta file] : fasta file containing the sequences to process; requires a UCA file (-uca)

-uca [UCA fasta file] : UCA fasta file containing either a single sequence or one sequence for each sequence in the sequence file

-markup [markup fasta file] : optional markup fasta for the sequence and UCA files

output arguments:

-simple_text : flag to print out simple text files

-text : flag to print out all text files

-HTML : (default) flag to print out HTML files

-fulloutput : flag to print out all text and HTML files

-annotate : flag to print out annotation of the sequences

optional arguments:

-freq_dir [V, J Frequency file directory] : directory to pull the frequency tables for ARMADiLLO-Quick analysis

-amofile [amo file] : sets the amo file to use for the ARMADiLLO-Quick analysis

-resetamo : flag to reset the associated amo file

-w [line wrap length (60)]

-max_iter [cycles of B cell maturation (1000)]

-c [cutoff for highlighting low prob (1=1%)]

-replace_J_upto [number of replacements in J allowed]

-chain [chain type (heavy=default|kappa|lambda)]

-species [(human=default|rhesus)]

-(l)ineage [number of trees] : argument to generate the mutations through a lineage generation instead of linear generation

-(n)umber [number of mutations] : argument to set number of mutations to generate instead of taking from mutant sequence

-clean_first : flag to turn on cleaning the SMUA prior to running

-output_seqs : flag to turn on printing out the simulated sequences

-ignore_CDR3 : flag to ignore CDR3, default is false

-ignore_V : flag to ignore V, default is false

-ignore_J : flag to ignore J, default is false

-threads [number] : sets the number of threads to use during processing (default: number of processors)

-random_seed [provide a random seed]
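Several of the options above can be combined in a single standalone run. The sketch below uses only flags from the option list. The function name, the file names, the `DRY_RUN` switch and the specific values (10,000 iterations, 4 threads) are illustrative assumptions, not recommended settings.

```shell
#!/usr/bin/env bash
# Print the command instead of running it when DRY_RUN=1 (hypothetical switch)
run_or_print() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical wrapper: standalone ARMADiLLO on separate sequence/UCA files,
# writing text output with an explicit thread count and random seed
run_armadillo_seq() {
  local seqs="$1" uca="$2" chain="${3:-heavy}" seed="${4:-12345}"
  run_or_print ARMADiLLO -m Mutability.csv -s Substitution.csv \
    -seq "$seqs" -uca "$uca" \
    -chain "$chain" -species human \
    -max_iter 10000 -threads 4 -random_seed "$seed" -text
}
```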

Example Commands

Container Pipeline

The Singularity container has a built-in pipeline for analyzing antibody sequences. The antibody sequences are passed through Cloanalyst to calculate the unmutated common ancestor (UCA). The UCA and antibody sequences are then used by ARMADiLLO to calculate the probability of the observed mutations. The pipeline is designed to be easy to use with minimal inputs.

The ARMADiLLO container requires a single argument: the fasta file containing the antibody sequences. By default, it assumes human heavy-chain sequences:

singularity run armadillo.simg -f antibody.seq.fasta

When a different chain or species needs to be used for calculations, the ARMADiLLO container accepts arguments for setting both. For a set of sequences from a rhesus macaque using the lambda chain, the command would be:

singularity run armadillo.simg -f antibody.seq.fasta -c lambda -s monkey

The accepted species depends on which program is used to calculate the UCA. Cloanalyst can process human, mouse and rhesus, while Partis is only able to handle human and rhesus. The number of SHM simulation iterations can be set along with how many processors are used.

The advanced command below uses a fasta sequence file (antibody.seq.fasta), runs 100,000 iterations and uses 10 processes:

singularity run armadillo.simg -f antibody.seq.fasta -n 100000 --nprocs 10
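The same options can be applied to a whole directory of fasta files. The following is a minimal sketch that runs the pipeline once per file, one after another. The `run_all` function and `DRY_RUN` switch are hypothetical. The sketch does not manage where each run writes its results, so check that runs do not overwrite one another's output.

```shell
#!/usr/bin/env bash
# Print the command instead of running it when DRY_RUN=1 (hypothetical switch)
run_or_print() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical wrapper: run the pipeline on every *.fasta file in a directory
run_all() {
  local dir="$1" nprocs="${2:-10}" f
  for f in "$dir"/*.fasta; do
    [ -e "$f" ] || continue   # skip the literal pattern when nothing matches
    run_or_print singularity run armadillo.simg -f "$f" -n 100000 --nprocs "$nprocs"
  done
}
```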

To run ARMADiLLO using a pre-generated UCA (uca.seq.fasta) with an antibody sequence (antibody.seq.fasta) use the command:

singularity run --app armadillo armadillo.simg -uca uca.seq.fasta -seq antibody.seq.fasta
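According to the -uca option, the UCA file must contain either a single sequence or one sequence for each antibody sequence. The sketch below checks this before a run by counting fasta header lines (those starting with ">"). The helper names are hypothetical, and the check assumes one header per sequence.

```shell
#!/usr/bin/env bash
# Count fasta records by counting header lines (hypothetical helper)
count_fasta() { grep -c '^>' "$1"; }

# Hypothetical check: succeed if the UCA file holds one sequence, or as
# many sequences as the antibody sequence file
uca_matches() {
  local nseq nuca
  nseq="$(count_fasta "$1")"
  nuca="$(count_fasta "$2")"
  [ "$nuca" -ge 1 ] && { [ "$nuca" -eq 1 ] || [ "$nuca" -eq "$nseq" ]; }
}
```

For example: `uca_matches antibody.seq.fasta uca.seq.fasta && singularity run --app armadillo armadillo.simg -uca uca.seq.fasta -seq antibody.seq.fasta`.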


ARMADiLLO

The arguments for ARMADiLLO are the same in the standalone and containerized versions. Apart from how it is invoked, the only difference is that the containerized version bundles the mutability and substitution files, so they are included automatically at run time. The standalone version requires both files (Mutability.csv and Substitution.csv) to be given at run time.

Within Container

There are two differences between running ARMADiLLO directly and running it through the Singularity container. The first is how ARMADiLLO is invoked. The second is that the mutability and substitution files are bundled into the container, so they do not need to be specified. Unfortunately, this also means that inside the container the mutability and substitution files cannot be changed to use different models.

The most basic use case is a set of antibody sequences in SimpleMarkedUA (SMUA) format:

singularity run --app armadillo armadillo.simg -SMUA SMUA.heavies.fasta

Partis files can be used in place of SMUA files by substituting the “-Partis” flag for “-SMUA”. The output files can be controlled with the “-output” flag. The command below requests only the HTML files:

singularity run --app armadillo armadillo.simg -Partis Partis.heavies.yaml -output html

The final input method uses separate sequence and UCA files. The command below produces all output files:

singularity run --app armadillo armadillo.simg -seq seqs.fasta -uca uca.fasta -output fulloutput

The number of mutation simulation iterations is controlled by the “-max_iter” flag:

singularity run --app armadillo armadillo.simg -seq seqs.fasta -uca uca.fasta -max_iter 100000

All the options of the standalone ARMADiLLO version are available in the container, except for the substitution and mutability files.
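As an illustration, the sketch below combines several of the listed options in one container run on an SMUA file. In this context `-c` is ARMADiLLO's low-probability highlighting cutoff (2 = 2%), not the pipeline's chain option. The function name, `DRY_RUN` switch and chosen values are hypothetical.

```shell
#!/usr/bin/env bash
# Print the command instead of running it when DRY_RUN=1 (hypothetical switch)
run_or_print() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical wrapper: containerized ARMADiLLO on an SMUA file with an
# explicit chain, species, thread count and a 2% highlighting cutoff
run_container_armadillo() {
  local smua="$1" chain="${2:-heavy}" species="${3:-human}" threads="${4:-8}"
  run_or_print singularity run --app armadillo armadillo.simg \
    -SMUA "$smua" -chain "$chain" -species "$species" \
    -max_iter 100000 -threads "$threads" -c 2
}
```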

Stand Alone

The standalone ARMADiLLO version requires mutability and substitution files. The files that are normally used are included with the source code, and they can be modified to account for different models. The example below runs the standalone version on an SMUA file generated with Cloanalyst, using 100,000 iterations:

ARMADiLLO -m Mutability.csv -s Substitution.csv -SMUA SMUA.heavies.fasta -max_iter 100000

Just like the version bundled in the container, the input and output files can be changed:

ARMADiLLO -m Mutability.csv -s Substitution.csv -Partis partis.heavies.yaml -output html

This command performs the calculations using the YAML file from Partis and outputs only the HTML files. The output files are controlled with the same arguments as in the container version.
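Because the standalone version fails without its model files, a short pre-flight check can be useful. The sketch below verifies that Mutability.csv and Substitution.csv exist in a given directory before calling ARMADiLLO with the Partis/HTML arguments from the example above. The `run_standalone` function, its directory parameter and the `DRY_RUN` switch are hypothetical.

```shell
#!/usr/bin/env bash
# Print the command instead of running it when DRY_RUN=1 (hypothetical switch)
run_or_print() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical wrapper: check for both model files, then run ARMADiLLO
run_standalone() {
  local partis_file="$1" model_dir="${2:-.}" f
  for f in Mutability.csv Substitution.csv; do
    if [ ! -f "$model_dir/$f" ]; then
      echo "missing model file: $model_dir/$f" >&2
      return 1
    fi
  done
  run_or_print ARMADiLLO -m "$model_dir/Mutability.csv" -s "$model_dir/Substitution.csv" \
    -Partis "$partis_file" -output html
}
```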

Cloanalyst (Container)

Cloanalyst can be accessed in the singularity container. All the arguments associated with running Cloanalyst can be used through the container.

To directly run Cloanalyst invoke as:

singularity run --app cloanalyst armadillo.simg <arguments>

To generate UCAs for human heavy-chain antibodies with Cloanalyst, use the command:

singularity run --app cloanalyst armadillo.simg -s "Homo sapiens"  -c heavy --excl 5 -g "AR20170307 FPC-F" antibody.fasta

Partis (Container)

In a similar fashion to ARMADiLLO, Partis can be accessed directly in the singularity container. All the arguments associated with running Partis can be used through the container. For a complete list of Partis arguments, please see the Partis website.

To directly run Partis invoke as:

singularity run --app partis armadillo.simg <arguments>

To run Partis to annotate the antibodies use the command:

singularity run --app partis armadillo.simg annotate --infname antibody.fasta --locus igh --outfname results.yaml

The example command uses the heavy chain (igh). For the light chains, use igk for kappa and igl for lambda. To increase the number of processors used, add the “--n-procs” flag:

singularity run --app partis armadillo.simg annotate --infname antibody.fasta --locus igh --outfname results.yaml  --n-procs 6
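The two Partis commands above can be chained with ARMADiLLO by hand. The chain name used by the pipeline and by ARMADiLLO (heavy, kappa, lambda) is translated to the Partis locus (igh, igk, igl), and the Partis YAML output is passed to ARMADiLLO. The following is a minimal sketch under those assumptions. The function names and the `DRY_RUN` switch are hypothetical, the output name results.yaml follows the example above, and the ARMADiLLO flag is spelled “-Partis” as in the earlier ARMADiLLO examples, while the option list spells it “-partis”.

```shell
#!/usr/bin/env bash
# Print the command instead of running it when DRY_RUN=1 (hypothetical switch)
run_or_print() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Map a chain name to the Partis --locus value (hypothetical helper)
chain_to_locus() {
  case "$1" in
    heavy)  echo igh ;;
    kappa)  echo igk ;;
    lambda) echo igl ;;
    *) return 1 ;;
  esac
}

# Hypothetical wrapper: annotate with Partis, then run ARMADiLLO on the result
partis_then_armadillo() {
  local fasta="$1" chain="$2" nprocs="${3:-6}" locus
  locus="$(chain_to_locus "$chain")" || return 1
  run_or_print singularity run --app partis armadillo.simg annotate \
    --infname "$fasta" --locus "$locus" --outfname results.yaml --n-procs "$nprocs" &&
  run_or_print singularity run --app armadillo armadillo.simg \
    -Partis results.yaml -chain "$chain"
}
```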

Partis can also be downloaded from its GitHub repository and run outside the container. Please see the Partis website for more details.