Installation

ARMADiLLO

ARMADiLLO stands for “Antigen Receptor Mutation Analyzer for Detection of Low-Likelihood Occurrences”. It mutates an Unmutated Common Ancestor (UCA) to generate a set of sequences, from which it calculates the probability of any given mutation. These probabilities are then compared against an antibody to determine which of its mutations are probable and which are improbable.

Using Singularity

Singularity is a free, open-source, cross-platform program that performs containerization (operating-system-level virtualization). It brings containers, and with them reproducibility, to scientific and high-performance computing. Singularity supports building containers that package software and pipelines so that they can be run reproducibly. We have built a container that includes Cloanalyst, Partis and ARMADiLLO, together with a pipeline that takes antibody sequences as input and produces mutational probabilities. The container can be run on any system with Singularity, and ARMADiLLO, Cloanalyst and Partis can each be accessed separately from the container.

From Source

ARMADiLLO is written in C++ using the Boost libraries. The source code is not currently publicly available but can be requested from the Duke Human Vaccine Institute at jsm56 @ duke.edu. The source code includes a makefile. To install ARMADiLLO from source, install the Boost libraries and then run the makefile.

UCA generation

ARMADiLLO requires a UCA to perform the mutations, but the UCA can be generated in multiple ways. We recommend using inference software such as Cloanalyst or Partis when the UCA is unknown. ARMADiLLO natively accepts the output of both programs. Alternatively, ARMADiLLO can accept a list of antibody sequences as a FASTA file and a separate file containing UCAs. See the -seq <file> and -uca <file> options for more details.
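
To make the relationship between a UCA and its antibody sequences concrete, the following minimal Python sketch lists the positions at which an antibody sequence differs from its UCA. It is only an illustration of what "a mutation relative to the UCA" means; it is not ARMADiLLO's implementation and it does not compute any probabilities. The function name list_mutations is hypothetical, and the sketch assumes the two nucleotide sequences are already aligned and of equal length.

```python
from typing import List, Tuple


def list_mutations(uca: str, antibody: str) -> List[Tuple[int, str, str]]:
    """Return (1-based position, UCA base, antibody base) for every mismatch.

    Assumption: both sequences are pre-aligned and have the same length.
    """
    if len(uca) != len(antibody):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, u, a)
        for pos, (u, a) in enumerate(zip(uca.upper(), antibody.upper()), start=1)
        if u != a
    ]


if __name__ == "__main__":
    # Toy sequences, not real antibody data.
    print(list_mutations("ACGTAC", "ACGAAC"))  # prints [(4, 'T', 'A')]
```

Each tuple reports one observed difference from the UCA; these are the mutations whose likelihood is of interest.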

Cloanalyst

Cloanalyst is a software implementation of a suite of statistical methods for the inference of antigen receptor rearrangements developed by Tom Kepler. Cloanalyst performs a Bayesian analysis of antibody genes to compute posterior probabilities over rearrangement parameters and unmutated ancestral rearrangements, using either single immunoglobulin polynucleotide sequences or sets of clonally related immunoglobulin sequences. If Cloanalyst is used to generate a UCA (which is standard in the pipeline), please cite:

  • Cloanalyst: Kepler T.B. Reconstructing a B-cell clonal lineage. I. Statistical inference of unobserved ancestors. F1000Res. 2013;2:103

Cloanalyst outputs several files; however, the important one is the SMUA file. Its layout is similar to the FASTA format, as shown below:

>antibody name
antibody seq
>antibody name|UCA
antibody UCA seq
>antibody name|V gene|D gene|J gene
markup seq

The SMUA format is the easiest way to feed multiple sequences with different UCAs into ARMADiLLO. For a single UCA with multiple sequences, see the “-seq” and “-uca” input arguments.
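
The three-entry layout shown above can be read with a few lines of Python. The sketch below is a minimal reader, assuming that every antibody contributes exactly three FASTA-style entries in the order shown (sequence, UCA, markup) and that header fields are separated by "|". The names SmuaRecord, read_fasta_like and parse_smua are hypothetical and are not part of Cloanalyst or ARMADiLLO.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple


@dataclass
class SmuaRecord:
    name: str          # antibody name from the first header
    sequence: str      # antibody sequence
    uca: str           # inferred UCA sequence
    genes: List[str]   # fields after the name in the third header (V, D, J)
    markup: str        # markup line, kept verbatim


def read_fasta_like(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-style text."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def parse_smua(lines: Iterable[str]) -> List[SmuaRecord]:
    """Group entries in threes: sequence, UCA, markup (assumed order)."""
    entries = list(read_fasta_like(lines))
    if len(entries) % 3 != 0:
        raise ValueError("expected groups of three entries per antibody")
    records = []
    for i in range(0, len(entries), 3):
        (name, seq), (_uca_header, uca), (markup_header, markup) = entries[i:i + 3]
        genes = markup_header.split("|")[1:]  # drop the leading antibody name
        records.append(SmuaRecord(name, seq, uca, genes, markup))
    return records
```

With a file opened as handle, parse_smua(handle) returns one record per antibody, so each sequence stays paired with its own UCA.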

Partis

Partis is an HMM-based framework for sequence annotation, simulation, clonal family and germline inference, and affinity prediction of B-cell and T-cell receptors. It is built upon the ig-sw set of Smith-Waterman annotation tools and the ham HMM compiler. The various components are described in the following papers. Since they do quite different things, it is best to cite the specific paper(s) describing the components that you use.

  • Selection metrics: Ralph, DK, & Matsen IV, FA (2020). Using B cell receptor lineage structures to predict affinity. Submitted to PLOS Computational Biology.
  • Germline inference: Ralph, DK, & Matsen IV, FA (2019). Per-sample immunoglobulin germline inference from B cell receptor deep sequencing data. PLOS Computational Biology, 15(7), e1007133.
  • Clonal family inference: Ralph, DK, & Matsen IV, FA (2016). Likelihood-based Inference of B-cell Clonal Families. PLOS Computational Biology, 12(10), e1005086.
  • HMM framework and BCR annotation: Ralph, DK, & Matsen IV, FA (2016). Consistency of VDJ Rearrangement and Substitution Parameters Enables Accurate B Cell Receptor Sequence Annotation. PLOS Computational Biology, 12(1), e1004409.

Partis produces either YAML or CSV files containing sequences and UCAs. ARMADiLLO supports both formats.