MiModD User Guide: Tool Documentation
*************************************

As an end-user you have two possibilities to work with MiModD: 

you can use it as a suite of tools from the command line or, more conveniently,
through its built-in Galaxy interface. 

The first has the advantage of not requiring any additional software 
installation and avoids the memory and performance overhead of running the 
Galaxy server. 

The later clearly provides the more intuitive user-interface, particularly for 
beginners. 

In this chapter we will explain both the command line usage and the 
corresponding Galaxy wrapper for each tool in the MiModD package. For most 
tools, the relationship between their command line parameters and their 
configuration in Galaxy should be obvious, but we will point out differences 
between the two interfaces when they are important.

Overview of the MiModD tools
============================

The following is a list of all command line tools currently built into MiModD
together with a brief description of the purpose of each tool::

    info          retrieve information about the samples encoded in a file for
                  various supported formats
    header        generate a SAM format header from an NGS run description
    convert       convert between different sequenced reads file formats
    reheader      from a BAM file generate a new file with specified header
                  sections modified based on the header of a template SAM file
    sort          sort a BAM file by coordinates (or names) of the mapped
                  reads
    snap          align sequence reads using the SNAP aligner
    snap-batch    run several snap jobs and pool the resulting alignments into
                  a multi-sample SAM/BAM file
    snap-index    index a reference genome for use with the SNAP aligner
    varcall       predict SNPs and indels in one or more aligned read samples
                  and calculate the coverage of every base in the reference
                  genome using samtools/bcftools
    varextract    extract variant sites from BCF input as generated by varcall
                  and report them in VCF
    covstats      summary coverage statistics for varcall output
    delcall       predict deletions in one or more samples of aligned paired-
                  end reads based on coverage of the reference genome and on
                  insert sizes
    vcf-filter    extract lines from a vcf variant file based on sample- and
                  field-specific filters
    annotate      annotate a vcf variant file with information about the
                  affected genes
    snpeff-genomes
                  list installed SnpEff genomes
    cloudmap      generate CloudMap-compatible output from a vcf file

as you can obtain it, from the command line, by running::

  mimodd --help

You can run any individual tool through::

  mimodd <tool name> <tool specific arguments>

Additional help on each tool and the arguments it accepts can be
obtained by::

    mimodd <tool name> --help

, e.g.:: 

    $ mimodd header --help

    usage:  header [-h] [--rg-id RG_ID [RG_ID ...]] [--rg-sm RG_SM [RG_SM ...]]
                   [--rg-cn RG_CN [RG_CN ...]] [--rg-ds RG_DS [RG_DS ...]]
                   [--rg-dt RG_DT [RG_DT ...]] [--rg-lb RG_LB [RG_LB ...]]
                   [--rg-pl RG_PL [RG_PL ...]] [--rg-pi RG_PI [RG_PI ...]]
                   [--rg-pu RG_PU [RG_PU ...]] [--co [COMMENT [COMMENT ...]]]
                   [-o OFILE] [-x]

    optional arguments:
      -h, --help            show this help message and exit
      -o OFILE, --ofile OFILE
                            redirect the output to the specified file (default:
                            stdout)
      -x, --relaxed         do not enforce a sample name to be specified for every
                            read group

    read group description:
      --rg-id RG_ID [RG_ID ...]
                            one or more unique read group identifiers
      --rg-sm RG_SM [RG_SM ...]
                            one sample name per read group identifier
      --rg-cn RG_CN [RG_CN ...]
                            one sequencing center name per read group
      --rg-ds RG_DS [RG_DS ...]
                            one description line per read group
      --rg-dt RG_DT [RG_DT ...]
                            date runs were produced (YYYY-MM-DD); one date per
                            read group
      --rg-lb RG_LB [RG_LB ...]
                            library identifier for each read group
      --rg-pl RG_PL [RG_PL ...]
                            sequencing platform/technology used to produce each
                            read group
      --rg-pi RG_PI [RG_PI ...]
                            predicted median insert size for the reads of each
                            read group
      --rg-pu RG_PU [RG_PU ...]
                            platform unit, e.g., flowcell barcode or slide
                            identifier, for each read group

    other information:
      --co [COMMENT [COMMENT ...]]
                            an arbitrary number of one-line comment strings

If you have enabled MiModD in your local Galaxy installation, the same
tools should be available (under slightly different names) in the MiModD
section of the Galaxy Tools bar. This is a screenshot of what the
section should look like:

.. image:: /images/MiModD_Galaxy_Toolbar.png

While the MiModD Galaxy tools are really just wrappers around the tools
available from the command line, they provide a graphical interface that
typically is easier to handle for the new or occasional user. As an example, 
here is the Galaxy interface for the *mimodd header* command (called 
*NGS Run Annotation* in Galaxy):

.. image:: /images/NGS_Run_Annotation.png

In addition, to providing user-friendly access to individual tools, Galaxy also 
lets you combine individual tools into complex workflows without advanced 
knowledge of shell syntax or scripting languages. The chapter `MiModD and 
Galaxy <galaxy.html>`__ in this guide explains how MiModD lets you run complete 
analyses of NGS data - from sequenced reads to lists of variants - as a single 
Galaxy workflow without user intervention.

--------------

The MiModD tools in detail
==========================

The MiModD tools can be grouped in three categories:

- **the format converter and query tools (info, header, convert, reheader, 
  sort, covstats, snpeff-genomes, cloudmap)** help you annotate and arrange 
  your sequencing data and let you convert between different formats for 
  storage and use by other MiModD and external tools. 
- **the core tools (snap, snap-batch, snap-index, varcall, varextract and 
  delcall)** do the computation-intensive analysis of the data, aligning 
  sequence reads to reference genomes and detecting sequence variants 
  (currently SNPs, indels and deletions) with respect to the reference.
- **the data exploration tools (annotate and vcf-filter)** let you annotate 
  and filter the results of these analyses and, thus, help you to extract
  biologically relevant information.

--------------

The Format Converter and Query Tools
------------------------------------

.. _info:

The info tool
~~~~~~~~~~~~~

Provide information about file contents.

command line usage
..................

::
    
    mimodd info <input file> [OPTIONS]

The tool expects an input file and reports the metadata it finds encoded
in it. It works with and auto-detects almost every file format used
anywhere in the MiModD package, i.e., SAM, BAM, vcf, bcf and fasta, and is 
useful for quick checks of file contents. 
The tool can produce either plain-text or html output, which, like for many 
MiModD tools, is printed to the standard output (stdout, i.e., the 
console/terminal window) by default, but can be redirected with the ``-o`` or 
``--ofile`` option to the specified file.

available options:

``-o <file>``, ``--ofile <file>`` : [default: stdout]
   redirect the output to the specified file

``--oformat <html|txt>`` : [default: txt]

``-v``, ``--verbose``

Galaxy interface
................

.. _galaxy-info:

Galaxy tool name : Retrieve File Information

.. image:: /images/tool_doc/retrieve_file_info.png

Choose the input file from a drop-down menu of available files from your
user history, then click Execute to run the tool. In Galaxy, a tool's
output goes to a file. Galaxy stores these files in an internal database
and presents them to you in the history pane. Upon running the tool, its
results file will automatically be added to your history. Click on the
eye symbol to see the file contents in the main pane.

--------------

.. _header:

The header tool
~~~~~~~~~~~~~~~

Construct a SAM format file header from information about read groups and 
comments, which can be used by other tools (reheader, convert and snap) to 
annotate sequence reads files with metadata.

command line usage
..................

::

    mimodd header [--rg-id <RG_ID> [RG_ID]...] [--rg-sm <RG_SM> [RG_SM]...]
                  [--rg-cn <RG_CN> [RG_CN]...] [--rg-ds <RG_DS> [RG_DS]...]
                  [--rg-dt <RG_DT> [RG_DT]...] [--rg-lb <RG_LB> [RG_LB]...]
                  [--rg_pl <RG_PL> [RG_PL]...] [--rg_pi <RG_PI> [RG_PI]...]
                  [--rg_pu <RG_PU> [RG_PU]...]
                  [--co <COMMENT> [COMMENT]... ] [OPTIONS]

available options:

``-o <file>``, ``--ofile <file>`` : [default: stdout]
   redirect the output to the specified file

``-x``, ``--relaxed``
   do not enforce a sample name to be specified for every read group

A read group in SAM/BAM format refers to a group of sequences that form a 
logical unit and should be treated together. Typically, a read group comprises 
sequence reads obtained from one sample in a particular sequencing run. You can 
add information about one or several read groups to the *mimodd header* output 
through the following command line arguments:

``--rg-id <ID> [ID]...``
    Read group IDs are unique strings unambiguously identifying a read group.
    They may be provided by the sequencing center generating the data, but you 
    can choose any ID(s) as long as you make sure that your naming scheme does 
    not cause ID collisions in downstream analysis steps.

``--rg-sm <sample name> [sample name]...``
    For each read group declared by providing an ID for it, you **must** 
    specify a sample name using this option (use the ``-x`` switch to make 
    sample names optional). The sample name should identify the biological 
    sample that the read group provides sequencing data for.
    
``--rg-cn <sequencing center> [sequencing center]...``
    Optional name of the sequencing center that generated the data for the 
    read group(s).
    
``--rg-ds <description> [description]...``
    Optional one-line read group description(s).
    
``--rg-dt <YYYY-MM-DD> [YYYY-MM-DD]...``
    Optional information in YYYY-MM-DD date format about when the data for the 
    read group(s) was generated.
    
``--rg-lb <lib identifier> [lib identifier]...``
    Optional library identifier(s). A unique identifier of the sequence 
    library that was used for sequencing should be given here.
    
``--rg-pl <platform name> [platform name]...``
    Optional name of the sequencing technology that was used to produce the 
    data for the read group(s). Should be one of the following officially 
    supported names: CAPILLARY, LS454, ILLUMINA, SOLID, HELICOS, IONTORRENT, 
    PACBIO.
    
``--rg-pi <insert size> [insert size]...``
    Optional value(s) for the predicted median insert size between sequence 
    reads of the read group(s) on the sequenced DNA fragments. Only applicable 
    for paired-end sequencing data.
    
``--rg-pu <unit ID> [unit ID]...``
    Optional identifier of the physical sequencing unit that the data for the 
    read group(s) was generated on. Depending on the sequencing platform that 
    was used (see the --rg_pl option) this could be, e.g., a flowcell-barcode 
    or a sequencing slide.

The number of read groups declared in the resulting SAM header is determined by 
the number of read group IDs and matching number of sample names specified 
through the ``--rg-id`` and ``--rg-sm`` options. All other information for any 
read group is optional. If you do not want to provide a certain parameter for 
any read group you can simply omit the option. To provide a given parameter for 
some, but not all of the specified read groups, use the empty string ``""`` as 
a placeholder. Pairing between parameters is determined by the input order. 

Example 1
,,,,,,,,,

**Specify two read-groups with optional sequencing date information** ::

    $ mimodd header --rg-id 001 002 --rg-sm sample1 sample2 --rg-dt 2014-10-23 2014-09-14
    
    @HD	VN:1.5
    @RG	ID:001	SM:sample1	DT:2014-10-23
    @RG	ID:002	SM:sample2	DT:2014-09-14
    
Example 2
,,,,,,,,,

**Specify three read-groups with dates, but a description only available for 
the second** ::

    $ mimodd header --rg-id 001 002 003 --rg-sm sample1 sample2 sample3 --rg-dt 2014-10-23 2014-09-14 2014-11-09 --rg-ds "" "WT control" ""
    
    @HD	VN:1.5
    @RG	ID:001	SM:sample1	DT:2014-10-23
    @RG	ID:002	SM:sample2	DS:WT control	DT:2014-09-14
    @RG	ID:003	SM:sample3	DT:2014-11-09
    
In addition to or instead of read group information the *header* tool also lets 
you specify an arbitrary number of comment lines. These are not bound to any 
specific read group and should, generally, be used for remarks concerning the 
read groups or their analysis as a whole though you can, of course, *refer* to a 
particular read group in the comment text.

Example 3
,,,,,,,,,

**Generate the three-sample header above, but with two general comment lines 
appended** ::

    $ mimodd header --rg-id 001 002 003 --rg-sm sample1 sample2 sample3 --rg-dt 2014-10-23 2014-09-14 2014-11-09 --rg-ds "" "WT control" "" --co "data from the xx sequencing project" "provided by yy lab"
    
    @HD	VN:1.5
    @RG	ID:001	SM:sample1	DT:2014-10-23
    @RG	ID:002	SM:sample2	DS:WT control	DT:2014-09-14
    @RG	ID:003	SM:sample3	DT:2014-11-09
    @CO	data from the xx sequencing project
    @CO	provided by yy lab

Note the use of quotes to group elements. Without them a new comment line would 
be started after every whitespace.

Example 4
,,,,,,,,,

**Generate a header without read groups and only comment lines** ::

    $ mimodd header --co "forgot to say:" this
    
    @HD	VN:1.5
    @CO	forgot to say:
    @CO	this

While such a header may not be terribly useful on its own, it can be used with 
the reheader_ tool to add comments to an existing header.

.. Note::

   Within MiModD the order of comment lines will, generally, be preserved, but 
   this is not a file-format guarantee, i.e., other software is free to change 
   it.
   
The difference between read group ID and sample name
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,

A frequent 
You have sequenced two DNA preparations (DNA_prep1 and DNA_prep2) and
you store the reads from the two samples in one input file.

A minimal header for this file can be generated with:

.. code::

    mimodd header --rg-id first second --rg-sm DNA_prep1 DNA_prep2

resulting in this SAM header:

::

    @HD VN:1.0
    @RG SM:DNA_prep1    ID:first
    @RG SM:DNA_prep2    ID:second
    @SQ LN:0    SN:

Assume you are not satisfied with the reads from DNA_prep1. You
resequence that sample and store the reads in the same file. The new
header can be generated with:

.. code::

    mimodd header --rg_id first second third --rg_sm DNA_prep1 DNA_prep2 DNA_prep1 --co "resequenced prep 1 hoping for better quality"

The SAM header now looks like this:

::

    @HD VN:1.0
    @RG SM:DNA_prep1    ID:first
    @RG SM:DNA_prep2    ID:second
    @RG SM:DNA_prep1    ID:third
    @SQ SN:     LN:0
    @CO resequenced prep 1 hoping for better quality

.. Note:

    A focus on multi-sample files is a distinguishing feature of MiModD that
    accounts for much of its simplicity! Be sure you understand the
    relationship between read group IDs and sample names since it is a
    recurring theme with many MiModD tools.

Galaxy interface
................

.. _galaxy-header:

Galaxy tool name : NGS Run Annotation

This tool is more limited than its command line counterpart in that it only 
allows a **single read group** to be specified and described. There is no 
possibility to incorporate reference sequence information or comments. The main
purpose of the tool is to generate annotation for unaligned single-sample input 
files that can be merged with this data by the `Convert <galaxy-convert_>`_ and 
`Snap Read Alignment <galaxy-snap_>`_ tools. In addition, it can be used in 
combination with the `Reheader <galaxy-reheader_>`_ tool to modify or replace 
individual read groups in multi-sample BAM files.

.. image:: /images/tool_doc/ngs_run_anno.png


--------------

.. _convert:

The convert tool
~~~~~~~~~~~~~~~~

Convert between different sequenced reads file formats.

command line usage
..................

::

     mimodd convert <input file> [input file]... [OPTIONS]
     
available options:

``-o <file>``, ``--ofile <file>`` : [default: stdout]
   redirect the output to the specified file
   
``--iformat <fastq|fastq_pe|gz|gz_pe|sam|bam>`` : [default: bam]
   the format of the input file(s); the input format determines
    
   - whether the ``-h`` option is applicable and
   - whether multiple input files are allowed and how they are used:
   
     - with input in SAM or BAM format only one input file is accepted
     - with input in single-end fastq format (compressed or not; format strings
       ``fastq``, ``gz``) multiple input files may be given and the reads from 
       all files will be written to the output in concatenated form, i.e., as 
       if they had come from a single source, providing an easy way to merge 
       split fastq files into a single SAM/BAM file
     - with input in paired-reads fastq format (compressed or not; format 
       strings ``fastq_pe``, ``gz_pe``) multiple input files are interpreted as 
       alternating r1 and r2 files, i.e., every other file is expected to 
       contain the read mates of the preceding file
       
``--oformat <sam|bam>`` : [default: sam]
   the output format
   
``-h <header file>``, ``--header <header file>``
   use the header of the provided SAM file as the output header;
   applicable only when ``--iformat`` specifies a headerless file format, i.e., 
   one of ``fastq``, ``fastq_pe``, ``gz`` or ``gz_pe``

Additional specifications:

Standard input can be used instead of an input file by specifying a minus sign 
``-`` instead of a file name.


The following table summarizes the conversions that are currently supported:

+--------------------------------------+---------+----------------+-----------------+
| from                                 | to      | >1 input files | optional header |
+======================================+=========+================+=================+
| fastq (or gzip compressed fastq.gz)  | sam     | yes            | yes             |
+--------------------------------------+---------+----------------+-----------------+
| fastq (or gzip compressed fastq.gz)  | bam     | yes            | yes             |
+--------------------------------------+---------+----------------+-----------------+
| sam                                  | bam     | no             | no              |
+--------------------------------------+---------+----------------+-----------------+
| bam                                  | sam     | no             | no              |
+--------------------------------------+---------+----------------+-----------------+

.. Note::

   We anticipate future versions of MiModD to implement subsampling and 
   filtering during conversions. This is the reason why, in addition to the 
   format combinations above, the currently meaningless (unless you are looking 
   for a slow way to copy files) *"conversions"* sam <-> sam and bam <-> bam 
   are also accepted.
   
Examples:

Example 1
,,,,,,,,,

**View the contents of a BAM file by converting it to SAM format** ::

    $ mimodd convert myreads.bam
    
Conversion from BAM to SAM is the default behavior.

Example 2
,,,,,,,,,

**Convert a SAM file to BAM format** ::

    $ mimodd header --rg-id 007 --rg-sm "secret sample" --co "for your eyes only" | \
      mimodd convert - --iformat sam --oformat bam
    
    �BCdsr�e�c``p�p�
                    ��2�3�r��t�200�
                                   ��*NM.J-Q(N�-�I�rp��L�/R��/-RH�L-V��˩���`��J�BCw

While the output is not too useful, this example demonstrates how the tool can 
be used in a pipe. Now just to show that the above worked as expected::

    $ mimodd header --rg-id 007 --rg-sm "secret sample" --co "for your eyes only" | \
      mimodd convert - --iformat sam --oformat bam | mimodd convert -
      
    @HD	VN:1.5
    @RG	ID:007	SM:secret sample
    @CO	for your eyes only

Galaxy interface
................

.. _galaxy-convert:

Galaxy tool name : Convert

This tool has the exact same functionality in Galaxy and from the command line.
The Galaxy input form simplifies the coice of parameters, though, by 
dynamically displaying only applicable options based on the selected input file 
format.

.. image:: /images/tool_doc/convert.png


--------------

.. _reheader:

The reheader tool
~~~~~~~~~~~~~~~~~
command line usage
..................

::

    mimodd reheader [template SAM file] <BAM input file>
                    [--rg ignore|update|replace [RG_TEMPLATE] [RG_MAPPING]...]
                    [--sq ignore|update|replace [SQ_TEMPLATE] [SQ_MAPPING]...]
                    [--co ignore|update|replace [CO_TEMPLATE]]
                    [--rgm RG_MAPPING [RG_MAPPING]...]
                    [--sqm SQ_MAPPING [SQ_MAPPING]...] [OPTIONS]

available options:

``-o <file>``, ``--ofile <file>`` : [default: stdout]
   redirect the output to the specified file
   
``-H``
   output only the resulting header in SAM format

This complex, but powerful tool rewrites an existing ``<BAM input file>`` 
modifying its header on the fly based on the header information of one or 
several template SAM files according to at least one of these optional 
parameters:

``template SAM file``
    if specified the header of this file provides the read group information, 
    the sequence dictionary and the comment lines that will be used to modify 
    the input file header.
    
``--rg ignore|update|replace [RG template SAM file] [RG mapping]...``
    the --rg option followed by one of the keywords ignore, update or 
    replace specifies how the read group section of the new header should be 
    compiled. With ``ignore`` any template read group information is ignored 
    and the read group section of the BAM input file is copied to the new file 
    unchanged. With ``update`` the input file header is modified by template 
    read group information according to these rules: any read groups found 
    *only* in the BAM input *or* in the template information are copied to the 
    new header. For read groups with matching IDs found in the BAM input *and* 
    in the template, any information about this read group found in the 
    template will be written to the new header (possibly overwriting input file 
    information), but information found only in the BAM input will be copied 
    unchanged. With ``replace`` the rules are: all template read groups are 
    written to the new header as they are (possibly overwriting read groups 
    with matching IDs in the BAM input and *without* information merging). Read 
    groups found only in the BAM input file are discarded.
    
    Both ``update`` and ``replace`` may optionally be followed by 
    
    - a read group-specific template file, which will be used as the source of 
      template read group information (i.e., if the general 
      ``template SAM file`` is also given, its read group section is ignored)
      
    and/or
    
    - a read group mapping to be used to determine read group ID matching 
      between the BAM input file and the template. This mapping has to be 
      provided in the exact form ``input_id : template_id`` (with spaces around 
      the colon). Several mappings must also be separated by a space. When a 
      mapping is given and a template read group with the ID template_id is 
      found, it is treated as if its ID was input_id.
      
``--sq ignore|update|replace [SQ template SAM file] [SQ mapping]...``
    the --sq option determines the use of template sequence information in just 
    the same way as --rg does for read groups. The only differences are that 
    sequences will be matched by their name (SN header values) and that the 
    length (LN value) and the MD5 checksum (M5 value) of a sequence will 
    *never* be modified when rewriting sequence information from the BAM input 
    to the new header.
    
``--co ignore|update|replace [template SAM file for comments]``
    determines how template comments are used. They can be ignored, appended to 
    (with update) any comments found in the BAM input file or be used to 
    replace any comments from the BAM input. As with the --rg and --sq options, 
    an optional specific template SAM file can also be specified.
    
``--rgm RG mapping [RG mapping]...``
    this option lets you specify one or read group mappings (in the format 
    shown above for the --rg option) that are used to *rename* read groups.
    Importantly, renaming is always happening *after* all other modifications 
    and the input_id(s) given in the mapping(s) have to still exist at that 
    point.
    
``--sqm SQ mapping [SQ mapping]...``
    equivalent to --rgm for renaming sequence names
    
Additional specifications:

- A minus sign ``-`` can be used in place of any of the different template file 
  names that the tool supports indicating that the respective header 
  template(s) should be read from standard input. This feature makes it 
  possible to use the tool efficiently in pipes, e.g., with input coming from 
  the header_ tool.
  
- If a general ``template SAM file`` is specified the tool's default for the
  ``--rg``, ``--sq`` and ``--co`` will be set to ``replace``, otherwise the 
  defaults will be ``ignore``.
  
- The tool autodetects header changes (changes in read group IDs or sequence 
  names, elimination of read groups or sequence records) that may be 
  incompatible with records in the body of the BAM file. In these situations, 
  it will scan the file body for incompatible records and try to fix them or, 
  failing that, will terminate with an error message. In this way, the tool 
  guarantees that its reheadered output will always be a *valid* BAM file. On 
  the other hand, the scan causes very significant runtime increases leading to 
  the folowing recommendations:
  
  - reserve the use of the ``--rgm`` and ``--sqm`` options for vital cases
  
  - use ``replace`` mode for read groups and sequence names (``--rg`` and 
    ``--sq`` options) with caution and take care not to remove read group or 
    sequence records from the final header that get referred to in the body of 
    the BAM file.

Examples:

The following examples assume that you have a BAM input file named 
"example_input" with the following header (printed in SAM format)::

    @HD	VN:1.0	SO:coordinate
    @RG	ID:001	SM:sample 1	DT:2013-04-30
    @RG	ID:002	SM:sample 2	DT:2013-04-28
    @SQ	SN:chrI	LN:15072423
    @SQ	SN:chrII	LN:15279345
    @SQ	SN:chrIII	LN:13783700

This file contains reads from two samples mapped against a common reference 
genome, i.e. the sequences *chrI*, *chrII*, *chrIII* and the read group IDs 
*001* and *002* will be referenced in the body of the file.

.. Note::
   The reheader tool is quite complex and may take some practice to use 
   efficiently. The ``-H`` option instructs the tool to generate only the new 
   header in human readable SAM format and we encourage beginners to use this 
   option to check the outcome of their parameter settings before producing the 
   full reheadered BAM file.
   
Example 1
,,,,,,,,,

**Adding descriptions to existing read groups** ::

    $ mimodd header -x --rg-id 001 002 --rg-ds "labbook #1, exp #216" "labbook #1, exp #217" | \
      mimodd reheader -H example_input --rg update -
      
    @HD	VN:1.0	SO:coordinate
    @RG	ID:001	SM:sample 1	DS:labbook #1, exp #216	DT:2013-04-30
    @RG	ID:002	SM:sample 2	DS:labbook #1, exp #217	DT:2013-04-28
    @SQ	SN:chrI	LN:15072423
    @SQ	SN:chrII	LN:15279345
    @SQ	SN:chrIII	LN:13783700

Note that, for this illustration, we used the ``-H`` option to obtain only the 
resulting header in SAM format.

We decided to generate the SAM header template with the descriptions to be 
added using the header_ tool. Use of the ``-x`` option saved us from having to 
redundantly specify the sample names again. We piped the output directly to the 
reheader tool to use it as the ``--rg`` template. ``update`` mode allows us to 
*add* the descriptions to the existing information of each read group.
Using ``replace`` instead would have produced this unintended output::
    
    @HD	VN:1.0	SO:coordinate
    @RG	ID:001	DS:labbook #1, exp #216
    @RG	ID:002	DS:labbook #1, exp #217
    @SQ	SN:chrI	LN:15072423
    @SQ	SN:chrII	LN:15279345
    @SQ	SN:chrIII	LN:13783700

with all the existing read group information replaced by the newly porvided one.

Example 2
,,,,,,,,,

**Adding comments** ::
    
    $ mimodd header --co "let me just say:" "this header needs comments" | \
      mimodd reheader -H example_input --co update -
      
    @HD	VN:1.0	SO:coordinate
    @RG	ID:001	SM:sample 1	DT:2013-04-30
    @RG	ID:002	SM:sample 2	DT:2013-04-28
    @SQ	SN:chrI	LN:15072423
    @SQ	SN:chrII	LN:15279345
    @SQ	SN:chrIII	LN:13783700    
    @CO	let me just say:
    @CO	this header needs comments

Again, we generated the new header elements using mimodd header_. In this 
particular case, the same output would have been generated with 
``--co replace`` mode since no comment lines existed in the original BAM file.

Example 3
,,,,,,,,,

**Renaming a read group** ::

    $ mimodd reheader -H example_input --rgm 001 : 003
    
    @HD	VN:1.0	SO:coordinate
    @RG	ID:003	SM:sample 1	DT:2013-04-30
    @RG	ID:002	SM:sample 2	DT:2013-04-28
    @SQ	SN:chrI	LN:15072423
    @SQ	SN:chrII	LN:15279345
    @SQ	SN:chrIII	LN:13783700

Note that if we had not used the ``-H`` option, but had produced a full 
output BAM file with::

    $ mimodd reheader example_input --rgm 001 : 003 -o example_output
    
the tool would have scanned the body of the input BAM file and would have 
replaced any references to read group ID 001 with references to ID 003.

Example 4
,,,,,,,,,

**Adding information to read groups and changing their ID in one pass**

Assume you have a template SAM file named "template.sam" that has additional 
information about the read groups, but lists those read groups under different 
IDs, e.g::

    @HD	VN:1.0	SO:coordinate
    @RG	ID:752	SM:sample 1	DS:labbook #1, exp #216
    @RG	ID:753	SM:sample 2	DS:labbook #1, exp #217

You could use::

    $ mimodd reheader -H example_input --rg update template.sam 001 : 752 002 : 753
    
    @HD	VN:1.0	SO:coordinate
    @RG	ID:001	SM:sample 1	DS:labbook #1, exp #216	DT:2013-04-30
    @RG	ID:002	SM:sample 2	DS:labbook #1, exp #217	DT:2013-04-28
    @SQ	SN:chrI	LN:15072423
    @SQ	SN:chrII	LN:15279345
    @SQ	SN:chrIII	LN:13783700

to assign the new information to the correct read groups. If you not only 
wanted to update the information about the read groups, but also wanted to 
change their IDs to match those in the template file, you would have to use::

    $ mimodd reheader -H example_input --rg update template.sam 001 : 752 002 : 753 --rgm 001 : 752 002 : 753
    
    @HD	VN:1.0	SO:coordinate
    @RG	ID:752	SM:sample 1	DS:labbook #1, exp #216	DT:2013-04-30
    @RG	ID:753	SM:sample 2	DS:labbook #1, exp #217	DT:2013-04-28
    @SQ	SN:chrI	LN:15072423
    @SQ	SN:chrII	LN:15279345
    @SQ	SN:chrIII	LN:13783700 

Providing the mapping twice is necessary in this case because the changes of 
IDs according to the ``--rgm`` option are always made *after* any operations 
specified by ``--rg``.

Galaxy interface
................

.. _galaxy-reheader:

Galaxy tool name : Reheader BAM file

.. image:: /images/tool_doc/reheader.png

.. image:: /images/tool_doc/reheader_rg.png


--------------

.. _sort:

The sort tool:
~~~~~~~~~~~~~~

Sort the reads in a BAM file.

command line usage
..................

::

    mimodd sort <input file> [OPTIONS]
    
available options:

``-o <file>``, ``--ofile <file>`` : [default: stdout]
   redirect the output to the specified file

``--oformat <bam|sam>`` : [default: bam]

``-n``, ``--by_name`` 

``-l <compression level>`` : [default: 6 (standard BAM compression level)]

``-m <memory>``, ``--memory <memory>`` : [default: config settings]

``-t <threads>``, ``--threads <threads>`` : [default: config settings]


By default, the tool sorts the reads in a BAM input file by their mapped 
reference coordinates. The ``-n`` switch can be used to sort by read names 
instead. 
The ``-l``, ``-m``, and ``-t`` options let you tweak the tool for optimal 
performance on your system. Specifically, they control the compression level to 
use for intermediate and final output, how much memory in GB to use for the
sorting and how many threads to engage in it. Since the default values for 
``-m`` and ``-t`` come from MiModD's configuration settings (see `Configuring 
MiModD for your system <INSTALL.html#configuring-mimodd-for-your-system>`__) 
they should be adequate in most situations.

Galaxy interface
................

.. _galaxy-sort:

Galaxy tool name : Sort BAM File

The Galaxy tool version exposes the ``--by_name`` and ``--oformat`` options of 
the command line tool as check boxes and should be self-explaining.

.. image:: /images/tool_doc/sort.png


--------------


.. _covstats:

The covstats tool:
~~~~~~~~~~~~~~~~~~

Provide a coverage summary report for output from the varcall_ tool.

command line usage
..................

::

    mimodd covstats <input bcf file> [-o <file>]
                
available options:

``-o <file>``, ``--ofile <file>`` : [default: stdout]
   redirect the output to the specified file

Galaxy interface
................

.. _galaxy-covstats:

Galaxy tool name : Coverage Statistics

.. image:: /images/tool_doc/coverage_stats.png

--------------

.. _snpeff_genomes:

The snpeff-genomes tool:
~~~~~~~~~~~~~~~~~~~~~~~~

Query the host machine's SnpEff installation for registered and installed 
(i.e., usable) genome annotation files.

command line usage
..................

::

    mimodd snpeff-genomes <input bcf file> [OPTIONS]

available options:

``-o <file>``, ``--ofile <file>`` : [default: stdout]
   redirect the output to the specified file

``-c <path>``, ``--config <path>``
   use this path to locate the SnpEff installation directory instead of looking 
   it up in the MiModD configuration file

Galaxy interface
................

.. _galaxy-snpeff-genomes:

Galaxy tool name : List Installed SnpEff Genomes

.. image:: /images/tool_doc/list_snpeff_genomes.png

--------------

.. _cloudmap:

The cloudmap tool:
~~~~~~~~~~~~~~~~~~

Generate CloudMap-compatible output from a vcf file.

command line usage
..................

::

    mimodd cloudmap <vcf input file> <analysis mode> 
                    [reference sample name] <sample name> <[OPTIONS]
                
available options:

``-o <file>``, ``--ofile <file>`` : [default: stdout]
   redirect the output to the specified file

``-s <file>``, ``--seqdict <file>``
   generate also the sequence dictionary required by the *CloudMap Variant 
   Density Mapping* and *Hawaiian Mapping* tools and write it to the specified 
   file

This tool takes a multi-sample vcf input file (as it can be generated by the 
varextract_ tool and generates single-sample vcf from it that is compatible 
with the mutation mapping tools provided by `CloudMap 
<https://usegalaxy.org/u/gm2123/p/cloudmap>`__. 

Together with the input file name you must specify the intended analysis mode 
as one of "EMS", "VARIANT", "HAWAIIAN" indicating which of the three existing 
CloudMMap mapping tools (*EMS Variant Density Mapping*, *Variant Discovery 
Mapping* and *Hawaiian Variant Mapping* you are going to use the output for. 

This choice determines the meaning of the *reference sample* and *sample name* 
parameters:

- In EMS mode, a reference sample cannot be specified and sample name must be 
  an existing sample name in the vcf input file, for which you wish to generate 
  mutation density plots in CloudMap.
  
- In VARIANT mode, the sample name also indicates the sample for which CloudMap 
  will generate mutation density plots, but the variants that will be retained 
  in the output and, thus, will be mapped by CloudMap are **only those not 
  present in the reference** sample.
  
- HAWAIIAN mode, finally, is the reverse of VARIANT mode in that only those 
  variants from the specified sample will be included in the output which the  
  reference sample is homozygous mutant for.

Galaxy interface
................

.. _galaxy-cloudmap:

Galaxy tool name : Prepare Variant Data for Mapping

The Galaxy tool has the same functionality as its command line counterpart, but 
the input form changes to include or exclude the field for entering a reference 
sample name depending on the selected analysis mode.

.. figure:: /images/tool_doc/galaxy_cloudmap.png
   
   The tool interface in *EMS* mode ...

.. figure:: /images/tool_doc/galaxy_cloudmap_variant.png

   ... and in *Hawaiian* mode.
   
--------------

The Core Tools
--------------

.. _snap:

The snap tool:
~~~~~~~~~~~~~~

Align sequence reads to a reference genome.

command line usage
..................

::

    mimodd snap <sequencing mode> <reference genome>|<index directory> 
                <input file> [<input file2>] -o <file> [OPTIONS]
                
available options:

``--iformat <fastq|gz|sam|bam>`` : [default: bam]

``--oformat <sam|bam>`` : [default: bam]

``--header <header file>``

``-t <threads>``, ``--threads <threads>`` : [default: config settings]

``-q``, ``--quiet``

``-v``, ``-verbose``

[advanced options]

The tool aligns NGS reads in fastq (gzipped or uncompressed), SAM or BAM format 
(specified with ``--iformat``) to a reference genome using the SNAP aligner.

Besides an input file (or two in case of paired-end reads in fastq
format) of reads, it expects a specification of the sequencing mode,
which can either be "single" or "paired", and a reference genome to
align the reads against. The tool will build a temporary index of the
reference genome (to make the index permanent, specify a directory to
hold it with the advanced option ``--idx-out``). Alternatively, a
precalculated index (through a previous call of the tool or through the
snap-index tool) may be specified. Whether a reference genome or an
index is provided will be auto-detected.

The tool requires your input file(s) to have associated header
information. If this is not the case, e.g., with fastq input files, you
must provide a SAM file with that information through the ``--header``
option. You can also use this option to exchange existing header
information in the input just for the alignment.

Note that, currently, the tool does not support printing to the standard
output, so specifying an output file with the -o option is mandatory.

Galaxy interface
................

Galaxy tool name : -

.. Note::

   This tool is not directly accessible from Galaxy. The Galaxy interface of 
   MiModD uses the snap-batch tool instead.

--------------

.. _snap_batch:

The snap-batch tool:
~~~~~~~~~~~~~~~~~~~~~

Execute a batch of ``snap`` tool calls automatically in an optimized way and 
merge the results of all calls into a single aligned reads file with several 
read groups.

command line usage
..................

::

    mimodd snap-batch -s <mimodd snap command line> [mimodd snap command line]...

or:

::

    mimodd snap-batch -f <input file with snap command lines>

The individual commands passed through the ``-s`` option or in the file 
specified with ``-f`` should be valid command lines for the mimodd snap_
tool, but without the leading ``mimodd``.

In case of the ``-s`` option, individual commands should be enclosed in
quotes to allow command grouping. 
With the ``-f`` option, the file should have one ``snap`` command per line. 

The name of the output file and its format (SAM or BAM) are taken from
the ``-o`` and ``--oformat`` options specified for the first ``snap`` call. 
Both options are ignored when used as part of later ``snap`` calls. 
The advanced option ``--sort`` is treated the same way.

Galaxy interface
................

.. _galaxy-snap:

Galaxy tool name : Snap Read Alignment

A few restrictions currently apply when using this tool from the Galaxy
interface:

As explained above, all parameters except for the output file, its format and 
sort order can differ between the indiviual snap calls specified when the tool 
is used from the command line. 

In contrast, all but the input file format, and optional custom input file 
header information are shared between calls when the tool is used from Galaxy. 
This behaviour helps to keep the Galaxy interface relatively simple and is good 
enough for aligning reads from several samples and/or read groups against a 
common reference with common settings. 

For more complex tasks, however, you will have to use the snap-batch command 
line interface.

.. figure:: /images/tool_doc/snap.png
   
   The Snap Read Alignment tool with standard options.

--------------
   
.. figure:: /images/tool_doc/snap_extended.png

   The advanced options section, which can be expanded by selecting *change 
   settings* under *further parameter settings*. 


--------------

.. _snap_index:

The snap-index tool:
~~~~~~~~~~~~~~~~~~~~~

Generate an index of a reference genome for use by the ``snap`` and 
``snap-batch`` tools.

command line usage
..................

::

    mimodd snap-index <reference genome> <index_directory> [OPTIONS]
    
``-s <seed size>``

``-h <slack>``

``-t <threads>``, ``--threads <threads>`` : [default: config settings]

Use of this tool is rarely necessary since snap and snap_batch call the
underlying indexing function internally if needed.

Galaxy interface
................

Galaxy tool name : - 

.. Note::

   This tool is not accessiblle from Galaxy.

--------------

.. _varcall:

The varcall tool:
~~~~~~~~~~~~~~~~~

Predict single-nucleotide variants (SNVs) and indels with respect to a 
reference genome in one or more groups of aligned reads.

command line usage
..................

::

    mimodd varcall <reference genome> <input file> [input file]... [OPTIONS]
    
available options:

``-o <vcf output file>``, ``--ofile <vcf output file>`` : [default: stdout]
   redirect the output to the specified file

``-c <cov output file>``

``-s <vcf white-list file> [output file]``

``-d <max per-BAM depth>``

``-i``

``-q``, ``--quiet``

``-v``, ``--verbose``

Optionally, the tool can produce additional output for positions listed
in the vcf file specified with the -s option. This output can be
redirected to a file by providing the optional output file as the second 
argument following ``-s``.

It can also produce a cov file, specified with the -c option, with
sample-specific coverage information for every position of the reference
genome. varcall offers the choice between calling variants on a per-read
group (enabled with the ``-i`` switch) or a per-sample basis (the default).
This choice also affects the optional additional files if produced.

The ``-q`` switch suppresses the original output from the underlying
samtools mpileup and bcftools view command, while the -v(erbose) switch
enables additional messages from MiModD.

Galaxy interface
................

.. _galaxy-varcall:

Galaxy tool name : Variant Calling

.. image:: /images/tool_doc/variant_calling.png

The tool lets you choose a reference genome and one or more input files
of aligned reads from drop-down menus.

Toggle switches can be used to enable or disable the coverage and
white-list sites output and to set the mode of read grouping.

The maximum per-BAM depth can also be set.

--------------

.. _varextract:

The varextract tool:
~~~~~~~~~~~~~~~~~~~~
Extract variant sites from BCF input as generated by varcall and report them in 
VCF format.

command line usage
..................

::

    mimodd varextract <input bcf file> [OPTIONS]

``-o <file>``, ``--ofile <file>`` : [default: stdout]
   redirect the output to the specified file

``-p <vcf file> [<vcf file>]...``, ``--pre-vcf <vcf file> [<vcf file>]...``

``-a``, ``--keep-alts``

``-v``, ``--verbose``

Galaxy interface
................

.. _galaxy-varextract:

Galaxy tool name : Extract Variant Sites

.. image:: /images/tool_doc/extract_variant_sites.png

--------------

.. _delcall:

The delcall tool:
~~~~~~~~~~~~~~~~~

Predict deletions in one or more groups of aligned paired-end reads.

command line usage
..................

::

    mimodd delcall <input bam file> [input bam file]... <coverage file> [OPTIONS]

``-o <file>``, ``--ofile <file>`` : [default: stdout]
   redirect the output to the specified file

``--max-cov <maximal coverage>``

``--min-size <minimal size>``

``-u``, ``include-uncovered``
   include uncovered regions (even if they do not qualify as deletions)

``-i``, ``--group-by-id``

``-v``, ``--verbose``

The tool predicts deletions in one or more samples of aligned paired-end
reads from the specified input bam files based on coverage of the
reference genome (taken from the input cov file) and on observed insert
sizes around the region.

By default, the tool reports uncovered regions, for which their is
statistical evidence that they are real deletions. This behaviour can be
changed so that any uncovered region will be included with the ``-u``
switch.

The output (in gff format) can be redirected from the standard output to
a file specified with the ``-o`` option.

The ``--max-cov`` and ``--min-size`` options can be used to set coverage and
size thresholds for reporting uncovered regions/deletions.

Like in varcall, there is an ``-i`` switch to control the mode of read
grouping, but its meaning is slightly different. If enabled reads are
grouped strictly by their read groups, but the default is to group reads
by their associated sample names only for the coverage analysis, i.e.,
to detect uncovered regions. The statistical test for real deletions,
however, will always be performed on a per-read group basis.

.. Note::

    This tool requires paired-end data for the statistical assessment of
    deletions, but single-end data with the same sample name can be used in
    and improve the initial coverage analysis.

Galaxy interface
................

.. _galaxy-delcall:

Galaxy tool name : Deletion Prediction for Paired-end Data

.. image:: /images/tool_doc/deletion_prediction.png

You can select one or more bam input files and a cov file from drop-down
menus.

You can provide custom values for the maximal coverage allowed inside a
region to consider it as a deletion and the minimal deletion size.

Toggle switches let you control whether to inlude uncovered regions that
are not statistically significant deletions in the output and how to
group reads from the input bam file(s).

--------------

The Data Exploration Tools
--------------------------

.. _vcf_filter:

The vcf-filter tool:
~~~~~~~~~~~~~~~~~~~~~

Filter multi-sample vcf files, like the ones generated by ``varcall``, by 
chromosomal region, variant type (i.e., INDEL or single nucleotide change), or 
by sample-specific genotypes and coverage.

command line usage
..................

::

    mimodd vcf-filter <input file> [OPTIONS]
    
``-o <file>``, ``--ofile <file>`` : [default: stdout]
   redirect the output to the specified file

``-s <sample name> [sample name]...``, ``--samples ...``
   one or more names of samples that the sample-specific filters ``--gt`` , 
   ``--dp`` and ``--gq`` should work on
   
``--gt <genotype pattern> [genotype pattern]...``
   one or more genotype patterns of the form x/x[, x/x] where x=0 indicates a 
   reference allele and x=1 a variant allele; exactly one genotype pattern 
   (possibly composed of several comma-separated genotypes) needs to be given 
   per sample name specified through ``-s`` and variants will be retained only 
   if every listed sample has a genotype contained in its corresponding 
   genotype patterns; use ANY as the genotype pattern to exclude a sample from 
   being used in genotype filtering
   
``--dp <depth> [depth]...``
   depth of coverage threshold; exactly one value needed for every sample 
   specified through ``-s``; variants will only be retained if every sample 
   declared with ``-s`` shows a read coverage of the variant site at least 
   equal to the corresponding threshold; use 0 to skip the requirement for a 
   given sample
   
``--gq <genotype quality> [genotype quality]``
   genotype quality threshold; exactly one value needed for every sample 
   specified through ``-s``; variants will only be retained if every sample 
   declared with ``-s`` has a quality score for its predicted genotype equal to 
   or higher than the corresponding threshold; specifying 0 as the threshold 
   effectively skips gq filtering for a given sample

``-r <chromosome>:[start-stop] [chromosome:[start-stop]]...``, ``--region ...``
   retain variants only if they fall into one of the given chromosomal regions 
   (specified in the format chrom:start-stop or chrom: for a whole chromosome)
   
``-i``|``-I``, ``--no-indels``|``--indels-only``
   mutually exclusive options to filter out or retain only indels
   
``--vfilter <sample name> [sample name]...``
   filter by sample name(s); instead of filtering variants, this option will 
   retain sample-specific information about each variant only for the specified 
   samples; if several sample names are provided their order also determines 
   the order of sample-specific information in the output; useful for 
   demultiplexing and restructuring (the sample-specific part of) multi-sample 
   vcf files

To filter variants based on sample-specific information, first use the ``-s`` 
option to specify one or more samples by name to which the filters should be 
applied. Then use the ``--gt``, ``--dp`` or ``--gq`` options to specify the 
filter criteria. Each of these options, when present, has to be followed by a 
number of filters matching the number of samples declared with ``-s`` and the 
filters are associated with the samples by position on the command line.

--gt sets genotype filters. Currently, 0/0 (homozygous ref allele), 0/1
(heterozygous) and 1/1 (homozygous alt allele) are allowed as <genotype
patterns>. Several genotypes can be specified by separating them with
commas. The keyword ANY is allowed as a placeholder (i.e., no genotype
filter should be applied to the given sample), and this should be
preferred over the alternative 0/0,0/1,1/1 notation for clarity,
briefity and efficiency.

--dp sets coverage filters. To pass the filter a variant site has to be
covered by at least as many sample-specific reads as set with --dp.
Consequently, 0 can serve as a placeholder here.

--gq sets genotype quality filters. To pass the filter the quality of
the genotype call for the relevant sample has to be at least as high as
the filter criterion. Again, 0 has a placeholder function.

All filters discussed above are subtractive and line-based, i.e., if, and only 
if, a variant passes ALL filters, the full entry is retained.

In addition, the tool supports one column-based/vertical filter via the
--v_filter option. If used, then of the sample-specific columns of any
line (that passes all other filters) only those describing samples
specified with the option are retained.

Examples:

Example 1
,,,,,,,,,

Of a vcf input file *variants_in.vcf* with a sample called *sample1* (and 
possibly others) retain all variants for which sample1's genotype is 0/1 
(heterozygous) or 1/1 (homozygous for the variant allele)::

    $ mimodd vcf-filter variants_in.vcf -s sample1 --gt 0/1,1/1
    
Example 2
,,,,,,,,,

Of the same file, retain all variants, for which sample1's genotype is 0/1 or 
1/1 AND for which sample2's genotype is 0/0 (homozygous for the reference 
allele)::

    $ mimodd vcf-filter variants_in.vcf -s sample1 sample2 --gt 0/1,1/1 0/0

Example 3
,,,,,,,,,

With the same input again, retain all variants for which the genotype of 
*sample1* is 0/1 or 1/1 AND for which *sample1* AND *sample2* show at least 
3-fold coverage (the genotype of *sample2* is not used for filtering)::

    $ mimodd vcf-filter variants_in.vcf -s sample1 sample2 --gt 0/1,1/1 ANY --dp 3 3

Example 4
,,,,,,,,,

Retain all variants found on chromosomes *chrI* and *chrII* for which the 
genotype of *sample1* is 1/1 AND that of *sample2* is 0/0. For these variants, 
report only sample-specific information for *sample1*::
                    
    $ mimodd vcf-filter variants_in.vcf -s sample1 sample2 --gt 1/1 0/0 --region chrI: chrII: --vfilter sample1
                    
Galaxy interface
................

.. _galaxy-vcf-filter:

Galaxy tool name : VCF Filter

.. image:: /images/tool_doc/vcf_filter.png

.. image:: /images/tool_doc/vcf_filter_sample_region.png

--------------

.. _annotate:

The annotate tool:
~~~~~~~~~~~~~~~~~~

Generate annotated output for called variants to simplify data exploration

command line usage
..................

::

    mimodd annotate <input file> [OPTIONS]
    
available options:

``-o <file>``, ``--ofile <file>`` : [default: stdout]
   redirect the output to the specified file
   
``-f <html|text>``, ``--oformat <html|text>`` : [default: html]
   output format

``--grouping <by_sample|by_genes>``
   group annotated variants by sample or by affected gene instead of simply 
   keeping the input file order
   
``-l <link formatter file>``, ``--link <link formatter file>``
   file to read custom hyperlink formatting instructions from; effective only 
   with ``-f html``
   
``--species <species>``
   the name of the species to assume for the annotations
   
In addition, the following SnpEff-specific options are supported:

``-g <genome>``, ``--genome <genome>``
   use SnpEff with <genome> as its genome file to annotate variants with their 
   effects on genes and transcripts; the exact name of the file has to be 
   provided; use the ``snpeff-genomes`` tool to get a list of all available 
   files
   
``-s <file>``, ``--snpeff-out <file>``
   generate an additional file <file> which holds the original SnpEff output
   
``--stats <file>``
   generate an additional file <file> with the SnpEff results summary
   
``-m <memory>``, ``--memory <meemory>`` : [default: config settings]
   maximal memory to use in GB

``--minC <coverage threshold>``

``--minQ <qualtiy threshold>``

``--no-downstream``, ``--no-upstream``, ``--no-intron``, ``--no-intergenic``, ``--no-utr``
   do not annotate *downstream*, *upstream*, *intron*, *intergenic* or *UTR* 
   effects, respectively

``--ud <distance>`` : [default: 500]
   specify the upstream/downstream interval length, i.e., variants more than 
   <distance> nts from the next annotated gene are considered to be intergenic

``--chr``

This tool provides information about the genes and transcripts affected
by the variants in a vcf input file.

It uses SnpEff to add annotations to the vcf input, and an installed
annotated SnpEff genome has to be specified, from which the annotations
will be extracted. The *snpeff_genomes* tool (see below) can be used to
get a list of all installed SnpEff genomes on your system.

By default, the tool generates a single output file specified with the
-o option. This is a html file with a table of all variants, which
transcripts they affect and, possibly, hyperlinks to the relevant genes
and genome views (supported for selected organisms).

The --group_by_sample switch affects the order of this table, changing
it from position-based with interleaved samples (the default) to sorted
by sample first.

On demand, the original annotated vcf generated by SnpEff can be stored
in a separate file specified with the -s option.

In addition, an optional SnpEff summary file of the variant effects can
be specified with the --stats option.

The -q and -v switches have their usual MiModD meaning, i.e., to
suppress the original output from SnpEff and to enable verbose output
from the annotate tool itself, respectively.

All other options affect the behaviour of SnpEff:

Galaxy interface
................

.. _galaxy-annotate:

Galaxy tool name : Variant Annotation

.. image:: /images/tool_doc/variant_anno.png