Mapping-by-Sequencing using MiModD and CloudMap
***********************************************

What is mapping-by-sequencing?
==============================

The classical approach to identifying the *causative* mutation of any 
particular mutant phenotype consists of two separate steps: first, genetic 
mapping is used to narrow down a genomic region for which genetic markers 
introduced by crossing indicate linked inheritance with the phenotype, then 
candidate DNA stretches in that region are sequenced to identify the mutation.
 
Through whole-genome sequencing it is now possible to merge these two steps 
into one *mapping-by-sequencing* step and to speed up mutation identification 
enormously. After any mapping cross, the inheritance pattern for any set of 
genetic markers can now be determined *along* with candidate mutations from the 
same sequencing data.

Moreover, with mapping-by-sequencing essentially all non-causative mutations 
(including even previously unknown ones) present in any of the strains used for 
crossing can be used as marker mutations. This makes mapping-by-sequencing not 
only a fast, but also an extremely sensitive and versatile method.

To be useful for mapping-by-sequencing experiments, analysis tools need not 
only to identify mutations, but also to report or visualize the inheritance 
pattern of marker mutations. Researchers can then use that information to 
identify the most likely candidate for the *causative* mutation from the 
potentially long list of all identified variants.

What is CloudMap?
=================

*CloudMap*, accessible through the main Galaxy server at 
`http://usegalaxy.org <http://usegalaxy.org>`__, is a collection of tools for 
analysis and visualization of mapping-by-sequencing experiments performed with 
virtually any model organism. At its core are three mapping tools, which 
support the visual interpretation of the marker inheritance patterns obtained 
with any of three popular mutation mapping approaches:

  -  CloudMap: EMS Variant Density Mapping
  -  CloudMap: Variant Discovery Mapping with WGS data
  -  CloudMap: Hawaiian Variant Mapping with WGS data

More information on CloudMap can be found in the `CloudMap online documentation 
<https://usegalaxy.org/u/gm2123/p/cloudmap>`__.

MiModD complements CloudMap
===========================

While the CloudMap tools greatly facilitate the interpretation of inheritance 
patterns of sets of mutations, the suite itself does not offer tools to 
identify variants in the first place. Instead, it relies on assembling 
additional tools available on the main Galaxy server into relatively complex 
workflows. 

MiModD makes it possible to perform the complete upstream sequence analysis 
generating the input data for the three core CloudMap tools efficiently on a 
local computer. Hence, it eliminates the need to upload primary sequence data 
(with huge file sizes) to a remote server. As a rather specialized package, 
MiModD also provides a much simpler interface to the necessary alignment, 
variant calling and variant filtering steps than what can be realized by 
combining standard Galaxy tools.

An interface between MiModD and CloudMap
========================================

The standard VCF-format variant lists generated by MiModD are not directly 
compatible with the CloudMap tools, but MiModD offers the *cloudmap* subcommand, 
which transforms the data for use with any of the three CloudMap mapping 
tools. The resulting CloudMap-compatible VCF files are small enough to be 
transferred conveniently to remote machines, where the results can then be 
visualized using CloudMap. 

Analyzing whole-genome sequencing data from mapping experiments
===============================================================

In this section, we review the three strategies for mapping-by-sequencing that 
CloudMap is compatible with and explain how the corresponding data can be 
analyzed in MiModD. We assume that you are already familiar with the relevant 
MiModD tools.

Simple Variant Density (SVD) Mapping
------------------------------------

referred to in CloudMap as *EMS Variant Density Mapping*

In this simple form of mapping, a phenotypically defined mutant strain 
(obtained, for example, from a mutagenesis screen) is backcrossed to its 
unmutagenized parent strain or outcrossed to a different strain (selecting for 
the phenotype), then sequenced. Because of linked inheritance, the phenotypic 
selection acts not only on the causative mutation itself, but also on nearby 
*non-causative* mutations introduced during mutagenesis, i.e., the causative 
mutation is expected to be found in the center of a mutation-rich region. 
This approach works best if the sequence of the crossing strain (the parent 
strain or an unrelated strain) is analyzed along with that of the outcrossed 
mutant, or if several mutants derived from the same parent are analyzed 
together. Either way, this makes it possible to eliminate variants that merely 
represent (misinformative) sequence deviations between the reference genome and 
the crossing strain.
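
To make the notion of a mutation-rich region concrete, the following sketch 
counts marker variants in fixed-size windows along one chromosome and reports 
the densest window. It is a toy illustration only: the positions are made up, 
the helper name ``variant_density`` is hypothetical, and this is not the 
algorithm used by MiModD or CloudMap.

```python
from collections import Counter


def variant_density(positions, bin_size):
    """Count variants per fixed-size window.

    Keys of the returned dict are window start coordinates.
    """
    counts = Counter((pos // bin_size) * bin_size for pos in positions)
    return dict(sorted(counts.items()))


# Made-up marker positions on a single chromosome (illustration only)
positions = [120_000, 1_050_000, 1_200_000, 1_350_000, 1_900_000, 2_400_000]

density = variant_density(positions, bin_size=1_000_000)
# The window with the most markers is the candidate region
densest_bin = max(density, key=density.get)

print(density)
print(densest_bin)
```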

The joint analysis of several sequencing datasets is one of the hallmark 
features of MiModD and results are conveniently stored in a single multisample 
variant call file per analysis. Starting from there, the *vcf-filter* tool 
enables the straightforward exclusion of variants with any desired pattern of
genotypes across the samples.
The `Multi-sample analysis <tutorial_example2.html>`__ section of the Tutorial 
provides an illustration of how simple this makes it to eliminate common 
background mutations.
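
The idea of excluding variants by their genotype pattern across samples can be 
pictured with a short sketch. It does not reproduce the *vcf-filter* 
implementation or its options: the record layout (a dict of per-sample genotype 
strings), the sample names ``mutant`` and ``parent``, and the helper 
``keep_informative`` are all assumptions made for illustration.

```python
def keep_informative(records, mutant, background):
    """Keep sites where the background sample is homozygous reference
    and the mutant sample carries the variant allele."""
    return [
        rec for rec in records
        if rec["genotypes"][background] == "0/0"
        and rec["genotypes"][mutant] != "0/0"
    ]


# Hypothetical multisample records (illustration only)
records = [
    {"chrom": "chrI", "pos": 1500,
     "genotypes": {"mutant": "1/1", "parent": "0/0"}},
    # Variant shared with the parent strain: a background mutation
    {"chrom": "chrI", "pos": 2300,
     "genotypes": {"mutant": "1/1", "parent": "1/1"}},
    {"chrom": "chrII", "pos": 800,
     "genotypes": {"mutant": "0/1", "parent": "0/0"}},
]

informative = keep_informative(records, "mutant", "parent")
print([(rec["chrom"], rec["pos"]) for rec in informative])
```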

After filtering to retain only informative marker mutations, you can simply 
pass the resulting VCF variants file to the *cloudmap* tool. 

Variant Allele Frequency (VAF) Mapping
--------------------------------------

referred to in CloudMap as *Variant Discovery Mapping*

This approach is an extension of the Simple Variant Density Mapping above. 
Instead of generating a single outcrossed strain over several rounds of 
crossing, the mutant strain is crossed only once to the parent strain or to an 
unrelated strain. Then, the non-uniform (segregating) F2 generation is 
screened for phenotypically mutant individuals, which are sequenced as a pool, 
an approach often referred to as *bulk segregant analysis*. Compared to Simple 
Variant Density Mapping, Variant Allele Frequency Mapping provides 
finer-grained linkage information with less experimental effort. Variants 
present in the starting strain are no longer probed merely for presence or 
absence after the outcross. Instead, the fraction of variant over reference 
alleles in the sequenced pool provides a direct estimate of the **probability 
of separating the variant from the phenotype**.

As before, it is essential that the crossing strain sequence is analyzed along 
with the outcrossed pool so that misinformative variants present already in the 
crossing strain can be subtracted before the interpretation of the mapping 
results.

By default, MiModD calculates the required ratio between variant- and 
reference-supporting reads for every detected variant site. Hence, data 
preparation for Variant Allele Frequency Mapping with CloudMap can proceed 
almost exactly as with Simple Variant Density Mapping: sequencing data of the 
crossing strain and of the outcrossed pool are analyzed together, resulting in 
a multisample variants file. However, the *vcf-filter* tool cannot be used 
efficiently here to eliminate crossing strain variants because the tool defines 
filters based on genotypes, which, conceptually, do not make sense for the 
pooled sample. Instead, you should pass the unfiltered multisample variants 
file to the MiModD *cloudmap* tool, set the mode to *Variant*, and indicate the 
samples representing the pool and the crossing strain, respectively. The tool 
will then retain only those variant sites for which there is no evidence of 
variant reads in the crossing strain sample. The resulting dataset is ready for 
analysis with CloudMap.
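
The selection step described above can be sketched as follows. This is not the 
code of the *cloudmap* tool: the record layout with ``(reference, variant)`` 
read counts per sample, the sample names ``pool`` and ``crossing``, and the 
helper ``pool_allele_fractions`` are assumptions for illustration. The sketch 
reports the fraction of variant-supporting reads among all reads at a site.

```python
def pool_allele_fractions(sites, pool, crossing):
    """Return (chrom, pos, fraction) for every site that shows no variant
    reads in the crossing strain sample.

    fraction = variant reads / (reference reads + variant reads) in the pool
    """
    result = []
    for site in sites:
        ref_reads, alt_reads = site["counts"][pool]
        _, crossing_alt = site["counts"][crossing]
        depth = ref_reads + alt_reads
        # Skip sites with variant evidence in the crossing strain
        # and sites without any coverage in the pool
        if crossing_alt > 0 or depth == 0:
            continue
        result.append((site["chrom"], site["pos"], alt_reads / depth))
    return result


# Hypothetical read counts as (reference, variant) pairs (illustration only)
sites = [
    {"chrom": "chrI", "pos": 1000,
     "counts": {"pool": (5, 15), "crossing": (20, 0)}},
    {"chrom": "chrI", "pos": 2000,
     "counts": {"pool": (10, 10), "crossing": (12, 8)}},
    {"chrom": "chrI", "pos": 3000,
     "counts": {"pool": (0, 20), "crossing": (18, 0)}},
]

fractions = pool_allele_fractions(sites, "pool", "crossing")
print(fractions)
```

In this toy data set, the site at position 2000 is dropped because the crossing 
strain itself shows variant reads there.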

Generating data for use with the CloudMap: Hawaiian Variant Mapping tool
------------------------------------------------------------------------

referred to in CloudMap as *Hawaiian Variant Mapping*

The name of this CloudMap tool is misleading: it is a general,
non-organism-specific Crossing Strain Variant Mapping tool and is not
restricted to the most widely used mapping strain in *C. elegans*
research.

The mapping strategy here can be thought of as reversed Variant
Discovery Mapping. Just as described in the previous section, bulk F2
segregant analysis is used to obtain linkage information. The only
difference is that the variants analyzed are those **inherited from
the crossing strain** (as opposed to those from the original mutant
strain as in Variant Discovery Mapping). Because of this difference, the
interpretation of the linkage pattern is reversed in comparison to
Variant Discovery Mapping: crossing strain variants tend to be excluded
from the phenotypic F2 pool in the vicinity of the phenotype-causing
mutation.
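
The reversed interpretation can be illustrated with a small sketch that looks 
for the chromosome window in which crossing strain alleles are rarest in the 
pool. The data are made up, the helper ``lowest_frequency_window`` is 
hypothetical, and CloudMap's own analysis and plots are not reproduced here.

```python
from collections import defaultdict


def lowest_frequency_window(sites, window):
    """Return (window_start, mean_fraction) for the window with the lowest
    mean fraction of crossing strain alleles.

    sites is an iterable of (position, fraction) pairs.
    """
    grouped = defaultdict(list)
    for pos, fraction in sites:
        grouped[(pos // window) * window].append(fraction)
    means = {start: sum(vals) / len(vals) for start, vals in grouped.items()}
    start = min(means, key=means.get)
    return start, means[start]


# Made-up crossing strain allele fractions along one chromosome
sites = [
    (100_000, 0.5), (600_000, 0.25),
    (1_200_000, 0.0), (1_700_000, 0.0),
    (2_300_000, 0.25), (2_900_000, 0.5),
]

candidate = lowest_frequency_window(sites, window=1_000_000)
print(candidate)
```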

If WGS data for the crossing strain is available, MiModD can be used
exactly as for Variant Discovery Mapping to prepare data for Hawaiian
Variant Mapping analysis with CloudMap. The only difference is that the
*vcf-filter* tool should be used here to retain only the variants that
the crossing strain is homozygous for (instead of excluding them).

Most geneticists using this mapping strategy, however, will not
resequence the crossing strain, but will use some established crossing
strain with already characterized variants for their organism, such as the
Hawaiian strain for *C. elegans*. Accordingly, MiModD provides the option
to use the information about known variant sites instead of aligned
reads for the crossing strain at the stage of variant calling. This
alternative way of combining MiModD and the CloudMap Hawaiian Variant
Mapping tool is described in great detail in the MiModD Tutorial section
`Incorporating classical mapping strategies <tutorial_example3.html>`__.