File Formats used by MiModD
Work in progress
We apologize, but this section of the User Guide is still very incomplete.
FASTA
FASTA is a text format that can store multiple sequences in a single file.
Each sequence begins with a single-line description, followed by lines of
sequence data. Description lines are distinguished from sequence data lines by
a greater-than symbol (>) at the beginning of the line.
Lines can be terminated with either CR+LF (Windows-style) or LF
(Unix/Linux-style). Blank lines are not allowed.
The sequence data lines should be formatted as blocks of equal line length.
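As an illustration of the layout rules above, here is a minimal Python sketch that checks FASTA text for blank lines and uneven sequence lines. It is not part of MiModD. The function name check_fasta_layout is hypothetical, and the sketch assumes that the last sequence line of each record may be shorter than the block width.

```python
def check_fasta_layout(text):
    """Return a list of (line_number, message) problems found in FASTA text."""
    problems = []
    width = None         # block width of the current record
    short_seen = False   # True once a shorter line was seen in this record
    in_record = False    # True after the first description line
    # splitlines() handles both LF and CR+LF line endings
    for number, line in enumerate(text.splitlines(), start=1):
        if not line:
            problems.append((number, "blank line"))
        elif line.startswith(">"):
            # a description line starts a new record
            in_record = True
            width = None
            short_seen = False
        elif not in_record:
            problems.append((number, "sequence data before first description line"))
        elif width is None:
            width = len(line)
        elif short_seen or len(line) > width:
            # assumption: only the last line of a record may be shorter
            problems.append((number, "uneven sequence line length"))
        elif len(line) < width:
            short_seen = True
    return problems


if __name__ == "__main__":
    example = ">chrI\nACGT\nACGT\nAC\n>chrII\r\nGGCC\r\n"
    print(check_fasta_layout(example))
```

An empty result means that no layout problem was found under these assumptions.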
In MiModD, FASTA format is used exclusively for reference genome input files, and MiModD-specific restrictions apply to the description lines in these files. Specifically, description lines must not contain:
- non-printable or non-ASCII characters
- whitespace characters
- any of the characters:
<>[]*;=,
These restrictions are enforced by all tools that require a FASTA reference genome. The MiModD.sanitize tool can be used to substitute illegal characters in description lines and also ensures that sequence data lines are block-formatted.
Note
The character restriction exists because MiModD uses the full content of the description line as the sequence name, and this name must be a valid sequence name in all downstream data formats generated during an analysis.
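The restrictions on description lines can be sketched in Python as follows. This is an illustration only and not the implementation of the MiModD sanitize tool. The names illegal_chars and replace_illegal, and the choice of an underscore as the replacement character, are assumptions made for this example.

```python
# Characters listed as forbidden in description lines.
FORBIDDEN = set("<>[]*;=,")


def illegal_chars(description):
    """Return the set of disallowed characters in a description.

    The description is expected without its leading '>' marker.
    """
    bad = set()
    for ch in description:
        if ch.isspace() or ch in FORBIDDEN:
            bad.add(ch)
        elif not (33 <= ord(ch) <= 126):
            # outside printable ASCII: non-printable or non-ASCII character
            bad.add(ch)
    return bad


def replace_illegal(description, replacement="_"):
    """Substitute every disallowed character (hypothetical helper)."""
    bad = illegal_chars(description)
    return "".join(replacement if ch in bad else ch for ch in description)


if __name__ == "__main__":
    print(illegal_chars("chrI"))
    print(replace_illegal("chr I [v2]"))
```

A description line passes the check when illegal_chars returns an empty set.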
See also
MiModD tools that use fasta input files
snap, snap-batch, snap-index, varcall
MiModD tools to manipulate fasta files
sam
See also
MiModD tools that accept sam input files
MiModD tools that produce sam output files
snap, snap-batch, header,
MiModD tools to manipulate sam files
bam
See also
MiModD tools that accept bam input files
snap, snap-batch, varcall, delcall, MiModD.index
MiModD tools that produce bam output files
MiModD tools to manipulate bam files
vcf
See also
MiModD tools that use vcf input files
MiModD tools that produce vcf output files
MiModD tools to manipulate vcf files