File Formats used by MiModD

Work in progress

We apologize, but this section of the User Guide is still very incomplete.

FASTA

FASTA is a text format that can store multiple sequences in a single file.

Each sequence begins with a single-line description, followed by one or more lines of sequence data. Description lines are distinguished from sequence data lines by a greater-than symbol (>) at the beginning of the line.

Lines can be terminated with either CR+LF (Windows-style) or LF (Unix/Linux-style). Blank lines are not allowed.

The sequence data lines should be formatted as blocks of equal line length.
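The layout rules above can be illustrated with a minimal reader, sketched here in Python. It is not part of MiModD; the function name read_fasta and the example sequence names are made up for illustration. It accepts both LF and CR+LF line endings, rejects blank lines, and treats everything after the leading > as the description.

```python
import io


def read_fasta(handle):
    """Yield (description, sequence) pairs from a FASTA text stream.

    Hypothetical helper for illustration only; not MiModD code.
    """
    description = None
    chunks = []
    for raw in handle:
        # Strip LF or CR+LF line terminators.
        line = raw.rstrip("\r\n")
        if not line:
            raise ValueError("blank lines are not allowed")
        if line.startswith(">"):
            # A new description line closes the previous record.
            if description is not None:
                yield description, "".join(chunks)
            description = line[1:]
            chunks = []
        else:
            if description is None:
                raise ValueError("sequence data before first description line")
            chunks.append(line)
    if description is not None:
        yield description, "".join(chunks)


# Two records, the second one with Windows-style line endings.
example = ">chrI\nACGT\nACGT\nAC\n>chrII\r\nGGCC\r\n"
records = list(read_fasta(io.StringIO(example)))
```

In this sketch the sequence lines of chrI form blocks of four characters, with only the last line being shorter, which is the usual meaning of block formatting.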

In MiModD, FASTA format is used exclusively for reference genome input files, and MiModD-specific restrictions apply to the description lines in these files. Specifically, description lines must not contain:

  • non-printable or non-ASCII characters
  • whitespace characters
  • any of the characters: <>[]*;=,

This restriction is enforced by all tools that require a FASTA reference genome. The MiModD.sanitize tool can be used to substitute illegal characters in description lines; it also ensures that sequence data lines are block-formatted.
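To check a reference file before handing it to MiModD, the character rules can be sketched in a few lines of Python. This is only an illustration, not the MiModD.sanitize implementation: the function names and the underscore used as a replacement are assumptions, and the check is assumed to apply to the text after the leading > marker.

```python
import string

# Characters explicitly forbidden in description lines.
FORBIDDEN = set("<>[]*;=,")

# Printable ASCII characters that are neither whitespace nor forbidden.
ALLOWED = set(string.printable) - set(string.whitespace) - FORBIDDEN


def illegal_characters(description):
    """Return the characters of a description that violate the rules.

    The description is the line content without the leading '>'.
    Hypothetical helper for illustration only.
    """
    return [c for c in description if c not in ALLOWED]


def replace_illegal(description, substitute="_"):
    """Replace each illegal character; the '_' default is an assumption."""
    return "".join(c if c in ALLOWED else substitute for c in description)


ok = illegal_characters("chrI")
bad = illegal_characters("chr I,x")
fixed = replace_illegal("chr I,x")
```

Here ok is an empty list, bad contains the space and the comma, and fixed is the description with both replaced by underscores.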

Note

The character restriction exists because MiModD will use the full content of the description line as the sequence name and we must ensure that this name is a valid sequence name in all downstream data formats generated during any analysis.

See also

MiModD tools that use fasta input files

snap, snap-batch, snap-index, varcall

MiModD tools to manipulate fasta files

MiModD.sanitize


sam

See also

MiModD tools that accept sam input files

snap, snap-batch

MiModD tools that produce sam output files

snap, snap-batch, header

MiModD tools to manipulate sam files

convert, reheader


bam

See also

MiModD tools that accept bam input files

snap, snap-batch, varcall, delcall, MiModD.index

MiModD tools that produce bam output files

snap, snap-batch

MiModD tools to manipulate bam files

convert, reheader, sort


vcf

See also

MiModD tools that use vcf input files

MiModD tools that produce vcf output files

varextract

MiModD tools to manipulate vcf files

vcf-filter


bcf

See also

MiModD tools that use bcf input files

MiModD tools that produce bcf output files

varcall

MiModD tools to manipulate bcf files