Python API

Warning

Documenting and stabilizing the Python API of MiModD is still a work in progress.

Feel free to use any part of the package for your own coding purposes, but know that you will do so at your own risk.

If you want to contribute to MiModD or report bugs, contact us via the MiModD user group at https://groups.google.com/forum/#!forum/mimodd.

Description of the MiModD Python modules in alphabetical order

bioobj_base.py

Currently holds only a few class definitions that are used by a few other modules.

cfg.py

This is MiModD’s configuration file defining global settings used by most other tools. The contents of this file should not be accessed directly, but through the wrapper config module.

cloudmap.py

This module underlies the map command line tool. At the heart of the module is the NacreousMap class, which analyzes the allele distributions of the variants found in its input. The map command line call itself maps to the delegate function, which validates its arguments, determines the requested analysis mode and calls vaf_mapping or svd_mapping accordingly.

config.py

This module provides a class-based interface to MiModD’s configuration settings, whether they come from the cfg.py configuration file, from the MIMODD_CONFIG_UPDATE environment variable or from Galaxy environment variables. Tools that need access to any configuration settings can simply import config.

convert.py

The module holds the code underlying the convert and reheader tools. fq2sam and sam2bam are the two functions performing the actual format conversions. fastqReader is a robust generator-based fastq parser and the ReHeader class implements the logic for modifying a SAM format header based on user-provided specifications.
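To illustrate what a generator-based fastq parser looks like, here is a minimal sketch. It is not MiModD's fastqReader: the function name parse_fastq and the (title, sequence, quality) tuple it yields are assumptions made for this example, and it only handles the common four-lines-per-record layout.

```python
import io


def parse_fastq(handle):
    """Yield (title, sequence, quality) tuples from a fastq text stream.

    Hypothetical helper for illustration only; not part of MiModD.
    Assumes exactly four lines per record.
    """
    while True:
        header = handle.readline()
        if not header:
            return  # clean end of input
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError("expected '@' at record start, got %r" % header)
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if not plus.startswith("+"):
            raise ValueError("missing '+' separator in record %r" % header)
        if len(seq) != len(qual):
            raise ValueError("sequence/quality length mismatch in %r" % header)
        yield header[1:], seq, qual


example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGG\n+\n##\n")
records = list(parse_fastq(example))
```

Because the parser is a generator, records are produced one at a time and a large input file never has to be held in memory as a whole.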

decox.py

Provides low-level helper objects for working with decorators.

deletion_calling.py

The (still experimental) implementation of the delcall subcommand. The main function of the module is delcall, which uses sample_insert_sizes to estimate the distribution of insert sizes for the reads of a sample, find_lowcov_regions to identify regions of low coverage, and del_stats to assess whether there is statistical evidence of a deletion in any low-coverage region.
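The step of identifying low-coverage regions can be sketched as follows. This is an illustration of the general idea only, not the code of find_lowcov_regions: the function name lowcov_regions, its max_cov and min_len parameters, and the per-position coverage list used as input are all assumptions.

```python
def lowcov_regions(coverage, max_cov=0, min_len=1):
    """Yield (start, end) intervals whose coverage is <= max_cov.

    Hypothetical helper for illustration only; not part of MiModD.
    Intervals are 0-based and half-open; runs shorter than min_len
    positions are skipped.
    """
    start = None  # start of the low-coverage run currently being extended
    n = 0         # number of positions seen so far
    for pos, cov in enumerate(coverage):
        n = pos + 1
        if cov <= max_cov:
            if start is None:
                start = pos
        elif start is not None:
            if pos - start >= min_len:
                yield start, pos
            start = None
    # report a run that extends to the end of the input
    if start is not None and n - start >= min_len:
        yield start, n


coverage = [12, 9, 0, 0, 0, 7, 0, 11, 0, 0]
regions = list(lowcov_regions(coverage, max_cov=0, min_len=2))
```

With the example input, the single uncovered position at index 6 is ignored because it is shorter than min_len, while the two longer runs are reported as candidate regions for the subsequent statistical test.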

enablegalaxy.py

The code underlying the enablegalaxy tool.

fasta.py

The module provides a simple FastaReader class. The function get_fasta_info returns a list of {'SN': seqtitle, 'LN': seqlen, 'M5': seqmd5sum} dictionaries for the sequences in a fasta file, which is analogous to the SQ records in a SAM format header and which is used by the fileinfo module for fasta format input.
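As a rough sketch of how SQ-like records of this kind can be derived from fasta input, consider the following. It is not the implementation of get_fasta_info: the function name fasta_records is hypothetical, the title is assumed to be the first whitespace-delimited word of the header line, and the checksum follows the SAM convention of hashing the uppercased sequence with whitespace removed.

```python
import hashlib
import io


def _finish(current):
    """Turn the [title, length, md5] working state into an SQ-like dict."""
    title, length, md5 = current
    return {'SN': title, 'LN': length, 'M5': md5.hexdigest()}


def fasta_records(handle):
    """Return a list of {'SN', 'LN', 'M5'} dicts for a fasta text stream.

    Hypothetical helper for illustration only; not part of MiModD.
    """
    records = []
    current = None  # working state of the sequence being read
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            if current is not None:
                records.append(_finish(current))
            words = line[1:].split()
            # assumption: the title is the first word of the header line
            current = [words[0] if words else '', 0, hashlib.md5()]
        elif current is None:
            raise ValueError('sequence data found before the first title line')
        else:
            seq = ''.join(line.split()).upper()
            current[1] += len(seq)
            current[2].update(seq.encode('ascii'))
    if current is not None:
        records.append(_finish(current))
    return records


example = io.StringIO(">chrI test sequence\nacgt\nACGT\n>chrII\nNNNN\n")
info = fasta_records(example)
```

Updating the checksum line by line keeps memory use independent of sequence length, which matters for chromosome-sized records.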

fileinfo.py

The implementation of the info command line tool. The module defines an NGSInfo class and format-specific subclasses thereof, which provide a standardized interface to metadata extracted from files in BAM/SAM, vcf/bcf or fasta format, e.g., through the module’s get_info function. The print_fileinfo function pretty-prints instances of NGSInfo subclasses in plain-text or html format.

index.py

Provides a small set of functions for manipulating and validating different types of index files. Provides the dispatch function for the index tool.

__init__.py

Provides general low-level objects, like the MiModD version identifier and Exception subclasses, for use by other modules.

iterx.py

Provides some low-level generator/iterator objects.

__main__.py

The mimodd command line parser.

plotters.py

Intended to be a collection of all code related to producing graphical output. Currently, provides the plotting functionality offered by the map tool.

pybcftools.py

Defines a bcfViewer class, and a view function to instantiate it, as a Python interface to the bcftools view command. This interface is used by the pyvcf module to treat bcf files the same way as vcf files.

pysamtools.py

A collection of wrappers that provide a functional programming interface for csamtools. Currently, MiModD uses a pysam core for iterator-based access to reads in SAM/BAM format, but all other subprocess-based calls to samtools are routed through this module. Most functions defined here have the same name as the corresponding samtools subcommand, but they typically provide extra functionality (e.g., support for additional file formats or error handling).

pyvcf.py

Provides an object-oriented interface for manipulating data in the vcf format. Used throughout the package to read and write vcf files.

samheader.py

Provides a Header class as an object representation of a SAM format header, which is used widely throughout the package. The module also defines the sam_header function, which underlies the header command line tool and which generates and returns a Header instance. The cat function defined here is a high-level wrapper around pysamtools.cat with additional header management, which is used by the snap module when merging aligned reads files in batch mode.

sanitize.py

An in-development module for sanitizing different file formats to ensure compatibility with MiModD. Currently accessible as the sanitize tool, its only implemented functionality is to sanitize fasta files.

snap.py

The package’s wrapper around the SNAP aligner. This module is used by the snap, snap-batch and index (for building snap reference genome indices) command line tools.

stats_base.py

Contains a Python implementation of Fisher’s exact test (right-sided version only) used by the deletion_calling module and a hexbin function used by the NacreousMap engine to pool data points in scatter plots.
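For readers unfamiliar with the right-sided version of Fisher's exact test, the calculation can be sketched as below. This is not the code from stats_base: the function name fisher_exact_right and its four-count signature are assumptions, and the sketch relies on math.comb, which requires Python 3.8 or later.

```python
from math import comb


def fisher_exact_right(a, b, c, d):
    """Right-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Hypothetical helper for illustration only; not part of MiModD.
    Returns the probability of observing a value of at least a in the
    top-left cell, given fixed row and column totals.
    """
    row1 = a + b
    col1 = a + c
    n = a + b + c + d
    k_max = min(row1, col1)  # largest possible top-left count
    # hypergeometric probabilities share the denominator comb(n, col1)
    favorable = sum(
        comb(row1, k) * comb(n - row1, col1 - k)
        for k in range(a, k_max + 1)
    )
    return favorable / comb(n, col1)


p_value = fisher_exact_right(3, 1, 1, 3)
```

For the example table, the tail consists of the outcomes with 3 and 4 in the top-left cell, so the p-value is (16 + 1) / 70.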

tmpfiles.py

The functions defined here are used throughout the package to obtain unique names for temporary files and hardlinks during temporary file management and to ensure proper temporary data cleanup when tool runs get interrupted.
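The general pattern of handing out a unique temporary file name and guaranteeing cleanup when a run is interrupted can be sketched with the standard library alone. This does not reproduce the tmpfiles module: the context manager temporary_file and its parameters are hypothetical, and hardlink handling is left out.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def temporary_file(suffix='', directory=None):
    """Provide the path of a unique temporary file and remove it afterwards.

    Hypothetical helper for illustration only; not part of MiModD.
    """
    # mkstemp creates the file atomically, so the name cannot collide
    fd, path = tempfile.mkstemp(suffix=suffix, dir=directory)
    os.close(fd)
    try:
        yield path
    finally:
        # runs on normal exit, on errors and on KeyboardInterrupt
        try:
            os.remove(path)
        except FileNotFoundError:
            pass  # the caller already removed or renamed the file


try:
    with temporary_file(suffix='.sam') as tmp:
        kept_name = tmp
        existed_inside = os.path.exists(tmp)
        raise KeyboardInterrupt  # simulate an interrupted tool run
except KeyboardInterrupt:
    pass
removed_after = not os.path.exists(kept_name)
```

Note that a try/finally block like this covers exceptions and Ctrl+C, but not a process that is killed outright; handling that case would need additional signal handling.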

upgrade.py

This is the code underlying the upgrade tool.

variant_annotation.py

The implementation of the annotate tool. The main function of the module (called when the command line tool is run) is annotate, which is responsible for generating the annotated variant report. It uses the affected_genes function, which, in turn, relies on snpeff_effects, to parse a vcf file with annotations generated by SnpEff. The snpeff function provides a Python interface to command line calls of SnpEff for variant annotation and get_installed_snpeff_genomes is the function called by the snpeff-genomes command line tool. The remaining SnpEff-related functions in the module are lower-level functions used by snpeff and get_installed_snpeff_genomes to query the SnpEff installation for installed genome files.

variant_calling.py

The module provides a functional interface for variant calling, for postprocessing variant call files, and for generating summary statistics from them. Its varcall function implements parallel variant calling using samtools/bcftools and the multiallelic calling model introduced with samtools version 1.0. The varextract function, underlying the command line tool with the same name, extracts variant sites from BCF input generated by varcall and reports them in vcf format. The function get_coverage_from_vcf generates the simple by-chromosome coverage report of the covstats tool.

vcf_filter.py

The module provides the implementation of the vcf-filter command line tool. Its single function filter parses its arguments to construct an appropriate call to the filter method of a pyvcf.VCFReader object, then writes out the pyvcf.VCFEntry objects returned by the method.