Workflow

For the user

  • The user provides the parameters (software paths, model settings, ...)
  • The user provides a list of genes and mutations
  • MoNvIso searches UniProt's databases for the canonical sequence and other isoforms
  • In the output path, a directory is created for every gene
  • In every gene's directory, a sub-directory is created for every isoform
  • The PDB is queried with every isoform sequence, and a list of similar sequences is returned
  • HMMER builds a profile from these results and uses it to start a new search for templates
  • The PDB files of the templates are downloaded, and the template chain is extracted
  • The sequences of the extracted chains are aligned with the sequence to model
  • Chain breaks are introduced in place of large gaps
  • The number of mappable mutations and the sequence identity are measured for every isoform
  • Isoforms are ranked based on their score (ability to model the mutated protein)
  • The aligned sequence is passed to Modeller for modelling
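
The chain-break step in the list above can be illustrated with a minimal sketch. It is not MoNvIso's actual code: the function name, the max_gap threshold, and the choice to drop the uncovered target columns are assumptions made for illustration. The "/" character is the chain-break symbol used in Modeller alignment files.

```python
import re


def introduce_chain_breaks(target_aln, template_aln, max_gap=10):
    """Replace long internal gap runs in the template with a chain break.

    Hypothetical helper: columns where the template has at least `max_gap`
    consecutive gaps are dropped from both sequences and replaced by "/".
    Long gap runs at either end are trimmed without adding a break.
    """
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have the same length")

    pattern = re.compile("-{%d,}" % max_gap)
    target_out, template_out = [], []
    pos = 0
    for match in pattern.finditer(template_aln):
        # Keep the columns before the long gap unchanged
        target_out.append(target_aln[pos:match.start()])
        template_out.append(template_aln[pos:match.start()])
        # Only an internal gap separates two modelled segments
        is_internal = match.start() > 0 and match.end() < len(template_aln)
        if is_internal:
            target_out.append("/")
            template_out.append("/")
        pos = match.end()
    # Keep whatever follows the last long gap
    target_out.append(target_aln[pos:])
    template_out.append(template_aln[pos:])
    return "".join(target_out), "".join(template_out)


target, template = introduce_chain_breaks(
    "MKTAYIAKQRQISFVKSHFSRQ",
    "MKTA------------SHFSRQ",
)
print(target)    # MKTA/SHFSRQ
print(template)  # MKTA/SHFSRQ
```

Short gaps are left untouched, so the alignment still lets the modelling program build small loops; only stretches with no template coverage are cut.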

Under the hood

The MoNvIso run is coordinated by an instance of the Run class.

  • The Run.load_mutation_list() method passes the mutation file to the Input parser, which returns the mutations as a list
  • The Run.load_input() method takes the input parameters and passes them to the Input_parser file, which processes them and returns them as a dictionary
  • The Run.create_genes() method instantiates one Gene object per gene to model. These genes are stored in a list as an attribute of the Run instance.
  • The Run.create_isoform() method loads the UniProt databases via a Database_Parser object. The database returns one sequence per isoform per gene. Each sequence is used to instantiate an Isoform object. Every Isoform object is stored in a list as an attribute of the corresponding gene.
  • At this point, the Gene and Isoform objects create their corresponding directories and .fasta files with the alignments
  • The Run object coordinates the creation of alignments through the Run.run_blastp(), Run.run_cobalt(), and Run.run_hmmsearch() methods. Each isoform of each gene runs a blastp search, saves the output, builds a .hmm profile with HMMER, and uses it for a search with hmmsearch. The output contains the PDB IDs and chains of the best templates.
  • Run.load_templates(): For every proposed template, a Template instance is created, which downloads the PDB file and extracts the chain and its FASTA sequence from it. The operations on the PDB file are managed by the PDB_manager class. All the template sequences are aligned with Cobalt.
  • Run.select_isoform(): For every isoform, three scores are calculated: the mutation score (how many mutations can be mapped onto the sequence), the structural score (sequence identity), and the selection score (the first two combined). The isoforms are ranked, and the ones with higher scores are preferred for modelling the mutation.
  • Run.start_modeller(): Every isoform that can be modelled creates its Modeller input file and starts a Modeller run.
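
The isoform ranking described for Run.select_isoform() can be sketched as follows. This is an illustration only, not the MoNvIso implementation: the function names, the (wild type, position, mutant) tuple format, the identity definition, and the equal-weight sum used to combine the two scores are all assumptions, since the text above only states that the two scores are combined.

```python
def mutation_score(sequence, mutations):
    """Fraction of mutations whose wild-type residue is found at the
    given 1-based position of the isoform sequence (assumed definition)."""
    if not mutations:
        return 0.0
    mapped = 0
    for wild_type, position, _mutant in mutations:
        if 1 <= position <= len(sequence) and sequence[position - 1] == wild_type:
            mapped += 1
    return mapped / len(mutations)


def structural_score(target_aln, template_aln):
    """Sequence identity: identical columns divided by the number of
    target residues in the alignment (assumed definition)."""
    columns = [(a, b) for a, b in zip(target_aln, template_aln) if a != "-"]
    if not columns:
        return 0.0
    identical = sum(1 for a, b in columns if a == b)
    return identical / len(columns)


def rank_isoforms(isoforms, mutations, w_mut=0.5, w_struct=0.5):
    """Return (name, selection score) pairs, best isoform first.

    `isoforms` maps a name to (sequence, target_aln, template_aln).
    The weights are hypothetical; the real combination may differ.
    """
    ranked = []
    for name, (sequence, target_aln, template_aln) in isoforms.items():
        m_score = mutation_score(sequence, mutations)
        s_score = structural_score(target_aln, template_aln)
        ranked.append((name, w_mut * m_score + w_struct * s_score))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


mutations = [("K", 2, "R"), ("Y", 5, "F")]
isoforms = {
    "isoform_1": ("MKTAYIAK", "MKTAYIAK", "MKTAYLAK"),
    "isoform_2": ("MKTA", "MKTA", "MRTA"),
}
for name, score in rank_isoforms(isoforms, mutations):
    print(name, score)
```

In this toy input, isoform_1 carries both mutation sites and has the higher identity to its template, so it is ranked first; isoform_2 is too short to carry the second mutation and scores lower.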