Jacob Windsor



MMRES student at CRG and DCEXS
Class 2019-2020

After MMRES:

Software Engineer at European Bioinformatics Institute

TMERGE2 - A Bioinformatics Workflow for High-Throughput Human Transcriptome Annotation with Third Generation Sequencing


Third generation sequencing (TGS) technologies can potentially be used to create highly accurate and automated gene and transcript annotation pipelines. TGS allows full-length transcripts to be sequenced at unprecedented throughput, but high template mismatch, high indel rates, and partial transcript coverage mean that computational tools are required to remove artefacts and extract full-length genome alignments. Here, we present tmerge2, a tool that accurately produces full-length transcript models from TGS datasets using a non-heuristic approach that favours sensitivity and precision. We show that tmerge2 produces transcriptomics datasets with much higher precision than other available tools. Furthermore, tmerge2 implements a unique plugin system that allows it to be tailored to user-specific needs, and it uses a novel machine-learning-based classifier to identify and remove artefactual isoforms.

Major project supervisor

Roderic Guigó

Minor project supervisor

Tomàs Marqués-Bonet