BIST Dolors Aleu Graduate Centre
Jacob Windsor

MMRES student at CRG and DCEXS
Class 2019-2020

After MMRES:

Software Engineer at European Bioinformatics Institute

TMERGE2 - A Bioinformatics Workflow for High-Throughput Human Transcriptome Annotation with Third Generation Sequencing

Summary

Third-generation sequencing (TGS) technologies could enable highly accurate, automated gene and transcript annotation pipelines. TGS allows full-length transcripts to be sequenced at unprecedented throughput, but high template mismatch and indel rates, together with partial transcript coverage, mean that computational tools are required to remove artefacts and extract full-length genome alignments. Here, we present tmerge2, a tool that accurately produces full-length transcript models from TGS datasets using a non-heuristic approach that favours sensitivity and precision. We have shown that tmerge2 produces transcriptomics datasets with much higher precision than other available tools. Furthermore, tmerge2 implements a unique plugin system that allows it to be tailored to user-specific needs, and it uses a novel machine-learning-based classifier to identify and remove artefactual isoforms.

Major project supervisor

Roderic Guigó
CRG

Minor project supervisor

Tomàs Marqués-Bonet
MELIS-UPF