Summary
Third-generation sequencing (TGS) technologies could potentially be used to build highly accurate, automated gene and transcript annotation pipelines. TGS enables the sequencing of full-length transcripts at unprecedented throughput, but high template mismatch and indel rates, together with partial transcript coverage, mean that computational tools are required to remove artefacts and extract full-length genome alignments. Here, we present tmerge2, a tool that accurately produces full-length transcript models from TGS datasets using a non-heuristic approach that favours sensitivity and precision. We show that tmerge2 produces transcriptomic datasets with much higher precision than other available tools. Furthermore, tmerge2 implements a unique plugin system that allows it to be tailored to user-specific needs, and it uses a novel machine-learning-based classifier to identify and remove artefactual isoforms.
Major project supervisor
Minor project supervisor
Institutional Members of the Board of Trustees
The Barcelona Institute of Science and Technology
Travessera de les Corts, 131-159
Pavelló Central. Recinte Maternitat.
08028 Barcelona
T. +34 609 853 113
info@bist.eu