Third-generation sequencing (TGS) technologies have the potential to support highly accurate, automated gene and transcript annotation pipelines. TGS enables the sequencing of full-length transcripts at unprecedented throughput, but high template mismatch and indel rates, together with partial transcript coverage, mean that computational tools are required to remove artefacts and extract full-length genome alignments. Here, we present tmerge2, a tool that accurately produces full-length transcript models from TGS datasets using a non-heuristic approach that favours both sensitivity and precision. We show that tmerge2 produces transcriptomics datasets with much higher precision than other available tools. Furthermore, tmerge2 implements a unique plugin system that allows it to be tailored to user-specific needs, and it uses a novel machine-learning-based classifier to identify and remove artefactual isoforms.
Major project supervisor
Minor project supervisor