Question

Comparing assembled transcripts against a reference

9.4 years ago

Rob 6.8k

Hi all,

I'm interested in comparing the assemblies produced by Bayesembler and StringTie against a reference annotation using a few different metrics. In particular, I have reference alignments of a few human RNA-seq samples (against hg19), and I'd like to see how many of the transcripts recovered by each program match known annotations well. The tools assess metrics like this in their respective manuscripts, but not on the same data set and not against each other. Is there a common or standard tool I can use to compare the different GTF files and get measurements like sensitivity and specificity, without having to write my own evaluation pipeline? The biggest issue seems to be how to call transcripts "accurate" when they have only minor differences from the reference (e.g. a few bases off at the TSS or transcription termination site). For my purposes, I'd like such transcripts to be considered correct if they otherwise match the annotation. Any suggestions?
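To make the "correct apart from the ends" criterion concrete: one possible reading is that two multi-exon transcripts match when their intron chains (all internal splice junctions) are identical, so differences at the TSS or termination site are ignored. The Python sketch below works under that assumption. The function names `read_intron_chains` and `sensitivity_precision`, the helper `exon`, and the toy coordinates are all made up for illustration, and single-exon transcripts are simply skipped. It reports precision rather than true specificity, since true negatives are not well defined for transcripts.

```python
import io
from collections import defaultdict


def read_intron_chains(gtf_handle):
    """Map transcript_id -> (chrom, strand, intron chain) using GTF exon lines."""
    exons = defaultdict(list)
    for line in gtf_handle:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) < 9 or fields[2] != "exon":
            continue
        tid = None
        for attr in fields[8].split(";"):
            parts = attr.strip().split(None, 1)
            if len(parts) == 2 and parts[0] == "transcript_id":
                tid = parts[1].strip('"')
        if tid is not None:
            exons[(fields[0], fields[6], tid)].append((int(fields[3]), int(fields[4])))
    chains = {}
    for (chrom, strand, tid), ex in exons.items():
        ex.sort()
        # An intron runs from the base after one exon to the base before the next.
        introns = tuple((a[1] + 1, b[0] - 1) for a, b in zip(ex, ex[1:]))
        chains[tid] = (chrom, strand, introns)
    return chains


def sensitivity_precision(ref_chains, asm_chains):
    """Compare distinct multi-exon intron chains; transcript ends are ignored."""
    ref = {c for c in ref_chains.values() if c[2]}
    asm = {c for c in asm_chains.values() if c[2]}
    matched = ref & asm
    sens = len(matched) / len(ref) if ref else 0.0
    prec = len(matched) / len(asm) if asm else 0.0
    return sens, prec


def exon(chrom, start, end, strand, tid):
    """Build one tab-separated GTF exon line (toy data only)."""
    attrs = 'gene_id "G"; transcript_id "%s";' % tid
    return "\t".join([chrom, "demo", "exon", str(start), str(end), ".", strand, ".", attrs])


# Toy reference: T1 has three exons, T2 has two.
ref_gtf = "\n".join([
    exon("chr1", 100, 200, "+", "T1"), exon("chr1", 300, 400, "+", "T1"),
    exon("chr1", 500, 600, "+", "T1"),
    exon("chr1", 1000, 1100, "+", "T2"), exon("chr1", 1200, 1300, "+", "T2"),
]) + "\n"

# Toy assembly: A1 differs from T1 only at its ends, A2 uses a different
# splice site than T2, and A3 is single-exon (skipped).
asm_gtf = "\n".join([
    exon("chr1", 90, 200, "+", "A1"), exon("chr1", 300, 400, "+", "A1"),
    exon("chr1", 500, 650, "+", "A1"),
    exon("chr1", 1000, 1100, "+", "A2"), exon("chr1", 1250, 1300, "+", "A2"),
    exon("chr1", 2000, 2100, "+", "A3"),
]) + "\n"

sens, prec = sensitivity_precision(
    read_intron_chains(io.StringIO(ref_gtf)),
    read_intron_chains(io.StringIO(asm_gtf)),
)
print("sensitivity:", sens, "precision:", prec)
```

With real data, the `io.StringIO` objects would be replaced by open file handles on the reference GTF and on each assembler's GTF, which at least puts both assemblers on the same footing.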

Assembly RNA-Seq • 3.6k views

updated 2.2 years ago by Ram 44k • written 9.4 years ago by Rob 6.8k

How did it turn out?

8.9 years ago by IV ★ 1.3k

Ram · Answer 1 · 2015-03-11

Hi Rob,

As far as I understand your question, here are a few metrics you can use to evaluate your newly assembled transcripts:

1. Align the newly assembled transcripts to the existing reference genome and to the existing coding sequences. You can use BLAST/BLAT for this, and simple Linux commands will tell you how many transcripts do not align to the genome or coding sequences.
2. Conduct a full-length analysis of the newly assembled transcripts. You can use the scripts implemented in Trinity.
3. As far as the accuracy of the new transcripts is concerned, you have to collect evidence at each step. For example, after step 1, convert the output to GTF/GFF and compare it with the existing annotation (GTF/GFF/BED file) in IGB/IGV. If you see minor differences, such as an extended UTR or alternatively spliced transcripts, check the read depth in the input read sample. You can run all of these post-assembly analysis steps on your assembly.
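The "how many do not align" count from the first step can also be done in a few lines of Python instead of shell commands. This is only a sketch, and it assumes the alignment was run with BLAST tabular output (`-outfmt 6`), where the first column is the query ID, and that this ID is the first whitespace-delimited token of each FASTA header. The function names and the toy records are hypothetical.

```python
import io


def fasta_ids(fasta_handle):
    """Return sequence IDs (first token after '>') in file order."""
    ids = []
    for line in fasta_handle:
        if line.startswith(">") and line[1:].strip():
            ids.append(line[1:].split()[0])
    return ids


def aligned_ids(blast_tab_handle):
    """Return the set of query IDs that have at least one hit line."""
    hits = set()
    for line in blast_tab_handle:
        if line.startswith("#") or not line.strip():
            continue
        hits.add(line.split("\t", 1)[0])
    return hits


def unaligned_transcripts(fasta_handle, blast_tab_handle):
    """List transcripts from the FASTA file that have no hit in the table."""
    hits = aligned_ids(blast_tab_handle)
    return [tid for tid in fasta_ids(fasta_handle) if tid not in hits]


# Toy inputs: three assembled transcripts, hits reported for two of them.
toy_fasta = ">T1 len=6\nACGTAC\n>T2 len=4\nGGCC\n>T3 len=5\nTTGCA\n"
toy_hits = "\n".join([
    "\t".join(["T1", "chr1", "100.0", "6", "0", "0", "1", "6", "101", "106", "1e-5", "12.0"]),
    "\t".join(["T1", "chr2", "83.3", "6", "1", "0", "1", "6", "501", "506", "0.5", "8.0"]),
    "\t".join(["T3", "chr1", "100.0", "5", "0", "0", "1", "5", "901", "905", "1e-4", "10.0"]),
]) + "\n"

missing = unaligned_transcripts(io.StringIO(toy_fasta), io.StringIO(toy_hits))
print(len(missing), "transcript(s) without a hit:", missing)
```

Note that this treats any hit as "aligned". In practice you would probably want to filter on identity or coverage first, which is a design choice the sketch leaves open.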

I hope this will answer your query.

Reema