Is There Any Reason To Do De Novo Transcript Assembly If A Reference Is Available?
10.3 years ago
Ryan Thompson ★ 3.6k

I am working on some RNA-Seq data from Cynomolgus monkey. I was originally planning to do de novo transcript assembly, but then I realized that the Cyno genome has recently been released, so I can do reference-guided transcript assembly instead. However, I am wondering whether there is any compelling reason to do de novo transcriptome assembly in addition to, or instead of, reference-guided assembly.

(By the way, my data is Illumina 2x100 with a 250 bp insert size, in case that makes a difference.)

rna denovo trinity cufflinks • 6.2k views
10.3 years ago
apa@stowers ▴ 580

In the Trinity paper (Trinity appears to be the current best-of-breed de novo transcriptome assembler), the authors compare their results to the two best-known reference-guided assemblers, Cufflinks and Scripture. In mouse, ref-guided was better (it recovered more full-length genes and isoforms). In S. pombe, ref-guided was worse.

I don't remember why ref-guided was worse for pombe, or if it was even discussed, but I think it relates to the high density / low structural complexity of pombe genes -- Cufflinks and Scripture were designed with vertebrate transcriptomes in mind.

So if you're working in a vertebrate, ref-guided is probably the best option.

Hey there, Ariel, long time no chat :)

10.3 years ago

No, I wouldn't say there is. Of course, it depends on what you mean by "having a reference". De novo assembly is still necessary for highly variable regions, such as the HLA region in the human genome, and for poorly covered regions. For your case, I would prefer a reference-guided assembly.

10.3 years ago
Ahdf-Lell-Kocks ★ 1.6k

Depending on the quality of your reference assembly, it may be worth doing both: ref-based alignment of the RNA-seq reads and, in parallel, de novo assembly. The de novo assembly will pick up a number of transcripts that cannot be reliably mapped to the reference because of gaps or misassembled regions, and these complement the transcripts found in the ref-based set.

10.3 years ago
Darked89 4.2k

I guess none of the monkey genomes has reached a level of completion comparable to the human genome. But even in the best-case scenario (human RNA-Seq + human genome), there was a recent article showing that you can get novel transcripts. I will provide a link later on.

Also, you may get sequences that are not in the genome at all, such as viruses infecting your sample.

So the answer would be: do both (guided and de novo RNA-Seq assembly), probably always, except perhaps for some tiny genomes that have been sequenced many times.

10.3 years ago
Geparada ★ 1.4k

If you are going to do a reference-guided transcriptome assembly, keep in mind that all the assembly errors of the reference genome will be reflected in your results. So you should look at the assembly statistics of this genome (N50, number of scaffolds, etc.) before making a decision. If you have a well-assembled genome like hg19 or mm9, though, the guided way is the best option.
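For a quick look at those statistics, here is a minimal Python sketch that computes the number of scaffolds, the total length, and the N50 from a FASTA file. It is only an illustration: the function names `fasta_lengths`, `n50`, and `summarize` are made up for this example, and the file name in the usage line is a placeholder, not a real download. N50 is taken here in the usual sense: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly.

```python
def fasta_lengths(lines):
    """Return the sequence length of each record in FASTA-formatted lines."""
    lengths = []
    current = None  # length of the record being read; None before the first header
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Return the N50 of a list of scaffold lengths (0 for an empty list)."""
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def summarize(path):
    """Summarize a FASTA file; 'path' is whatever genome file you have on disk."""
    with open(path) as handle:
        lengths = fasta_lengths(handle)
    return {
        "scaffolds": len(lengths),
        "total_bp": sum(lengths),
        "n50": n50(lengths),
    }
```

Usage would be something like `summarize("cyno_genome.fa")` (hypothetical file name). A low N50 together with a very large scaffold count would suggest a fragmented reference, which is the situation where guided assembly is most likely to inherit problems.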
