Question

Combine or split: that's the question for transcriptome assembly

7.2 years ago

dukecomeback ▴ 40

I combined 10 libraries before running TopHat+Cufflinks and Trinity+PASA assembly, and each pipeline gave me hundreds of thousands of predicted transcripts. I believe there must be a lot of false positives. Would it be better to run each library separately and then use a tool like cuffmerge to merge the resulting assemblies? Does anyone have experience comparing these approaches? I would be really grateful if you could share it.

    Sincerely,
         Kang
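A minimal sketch of the "assemble each library separately, then merge" idea, assuming the per-library transcript models have already been parsed from GTF files into simple tuples (parsing is not shown). It treats models with the same chromosome, strand and intron chain as one transcript and keeps only those reported by at least `min_support` libraries, which is one simple way to discard library-specific noise. This is not the algorithm used by cuffmerge or TACO, which both do considerably more; the function names and the threshold are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical in-memory transcript model (GTF parsing not shown):
# (chrom, strand, exons), exons as sorted, 1-based inclusive (start, end) pairs.
Exon = Tuple[int, int]
Transcript = Tuple[str, str, Tuple[Exon, ...]]


def intron_chain(exons: Tuple[Exon, ...]) -> Tuple[Exon, ...]:
    """Introns implied by consecutive exons; equal chains mean equal splice structure."""
    return tuple((left[1] + 1, right[0] - 1) for left, right in zip(exons, exons[1:]))


def merge_by_intron_chain(libraries: Dict[str, List[Transcript]], min_support: int = 2):
    """Collapse multi-exon models from several per-library assemblies.

    Models sharing chrom, strand and intron chain are treated as one transcript;
    only those reported by at least `min_support` libraries are kept.
    Returns {(chrom, strand, chain): ((start, end), [library names])}.
    """
    support = defaultdict(set)
    span: Dict[tuple, Exon] = {}
    for lib, transcripts in libraries.items():
        for chrom, strand, exons in transcripts:
            chain = intron_chain(exons)
            if not chain:
                continue  # single-exon models are ignored in this sketch
            key = (chrom, strand, chain)
            support[key].add(lib)
            start, end = exons[0][0], exons[-1][1]
            old = span.get(key, (start, end))
            # Extend the merged model to the outermost observed transcript ends.
            span[key] = (min(old[0], start), max(old[1], end))
    return {
        key: (span[key], sorted(libs))
        for key, libs in support.items()
        if len(libs) >= min_support
    }


if __name__ == "__main__":
    libs = {
        "lib1": [("chr1", "+", ((100, 200), (300, 400)))],
        "lib2": [("chr1", "+", ((150, 200), (300, 450)))],
        "lib3": [("chr1", "+", ((100, 200), (350, 400)))],
    }
    for key, (bounds, names) in merge_by_intron_chain(libs).items():
        print(key, bounds, names)
```

In the toy data, `lib1` and `lib2` agree on the intron chain and are merged into one model spanning both, while the `lib3` model has a different splice site and only one library supporting it, so it is dropped. Requiring support from several libraries trades some sensitivity for fewer false positives.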

RNA-Seq Assembly

Since you are doing de novo assembly with Trinity, I assume you don't have a (decent) reference genome available? Yet you are also using TopHat? (Which, as Rob said, is deprecated.) Please be as complete as possible when asking questions; information such as the organism you are working on is important.

7.2 years ago by WouterDeCoster 47k

I'm actually trying to build a genome annotation pipeline. I hope to extract the transcripts on which the two pipelines overlap to get a set of high-quality genes.

7.2 years ago by dukecomeback ▴ 40
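For keeping only the genes both pipelines agree on, one simple criterion is an exact intron-chain match between the TopHat+Cufflinks models and the Trinity+PASA models once both are expressed in genome coordinates. A minimal sketch, assuming both outputs have already been parsed into the tuples below (GTF/GFF3 parsing is not shown, and the transcript IDs are made up); for real GTF files, gffcompare's "=" class code reports this same kind of complete intron-chain match.

```python
from typing import Dict, List, Tuple

Exon = Tuple[int, int]
# Hypothetical parsed model: (transcript_id, chrom, strand, exons), where exons are
# sorted, 1-based inclusive (start, end) pairs on the genome.
Model = Tuple[str, str, str, Tuple[Exon, ...]]


def intron_chain(exons: Tuple[Exon, ...]) -> Tuple[Exon, ...]:
    """Introns between consecutive exons."""
    return tuple((left[1] + 1, right[0] - 1) for left, right in zip(exons, exons[1:]))


def shared_models(pipeline_a: List[Model], pipeline_b: List[Model]) -> List[Tuple[str, str]]:
    """Pairs (id_a, id_b) of multi-exon models with identical intron chains."""
    # Index pipeline B by (chrom, strand, intron chain) for constant-time lookups.
    index: Dict[tuple, List[str]] = {}
    for tid, chrom, strand, exons in pipeline_b:
        chain = intron_chain(exons)
        if chain:  # single-exon models cannot be matched on splice structure
            index.setdefault((chrom, strand, chain), []).append(tid)

    pairs = []
    for tid, chrom, strand, exons in pipeline_a:
        chain = intron_chain(exons)
        if not chain:
            continue
        for other in index.get((chrom, strand, chain), []):
            pairs.append((tid, other))
    return pairs


if __name__ == "__main__":
    cufflinks = [("CUFF.1.1", "chr1", "+", ((100, 200), (300, 400))),
                 ("CUFF.2.1", "chr1", "+", ((500, 600), (700, 800)))]
    pasa = [("asmbl_1", "chr1", "+", ((120, 200), (300, 420))),
            ("asmbl_2", "chr1", "+", ((500, 600), (750, 800)))]
    print(shared_models(cufflinks, pasa))  # [('CUFF.1.1', 'asmbl_1')]
```

Matching on the intron chain alone ignores differences in transcript ends (which vary a lot between assemblers) but also drops every single-exon model, so a looser criterion may suit some genomes better.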

Answer · 2017-05-29

7.2 years ago

Rob 6.8k

By combining the samples prior to assembly, you increase the likelihood of generating (computationally) chimeric transcripts. You might try assembling each sample separately and then combining the assemblies using, e.g., TACO. Also, TopHat has been deprecated by its developers. For reference-based assembly, you might try HISAT + StringTie instead of TopHat + Cufflinks.

7.2 years ago by Rob 6.8k
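The "align each sample, assemble each sample, then merge" workflow described above could be scripted roughly as follows. This is a sketch, assuming HISAT2 (the current `hisat2` binary), samtools for sorting, and paired-end reads; the sample names, FASTQ paths and index prefix are hypothetical. StringTie's own `--merge` mode stands in for the merge step here; TACO is an alternative merger with its own command line, not shown. By default the script only prints the commands.

```python
import shlex
import subprocess
from typing import List, Optional, Sequence, Tuple

Command = List[str]


def sample_commands(sample: str, r1: str, r2: str, index: str,
                    threads: int = 8, ref_gtf: Optional[str] = None) -> Tuple[List[Command], str]:
    """Align one paired-end sample with HISAT2 and assemble it on its own with StringTie."""
    sam, bam, gtf = f"{sample}.sam", f"{sample}.bam", f"{sample}.gtf"
    stringtie = ["stringtie", "-p", str(threads), "-o", gtf, "-l", sample]
    if ref_gtf:
        stringtie += ["-G", ref_gtf]  # optional reference annotation used as a guide
    stringtie.append(bam)
    commands = [
        # --dta reports alignments tailored for downstream transcript assemblers
        ["hisat2", "-p", str(threads), "--dta", "-x", index, "-1", r1, "-2", r2, "-S", sam],
        ["samtools", "sort", "-@", str(threads), "-o", bam, sam],
        stringtie,
    ]
    return commands, gtf


def merge_command(gtfs: Sequence[str], mergelist: str = "mergelist.txt",
                  out: str = "merged.gtf", threads: int = 8,
                  ref_gtf: Optional[str] = None) -> Tuple[Command, str]:
    """StringTie merge step; returns the command and the text to write to `mergelist`."""
    cmd = ["stringtie", "--merge", "-p", str(threads), "-o", out]
    if ref_gtf:
        cmd += ["-G", ref_gtf]
    cmd.append(mergelist)
    return cmd, "".join(f"{g}\n" for g in gtfs)


def run(commands: Sequence[Command], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical sample names, FASTQ paths and HISAT2 index prefix.
    samples = {"lib01": ("lib01_1.fq.gz", "lib01_2.fq.gz"),
               "lib02": ("lib02_1.fq.gz", "lib02_2.fq.gz")}
    gtfs = []
    for name, (r1, r2) in samples.items():
        cmds, gtf = sample_commands(name, r1, r2, index="genome_index")
        run(cmds)
        gtfs.append(gtf)
    merge, mergelist_text = merge_command(gtfs)
    # Before a real run, write mergelist_text to mergelist.txt.
    run([merge])
```

Building the commands as lists rather than shell strings keeps file names with spaces safe and makes the dry-run output easy to inspect before anything is executed.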

Thank you so much for sharing; I really didn't know TopHat had been deprecated.

7.2 years ago by dukecomeback ▴ 40

See this paper: Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. (If you can't get access to that publication, let me know and I'll -cough- help you.) There are also other alternatives, including alignment with STAR and bbmap, or pseudo-alignment using kallisto or salmon.