Question

de novo transcriptome

3.9 years ago

valopes ▴ 30

Hi all,

I have 8 RNA-seq samples from the same organism under different conditions. After trimming, quality filtering, and removing possible contaminant species, rRNA sequences, etc., I merged the high-quality reads into a single file of 98 million reads. I ran Trinity on it and got ~4 million contigs, which seems like a very high number! Anyway, on to my question. I found some contigs of ~200 kb. When I check them for genes, I find something like 20-25 predicted complete genes in each. I was not expecting this. Is this a good or a bad sign?

Thanks!

assembly rna-seq • 739 views

updated 3.9 years ago by colindaven 6.3k • written 3.9 years ago by valopes ▴ 30

Are you working with a bacterium or a virus? If yes, that is expected because of operons. If not, you may be creating chimeras in the assembly. How long are your reads? Are they paired-end? Which sequencing technology did you use?

3.9 years ago by JC 13k

Okay! Sorry, I left out a lot of info... It is a eukaryote, so I'm not expecting operons. The reads are 2x150 bp Illumina.

3.9 years ago by valopes ▴ 30

score 0 · Answer 1 · 2020-05-14

3.9 years ago

colindaven 6.3k

4 million transcripts is a lot. I guess they are highly redundant.

Filter - exclude very short ones.
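A minimal sketch of such a length filter, assuming the contigs are in a plain FASTA file. The 300 bp cutoff and the function names `read_fasta` and `filter_by_length` are illustrative choices, not values from this thread:

```python
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(src: TextIO, dst: TextIO, min_len: int = 300) -> int:
    """Write contigs of at least min_len bases to dst; return how many were kept.

    The default of 300 is an assumed, illustrative cutoff.
    """
    kept = 0
    for header, seq in read_fasta(src):
        if len(seq) >= min_len:
            dst.write(f">{header}\n{seq}\n")
            kept += 1
    return kept
```

It could be used as `filter_by_length(open("Trinity.fasta"), open("filtered.fasta", "w"))`, where both file names are placeholders. Sequences are written on a single line, which most downstream tools accept.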

You can reduce redundancy using a FASTA clustering tool, e.g. CD-HIT (cd-hit-est is the variant for nucleotide sequences).
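As a rough sketch of the clustering step, the snippet below builds and optionally runs a cd-hit-est command from Python. It assumes cd-hit-est is installed and on the PATH; the 0.95 identity threshold, the file names, and the helper names `build_cd_hit_est_cmd` and `run_cd_hit_est` are assumptions for illustration, not recommendations from this thread:

```python
import shutil
import subprocess
from typing import List


def build_cd_hit_est_cmd(fasta_in: str, fasta_out: str,
                         identity: float = 0.95) -> List[str]:
    """Build a cd-hit-est command line (hypothetical helper).

    Word size -n 10 is only appropriate for identity thresholds of
    0.95 and above, so lower thresholds are rejected here.
    """
    if not 0.95 <= identity <= 1.0:
        raise ValueError("this sketch only supports identity in [0.95, 1.0]")
    return [
        "cd-hit-est",
        "-i", fasta_in,        # input FASTA
        "-o", fasta_out,       # output FASTA of cluster representatives
        "-c", str(identity),   # sequence identity threshold (assumed value)
        "-n", "10",            # word size matching the threshold range
        "-T", "0",             # use all available CPU threads
        "-M", "0",             # no memory limit
    ]


def run_cd_hit_est(fasta_in: str, fasta_out: str) -> None:
    """Run cd-hit-est, failing early if the executable is missing."""
    if shutil.which("cd-hit-est") is None:
        raise FileNotFoundError("cd-hit-est not found on PATH")
    subprocess.run(build_cd_hit_est_cmd(fasta_in, fasta_out), check=True)
```

With 4 million contigs, filtering out the very short ones first should make this step considerably cheaper.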

GMAP with GFF3 output is excellent for mapping the transcripts to the genome and for visualization.

Once you check the alignments visually, you'll know how decent the quality is.