Question: Transcriptome assembly: Low GC, short contigs, low read alignment
jmah wrote (2.1 years ago):

Hi!

I am trying to troubleshoot two de novo Trinity assemblies. The samples, from two species of sponge, were sequenced during the same run, and I obtained 2x150 bp reads to a depth of 124x. We already have a whole transcriptome assembled for each species, but for our purposes I would like a de novo assembly. The GC content of my new assemblies is 3-7% lower than that of our old assemblies. Furthermore, my assemblies have many short contigs (N50: 800 bp vs. 1800 bp for the old assemblies; median length: 300 vs. 800 bp; mean length: 600 vs. 1200 bp). The nail in the coffin is that few reads align in proper paired orientation when mapped back to my de novo assemblies: only ~50% are in proper pairs.
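
For anyone who wants to compare the same length statistics across assemblies, here is a minimal Python sketch. It is not how the numbers above were produced; the function names (`n50`, `read_fasta_lengths`, `summarize`) are made up for illustration, and it assumes a plain, uncompressed FASTA file.

```python
from statistics import mean, median


def n50(lengths):
    """Return the length L such that contigs of length >= L
    contain at least half of all assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input


def read_fasta_lengths(path):
    """Collect the sequence length of every record in a plain FASTA file."""
    lengths = []
    current = 0
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new header closes the previous record.
                if current:
                    lengths.append(current)
                current = 0
            else:
                current += len(line)
    if current:
        lengths.append(current)
    return lengths


def summarize(lengths):
    """Bundle the statistics quoted in the post into one dictionary."""
    return {
        "contigs": len(lengths),
        "n50": n50(lengths),
        "median": median(lengths),
        "mean": mean(lengths),
    }
```

Calling `summarize(read_fasta_lengths("Trinity.fasta"))` on the old and new assemblies (the file name here is only a placeholder) would give directly comparable numbers.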

I am most worried about the GC content. The GC content of the reads is similar to that of our old transcriptomes and is only lower after assembly. I have changed the adapter trimming parameters and tried Trinity's jaccard clip setting, but my assembly stats remain almost identical on each run.
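
One possible check, offered only as a sketch, is whether the lower GC is concentrated in the short contigs. The Python below pools GC content separately for contigs under and over a length cutoff. The helper names and the 500 bp default cutoff are assumptions chosen for illustration, not anything used in the original analysis.

```python
def gc_fraction(seq):
    """GC fraction over called bases only (A, C, G, T); N and other codes are ignored."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    called = gc + at
    return gc / called if called else 0.0


def gc_by_length(seqs, cutoff=500):
    """Return (GC of contigs shorter than cutoff, GC of the remaining contigs).

    Bases are pooled within each group, so longer contigs weigh more.
    The cutoff is an arbitrary illustrative value.
    """
    short = "".join(s for s in seqs if len(s) < cutoff)
    longer = "".join(s for s in seqs if len(s) >= cutoff)
    return gc_fraction(short), gc_fraction(longer)
```

If the two values differ substantially, the short contigs would be a reasonable place to look first; if they are about the same, the GC shift is probably not just a side effect of fragmentation.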

Has anyone received assemblies with low GC and short contigs before? If so, what did you do to fix that?

Thanks! If there's any more information that can prove helpful, please let me know.


If you're using Trinity with that much depth, you might want to use the in silico read normalization parameter. Also, why assemble de novo instead of using a reference-based approach, if you have other assemblies? If you're looking for DEGs, combine all available data to create a single assembly, then align your samples back to that assembly to get abundance estimates.

Comment written 2.1 years ago by st.ph.n
jmah wrote (2.1 years ago):

Hi st.ph.n,

Thanks for your advice! I did use normalization, and yes, my goal is to create a reference assembly for DE. I would rather not use a reference, because the goal is to find new genes and uncharacterized sequences. Any ideas about the low GC content?

Thanks!
