Question: Refining My Transcriptome Assembly
8.0 years ago, Craig Anderson wrote:

Hi,

I'm doing some transcriptomics on a non-model organism (an earthworm) and am having issues with my assembly.

I've got HiSeq RNAseq data from pooled samples (around 20-25 monophyletic individuals) for each of 3 exposures. I've assembled the transcriptome of a single exposure group using Velvet and Oases, but I've got a massive haul, with an N50 of 82,694 >= 1465 bp.

I anticipate that the vast amount of variation within my sample means there are an awful lot of very similar sequences in my data. What software is out there to help me achieve a consensus transcriptome?

I really would appreciate any pointers,

Craig

P.S. There is a draft reference genome for this species, but it's of a genetically distinct alternative lineage (14% divergent according to mitochondrial COII markers and AFLP).

Edit: Because I've pooled so many individuals, I'd like to reduce the number of contigs that occur as individual sequences due to SNPs, sequencing errors or whatever.

I'm aware that I need to redo the assembly to get rid of sequences that velvetg has attempted to scaffold with Ns. All parameters other than k-mer length and insert length are at their default values.

Hope that helps!


Just for clarification: you have an L50 of 14694? That is pretty huge. I don't get what you are asking, though. An assembly is a consensus sequence.
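Since N50 and L50 are easy to mix up: N50 is a contig length, while L50 is a count of contigs. A minimal sketch of how both are commonly computed from a list of contig lengths follows. The function name `n50_l50` and the example lengths are made up for illustration and are not taken from the assembly discussed here.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig lengths in bp.

    N50: length of the contig at which the running total, taken over
         contigs sorted longest first, reaches half the assembly size.
    L50: number of contigs needed to reach that point.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count
    raise ValueError("no contigs given")


# Hypothetical contig lengths, for illustration only.
example_lengths = [100, 200, 300, 400, 500]
n50, l50 = n50_l50(example_lengths)
print("N50 =", n50, "bp; L50 =", l50, "contigs")
```

With Oases output, the lengths would come from the sequence lengths in the transcripts FASTA file.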

Comment written 8.0 years ago by Fabian Bull

It is not entirely clear what you are after. Are you asking for advice on achieving a better assembly?

Comment written 8.0 years ago by Istvan Albert

7.9 years ago, Anna wrote:

Hi Craig,

There are several ways of reducing redundancy.

For example, if you have a draft assembly, you can map the reads to it and use the mapping to restrict the reads that Velvet/Oases sees: run one Velvet-Oases assembly per contig, using ONLY the reads that map to that contig. That also makes Oases run much quicker and with less memory. Another tip: avoid pooling samples if you can. That worked very well for me, and I also work on worms!
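The read-partitioning step can be sketched roughly as below: given alignments in SAM format, collect the names of the reads that map to each contig, so that each group can then be fed to its own Velvet/Oases run. This is only a sketch under assumptions: the function name `reads_by_contig` and the toy SAM records are invented for illustration, and the choice of read mapper and the per-contig assembly commands are left out.

```python
from collections import defaultdict


def reads_by_contig(sam_lines):
    """Map each contig name to the set of read names aligned to it.

    Expects SAM-formatted text lines: column 1 is the read name,
    column 2 the bitwise flag, column 3 the reference (contig) name.
    """
    groups = defaultdict(set)
    for line in sam_lines:
        # Skip blank lines and SAM header lines (they start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        read_name, flag, contig = fields[0], int(fields[1]), fields[2]
        # Flag bit 0x4 marks an unmapped read; '*' means no reference.
        if flag & 0x4 or contig == "*":
            continue
        groups[contig].add(read_name)
    return dict(groups)


# Toy SAM records, for illustration only.
toy_sam = [
    "@HD\tVN:1.0",
    "r1\t0\tcontigA\t1\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGTA\tIIIII",
    "r3\t16\tcontigA\t3\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
    "r4\t0\tcontigB\t1\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
]
for contig, names in sorted(reads_by_contig(toy_sam).items()):
    print(contig, sorted(names))
```

Each set of read names could then be used to pull the matching reads out of the original FASTQ files before the per-contig assembly.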

Another approach would be to use software such as Jigsaw:

http://www.cbcb.umd.edu/software/jigsaw/

or any other consensus caller. Plenty of EST papers list them.

Hope this helps,

Anna


Awesome, thanks Anna!

Comment written 7.9 years ago by Craig Anderson