Refining My Transcriptome Assembly
9.6 years ago

Hi,

I'm doing some transcriptomics on a non-model organism (an earthworm) and am having issues with my assembly.

I've got HiSeq RNA-seq data from pooled samples (around 20-25 monophyletic individuals) for each of 3 exposures. I've assembled the transcriptome of a single exposure group using Velvet and Oases, but I've got a massive haul, with an N50 of 82,694 >= 1465 bp.

I anticipate that the vast amount of variation within my sample will mean that there are an awful lot of very similar sequences in my data. What software is out there to help me achieve a consensus transcriptome?

I really would appreciate any pointers,

Craig

P.S. There is a draft reference genome for this species, but it's of a genetically distinct alternative lineage (14% according to mitochondrial COII markers and AFLP).

Edit: Because I've pooled so many individuals, I'd like to reduce the number of contigs that occur as separate sequences only because of SNPs, sequencing errors or whatever.

I'm aware that I need to redo the assembly to get rid of sequences that velvetg has attempted to scaffold with Ns. All parameters other than k-mer length and insert length are at default values.

Hope that helps!

transcriptome rna assembly • 2.7k views

Just for clarification: You have an L50 of 14694? That is pretty huge. I am not getting what you are asking. An assembly is a consensus sequence.
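Since N50 and L50 are easy to mix up, here is a minimal sketch of how the two are usually computed from a list of contig lengths. N50 is a length in bp, L50 is a count of contigs. The function name `n50_l50` and the example lengths are made up for illustration and are not taken from the assembly discussed here.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig lengths.

    N50: length of the contig at which the running total, summed from the
         longest contig downwards, first reaches half of the assembly size.
    L50: how many contigs were needed to reach that point.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count
    raise ValueError("no contig lengths given")


# Hypothetical contig lengths, total 1500 bp
n50, l50 = n50_l50([100, 200, 300, 400, 500])
print(n50, l50)
```

With these toy lengths the two longest contigs (500 + 400 bp) already cover half of the 1500 bp total, so N50 is 400 bp and L50 is 2.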


It is not entirely clear what you are after - are you asking for advice on achieving a better assembly?

9.5 years ago
Anna ▴ 100

hi Craig,

there are several ways of reducing redundancy.

for example, if you have a draft assembly you can map the reads back to it and use the reads mapped to each individual contig to restrict what velvet/oases works with. You'd be running one velvet-oases assembly per contig, using ONLY the reads mapping to that contig. That would also make Oases run much quicker and with less memory. Another tip: avoid pooling samples if you can. That worked very well for me, and I also work on worms!
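A rough sketch of the bookkeeping step for the per-contig approach, assuming the reads have already been mapped to the draft contigs and the alignments are available as SAM text. The function name `reads_by_contig` and the example records are hypothetical; the mapper, and how each group of reads is then handed to velvet/oases, are left open.

```python
from collections import defaultdict


def reads_by_contig(sam_lines):
    """Group read names by the contig (SAM RNAME column) they map to."""
    groups = defaultdict(set)
    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:               # not a usable alignment record
            continue
        read_name, flag, contig = fields[0], int(fields[1]), fields[2]
        if flag & 0x4 or contig == "*":   # 0x4 flag bit = read unmapped
            continue
        groups[contig].add(read_name)
    return dict(groups)


# Hypothetical SAM records (QNAME, FLAG, RNAME, POS, ...), for illustration only
example_sam = [
    "@SQ\tSN:contig_1\tLN:1500",
    "read_a\t0\tcontig_1\t10\t60\t50M\t*\t0\t0\t*\t*",
    "read_b\t16\tcontig_1\t40\t60\t50M\t*\t0\t0\t*\t*",
    "read_c\t0\tcontig_2\t5\t60\t50M\t*\t0\t0\t*\t*",
    "read_d\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]

for contig, reads in sorted(reads_by_contig(example_sam).items()):
    print(contig, sorted(reads))
```

Each set of read names could then be used to pull the matching reads out of the original FASTQ into one file per contig, and each file assembled on its own. Unmapped reads such as `read_d` are dropped here, so anything missing from the draft assembly would not be recovered this way.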

another approach would be to use some software such as Jigsaw

http://www.cbcb.umd.edu/software/jigsaw/

or any other consensus caller - loads of the old EST papers would have lists of them.

hope this helps

Anna


awesome, thanks Anna
