Question: Does It Make Sense To Merge Oases Multiple-Kmer Assemblies
7.9 years ago, Lhl (United States) wrote:

Hi All,

I am using OASES for de novo transcriptome assembly. I tried multiple kmer values and was able to choose the best one based on N50, largest transcripts, and so on. However, I am wondering whether it makes sense to merge assemblies produced with different kmers.

Another question is how to deal with the unused reads: should we also merge them with the multiple transcripts.fa files?

I would be happy if anyone could give me a hint. Thanks in advance.



assembly • 4.4k views
modified 7.9 years ago by Jeremy Leipzig • written 7.9 years ago by Lhl

7.9 years ago, Jeremy Leipzig (Philadelphia, PA) wrote:

Oases has its own merge function now.

There is certainly no risk in merging with assemblies generated with a higher kmer, since they are more conservative. Assemblies generated with lower kmers may introduce false contigs. You definitely don't want to actually extend those further, which is why I am somewhat leery of assembling assemblies with programs like CAP3.
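
For readers who want to try the built-in merge, here is a minimal sketch of the multi-k workflow as I recall it from the Oases manual: one velveth/velvetg/oases run per kmer, then a final run that loads every transcripts.fa as long sequences and calls oases with the merge option. The function name, directory prefix, input file name, and the merge kmer of 27 are placeholders of mine, and the flags should be checked against the manual of your installed version. The script only builds and prints the commands; it does not run them.

```python
from typing import List


def oases_merge_commands(k_values: List[int], reads: str,
                         merge_k: int = 27, prefix: str = "asm") -> List[List[str]]:
    """Build (but do not run) command lines for a multi-k Oases run plus merge.

    Flags are as I recall them from the Oases manual; verify before use.
    """
    commands: List[List[str]] = []
    for k in k_values:
        out_dir = f"{prefix}_{k}"
        # One independent assembly per kmer value.
        commands.append(["velveth", out_dir, str(k), "-fastq", "-short", reads])
        commands.append(["velvetg", out_dir, "-read_trkg", "yes"])
        commands.append(["oases", out_dir])

    merged = f"{prefix}_merged"
    transcripts = [f"{prefix}_{k}/transcripts.fa" for k in k_values]
    # Final pass: feed all per-k transcripts back in as long sequences and merge.
    commands.append(["velveth", merged, str(merge_k), "-long", *transcripts])
    commands.append(["velvetg", merged, "-read_trkg", "yes", "-conserveLong", "yes"])
    commands.append(["oases", merged, "-merge", "yes"])
    return commands


if __name__ == "__main__":
    # "reads.fastq" is a hypothetical input file name.
    for cmd in oases_merge_commands([21, 27, 33], "reads.fastq"):
        print(" ".join(cmd))
```

To follow the advice above about lower kmers, pass only the kmer values you trust in `k_values`.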

written 7.9 years ago by Jeremy Leipzig

Thanks very much, Jeremy. Your suggestion is very helpful. I found that the higher kmers produce far fewer and shorter contigs/transcripts. In my case, I used kmers from 19 to 79. From which kmer value should I start merging into the final assembly? Should I start from the kmer (27) that yields the best assembly (in terms of N50, total number of bases in contigs, and number of contigs longer than 500) and include all kmers larger than this value? I am new to NGS assembly. I would be very happy if you could let me know how I can find erroneous assemblies (e.g. in a transcriptome assembly).

Reply written 7.9 years ago by Lhl

Should I try to improve sequencing depth, length, or anything else? Kind regards

Reply written 7.9 years ago by Lhl

Also, when you said you are 'somewhat leery of assembling assemblies with programs like CAP3', do you mean you prefer other assemblers? Or are you simply cautious about using the merging-multiple-assemblies strategy?

Reply written 7.9 years ago by Lhl

Normally a higher kmer results in smaller assemblies with hopefully a few longer contigs - it's those longer ones you are after. Obviously as your kmer approaches read length this will break down.

CAP3 is a greedy assembler. If you have contigs that did not assemble in a de Bruijn assembler, there was probably a reason (i.e. ambiguity). If you throw them into CAP3 and they assemble, you should be cautious: it might be making some risky decisions.

As far as the metrics for judging your assemblies go, you are really in the best position to compare. I would try BLASTing the transcripts against similar organisms.
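
To compare assemblies on the metrics mentioned in this thread (N50, total bases, longest transcript, number of contigs longer than 500), a small script is enough. This is a minimal sketch using a plain FASTA parser; the function names are mine, and the file name in the usage note is a placeholder.

```python
from typing import Dict, Iterable, List


def read_fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Return the length of every sequence in FASTA-formatted lines."""
    lengths: List[int] = []
    current = None  # length of the record being read, None before the first header
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: List[int]) -> int:
    """Length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def assembly_stats(lengths: List[int], min_len: int = 500) -> Dict[str, int]:
    """Summarize one assembly with the metrics discussed above."""
    return {
        "contigs": len(lengths),
        "total_bases": sum(lengths),
        "n50": n50(lengths),
        "longest": max(lengths, default=0),
        "contigs_over_min_len": sum(1 for n in lengths if n > min_len),
    }
```

Usage would look like `with open("transcripts.fa") as fh: print(assembly_stats(read_fasta_lengths(fh)))`, run once per kmer directory. Keep in mind that these numbers describe contiguity only; they say nothing about whether a transcript is chimeric, which is why a comparison against related organisms is still worth doing.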

Reply written 7.9 years ago by Jeremy Leipzig

I got what you mean, thanks very much Jeremy.

Reply written 7.9 years ago by Lhl

Hi Jeremy,

I know this post is quite old, but I still have a question on which it would be great to have your suggestion.

I have de novo RNAseq data (Illumina HiSeq) from multiple organs of a single animal. I have tried Velvet/Oases with multiple kmers on the data from each organ, which finally yielded (as Oases -merge output) a transcripts.fa file for each organ. Now I would like to build a whole reference transcriptome for this animal. How can I merge the transcripts obtained from multiple organs? Or should I break those transcripts.fa files down into kmers and reassemble them? Which tool is currently best suited for my purpose? Thank you in advance!


Reply written 4.2 years ago by pbigbig

This probably deserves its own new biostars question.

Reply written 4.2 years ago by Jeremy Leipzig

ok, I will post a new one

Thank you

Reply written 4.1 years ago by pbigbig

7.9 years ago, Erick Cardenas wrote:

There are indeed programs that merge assemblies done with different k values. One I know is minimus from the AMOS package. See a tutorial here

Since the optimal kmer value is a function of the coverage, I assume that genes with different transcription levels will have different optimal k values. I have not worked with RNA-seq myself, but I have heard that some kind of normalization is recommended to improve the assembly, since most graph assemblers use the median kmer coverage to decide which paths are erroneous. They will therefore assume that reads with low coverage are bad, even though they could simply come from genes with low expression.
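
The link between coverage and k can be made concrete with the relation given in the Velvet manual, Ck = C * (L - k + 1) / L, where C is the nucleotide coverage, L the read length, and Ck the resulting kmer coverage. The sketch below applies it to three made-up expression levels; the read length of 100 and the coverage values are assumptions for illustration only, and the kmer values are taken from the 19 to 79 range used earlier in this thread.

```python
def kmer_coverage(base_coverage: float, read_length: int, k: int) -> float:
    """Velvet manual relation: Ck = C * (L - k + 1) / L."""
    if not 0 < k <= read_length:
        raise ValueError("k must be between 1 and the read length")
    return base_coverage * (read_length - k + 1) / read_length


if __name__ == "__main__":
    read_length = 100  # assumed read length, not from the thread
    # Hypothetical nucleotide coverage for transcripts at three expression levels.
    expression = {"low": 5.0, "medium": 50.0, "high": 500.0}
    for k in (19, 27, 79):
        row = ", ".join(
            f"{name}: {kmer_coverage(cov, read_length, k):.1f}"
            for name, cov in expression.items()
        )
        print(f"k={k}: {row}")
```

Each read contributes only L - k + 1 kmers, so kmer coverage shrinks as k grows and weakly expressed transcripts are the first to fall below whatever coverage threshold the assembler applies. Highly expressed transcripts can tolerate a larger k, which is one argument for combining several kmer values.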

written 7.9 years ago by Erick Cardenas

Thanks, Erick. Most of my transcriptome data was normalized! A small proportion is from RNAseq sequencing. Thanks for your suggestion. At the moment I am trying CAP3 to further merge the transcripts produced by OASES, but I will also give minimus a go.

Reply written 7.9 years ago by Lhl