Does It Make Sense To Merge Oases Multiple-Kmer Assemblies?
12.2 years ago
Lhl ▴ 760

Hi All,

I am using Oases for de novo transcriptome assembly. I tried multiple kmer values and was able to choose the best one based on N50, the largest transcripts and so on. However, I am wondering whether it makes sense to merge the assemblies produced with different kmers.

Another question is how to deal with the unused reads: should we also merge them with the multiple transcripts.fa files?

I would be happy if anyone could give me a hint. Thanks in advance.

Regards,

lhl

assembly • 5.9k views
12.2 years ago

Oases has its own merge function now.

There is certainly no risk in merging with assemblies generated with a higher kmer, since they are more conservative. Assemblies generated with lower kmers may introduce false contigs. You definitely don't want to actually extend those further, which is why I am somewhat leery of assembling assemblies with programs like CAP3.
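
For reference, the merge step can be scripted roughly as follows. This is only a sketch: the directory names (`oases_k27` and so on), the merge kmer of 27 and the helper `build_merge_commands` are hypothetical, and the flags are quoted from memory of the Oases manual, so check them against your installed version. The function only builds the command lines; it does not run anything.

```python
def build_merge_commands(assembly_dirs, merge_k, out_dir):
    """Return the three command lines (as argv lists) for an Oases merge.

    assembly_dirs: directories of single-k Oases runs, each holding transcripts.fa
    merge_k: kmer used for the merge assembly (a hypothetical choice here)
    out_dir: directory that will hold the merged assembly
    """
    transcripts = [f"{d.rstrip('/')}/transcripts.fa" for d in assembly_dirs]
    return [
        # Feed the single-k transcripts to velveth as long sequences.
        ["velveth", out_dir, str(merge_k), "-long", *transcripts],
        # Keep read tracking on and preserve the long input sequences.
        ["velvetg", out_dir, "-read_trkg", "yes", "-conserveLong", "yes"],
        # Ask Oases to merge the transcripts.
        ["oases", out_dir, "-merge", "yes"],
    ]


if __name__ == "__main__":
    dirs = ["oases_k27", "oases_k31", "oases_k35"]  # hypothetical directory names
    for cmd in build_merge_commands(dirs, 27, "merged"):
        print(" ".join(cmd))
```

Passing only the directories of the higher-kmer runs keeps the merge in line with the caution about low-kmer assemblies above.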


Thanks very much, Jeremy. Your suggestion is very helpful. I found that the higher kmers produce far fewer and shorter contigs/transcripts. In my case, I used kmers from 19 to 79. Which kmer value should I start from when merging into the final assembly? Should I start from the kmer that yields the best assembly (27, judged by N50, total number of bases in contigs, and number of contigs longer than 500) and include all kmers larger than this value? I am new to NGS assembly, so I would also be very happy if you could tell me how to find erroneous assemblies (e.g. in a transcriptome assembly).
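
The metrics mentioned here (N50, total bases, number of contigs longer than 500) can be computed per kmer with a few lines of Python. This is a minimal sketch, not part of Oases: the function names are made up for illustration, and N50 is taken as the length of the contig at which the running sum of lengths, sorted longest first, reaches half the total.

```python
def read_fasta_lengths(path):
    """Return the length of every sequence in a FASTA file."""
    lengths, current, seen_header = [], 0, False
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if seen_header:
                    lengths.append(current)
                seen_header, current = True, 0
            elif seen_header:
                current += len(line)
    if seen_header:
        lengths.append(current)
    return lengths


def assembly_stats(lengths, min_len=500):
    """Summarise contig lengths for one assembly."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running, n50 = 0, 0
    for length in ordered:
        running += length
        if running * 2 >= total:  # reached half of all assembled bases
            n50 = length
            break
    return {
        "contigs": len(ordered),
        "total_bases": total,
        "n50": n50,
        "longest": ordered[0] if ordered else 0,
        "long_contigs": sum(1 for x in ordered if x > min_len),
    }
```

Running `assembly_stats(read_fasta_lengths("transcripts.fa"))` on each single-k output directory gives a table to compare kmers side by side. These numbers describe contiguity only; they say nothing about whether a transcript is correct.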


Should I try to improve sequencing depth, read length or anything else? Kind regards


Also, when you said you are 'somewhat leery of assembling assemblies with programs like CAP3', do you mean you prefer other assemblers? Or are you simply cautious about the strategy of merging multiple assemblies?


Normally a higher kmer results in smaller assemblies with hopefully a few longer contigs - it's those longer ones you are after. Obviously as your kmer approaches read length this will break down.

CAP3 is a greedy assembler. If you have contigs that did not assemble in a de Bruijn graph assembler, there was probably a reason (i.e. ambiguity). If you throw them into CAP3 and they assemble, you should be cautious: it might be making some risky decisions.

As for the metrics for judging your assemblies, you are really in the best position to compare. I would try BLASTing against similar organisms.
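
One way to turn that BLAST comparison into a number is to count how many transcripts have at least one hit. The sketch below assumes BLAST+ tabular output (`-outfmt 6`, tab-separated, query ID in column 1 and e-value in column 11); the function name and the 1e-5 cutoff are illustrative choices, not something from this thread.

```python
def fraction_with_hit(blast_lines, transcript_ids, max_evalue=1e-5):
    """Fraction of transcripts with at least one hit at or below max_evalue."""
    ids = set(transcript_ids)
    if not ids:
        return 0.0
    hit_ids = set()
    for line in blast_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a standard 12-column tabular row
        query, evalue = fields[0], float(fields[10])
        if query in ids and evalue <= max_evalue:
            hit_ids.add(query)
    return len(hit_ids) / len(ids)
```

Comparing this fraction across kmers, or before and after merging, gives a rough check that is independent of N50. A transcript without a hit is not necessarily wrong, since it may simply be absent from the reference organism's database.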


I got what you mean, thanks very much Jeremy.


Hi Jeremy,

I know this post is old, but I still have some questions and it would be great to have your suggestions.

I have RNA-seq data (Illumina HiSeq) from multiple organs of a single animal, to be assembled de novo. I tried Velvet/Oases with multiple kmers on the data from each organ, which finally yielded (as Oases -merge output) a transcripts.fa file for each organ. Now I would like to build a whole reference transcriptome for this animal. How can I merge the transcripts obtained from multiple organs? Or should I break those transcripts.fa files down into kmers and reassemble them? Which tool is currently best suited to my purpose? Thank you in advance!

Phuong.


This probably deserves its own new biostars question.


ok, I will post a new one

Thank you

12.2 years ago

There are indeed programs that merge assemblies built with different k values. One I know of is minimus from the AMOS package. See a tutorial here.

Since the optimal kmer value is a function of coverage, I assume that genes with different transcription levels will have different optimal k values. I have not worked with RNA-seq myself, but I have heard that some kind of normalization is recommended to improve the assembly, since most graph assemblers use the median kmer coverage to decide which paths are erroneous. They will therefore treat low-coverage reads as errors even though they could simply come from genes with low expression.
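
To make the link between coverage and k concrete, the Velvet manual relates kmer coverage to base coverage as Ck = C * (L - k + 1) / L, where L is the read length. The sketch below just evaluates that relation; the read length of 100 and the coverage values are hypothetical numbers chosen for illustration, not data from this thread.

```python
def kmer_coverage(base_coverage, read_length, k):
    """Kmer coverage Ck = C * (L - k + 1) / L for reads of length L."""
    if not 0 < k <= read_length:
        raise ValueError("k must be between 1 and the read length")
    return base_coverage * (read_length - k + 1) / read_length


if __name__ == "__main__":
    # Hypothetical transcripts at low, medium and high expression.
    for coverage in (5, 50, 500):
        row = [round(kmer_coverage(coverage, 100, k), 1) for k in (19, 27, 51, 79)]
        print(coverage, row)
```

A weakly expressed transcript loses most of its already thin kmer coverage as k grows, while a highly expressed one still has plenty, which is one reason a single k rarely suits every gene.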


Thanks, Eric. Most of my transcriptome data was normalized; only a small proportion is from RNA-seq sequencing. Thanks for your suggestion. At the moment I am trying CAP3 to further merge the transcripts produced by Oases, but I will also give minimus a go.
