I have paired-end 100 bp reads at 50,000x coverage. The genome size is expected to be around 12.5 Mb. I would like to run ABySS on this data and see how much the assembly improves compared with data at 200x coverage. Do you have any suggestions for running ABySS on this data? Is it feasible to use the regular ABySS paired-end mode to obtain the assembly?
I do not think you will get a better assembly; more likely, you will get a worse one. Unless your coverage is very uneven, going over 100x or so typically starts to make the assembly worse, because an increasing number of sequencing errors are replicated exactly, and these create false branches in the de Bruijn graph. With coverage in the thousands, people typically normalize or subsample in order to achieve a better assembly. It's possible, though, that some metagenome, single-cell, or RNA-seq assemblers would be more tolerant of such high coverage.
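To put numbers on the subsampling suggestion, here is a minimal Python sketch. The function names (`keep_fraction`, `reads_needed`, `subsample_pairs`) and the 100x target are my own assumptions for illustration and are not part of ABySS or any other tool. It works out what fraction of the reads to keep and shows one way to subsample while keeping mates together.

```python
import random


def keep_fraction(current_cov, target_cov):
    """Fraction of reads to keep to go from current_cov to target_cov."""
    return min(1.0, target_cov / current_cov)


def reads_needed(genome_size, target_cov, read_len):
    """Number of reads giving target_cov, from coverage = reads * read_len / genome_size."""
    return int(genome_size * target_cov / read_len)


def subsample_pairs(pairs, fraction, seed=0):
    """Yield each (read1, read2) pair with probability `fraction`.

    The keep/drop decision is made once per pair, so mates are never separated.
    """
    rng = random.Random(seed)
    for r1, r2 in pairs:
        if rng.random() < fraction:
            yield r1, r2


# Numbers from the question: 12.5 Mb genome, 100 bp reads, 50,000x coverage.
# The 100x target is an assumed value, taken from the rough limit mentioned above.
fraction = keep_fraction(50_000, 100)              # 0.002, i.e. keep 1 read pair in 500
n_reads = reads_needed(12_500_000, 100, 100)       # 12,500,000 reads (6.25 million pairs)

# Toy stand-in for paired FASTQ records (hypothetical read names).
toy_pairs = [(f"read{i}/1", f"read{i}/2") for i in range(10_000)]
kept = list(subsample_pairs(toy_pairs, fraction, seed=42))
```

In practice you would stream the two FASTQ files in parallel rather than hold records in memory; the point is only that 50,000x down to roughly 100x means discarding about 99.8% of the data.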
For what it's worth, ABySS automatically calculates its k-mer coverage threshold for filtering out error k-mers based on the k-mer coverage histogram, so in principle your data set should assemble fine. But you are probably not going to gain much from having all that extra coverage.
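As a rough illustration of the idea of picking a threshold from a k-mer coverage histogram, here is a small Python sketch. It is not ABySS's actual algorithm, which I have not reproduced here; the first-valley heuristic, the function name `first_valley`, and the toy histogram are all assumptions made for the example.

```python
def first_valley(hist):
    """Return the first local minimum of a k-mer coverage histogram.

    hist[c] is the number of distinct k-mers observed exactly c times
    (index 0 is unused). Error k-mers pile up at low coverage and real
    k-mers form a peak at higher coverage, so the first valley between
    the two is one simple candidate for a cutoff. Returns None if no
    valley is found.
    """
    for c in range(2, len(hist) - 1):
        if hist[c] <= hist[c - 1] and hist[c] < hist[c + 1]:
            return c
    return None


# Made-up histogram: an error peak at coverage 1 and a "real" peak at coverage 7.
toy_hist = [0, 1000, 300, 80, 20, 35, 90, 200, 150, 60]
threshold = first_valley(toy_hist)  # the valley sits at coverage 4 in this toy data
```

With very deep data, errors that are replicated exactly many times shift the low-coverage end of the histogram upward, which is why any histogram-derived cutoff has to scale with depth.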
updated 14 months ago by Ram • written 9.0 years ago by benv
In my opinion, this is good advice.