Question: viral genome assembly
Dr.Animo wrote (14 months ago):

Dear all, I am trying to assemble a phage (virus) genome. I've checked the quality of the reads and mapped them to the host genome to remove the host reads. Now I am trying to assemble the unmapped reads, but I always get more than 1000 contigs, even after trying different k-mer sizes and different assemblers. I've found that there are chimeric reads, and I suspect they are the reason the number of contigs is so large. My question is: how can I identify these chimeric reads and remove them?

modified 14 months ago by Mensur Dlakic • written 14 months ago by Dr.Animo

It is possible that you have far more coverage than necessary (the genome must be pretty small). You could normalize the data to a lower coverage and/or use tadpole.sh from the BBMap suite, a k-mer based assembler, instead.

written 14 months ago by GenoMax
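
To make the coverage point above concrete, here is a minimal back-of-the-envelope sketch. The function names and all numbers (read count, read length, genome size, target depth) are hypothetical and only illustrate the arithmetic; the estimate also assumes every read is viral, so it is an upper bound on the real depth.

```python
def estimate_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Expected average depth: total sequenced bases divided by genome size."""
    return n_reads * read_length / genome_size


def subsample_fraction(current_cov: float, target_cov: float) -> float:
    """Fraction of reads to keep to reach target_cov (capped at 1.0)."""
    if current_cov <= 0:
        raise ValueError("current coverage must be positive")
    return min(1.0, target_cov / current_cov)


if __name__ == "__main__":
    # Hypothetical numbers: 2 million 150 bp reads, 50 kb phage genome.
    cov = estimate_coverage(2_000_000, 150, 50_000)
    frac = subsample_fraction(cov, 100.0)
    print(f"estimated coverage: {cov:.0f}x, keep fraction: {frac:.4f}")
```

With numbers like these the depth runs into the thousands, which is the kind of excess where normalizing or subsampling before assembly is worth trying.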

Tadpole seems to do a better job of assembling viruses than SPAdes. I won't guarantee that, but it seems to be generally true.

Viruses and hosts can share sequence. If you remove all sequences shared between the virus and the host, you will likely end up with holes in your assembly of whichever genome you are trying to assemble, here the virus.

In this case, I'd suggest partitioning reads by depth, and assembling the high-depth reads, which will be viral. You can do that with Tadpole by using the mindepth=X flag.

modified 14 months ago • written 14 months ago by Brian Bushnell
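
The idea of partitioning reads by depth can be illustrated with a small sketch. This is not Tadpole's implementation: it is a toy version that assumes short in-memory reads, counts forward-strand k-mers only (real tools typically use canonical k-mers), and uses hypothetical names (`partition_by_depth`, `k`, `min_depth`) and thresholds.

```python
from collections import Counter
from statistics import median


def kmers(seq: str, k: int):
    """Yield every k-mer in seq (forward strand only, for simplicity)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def partition_by_depth(reads, k=5, min_depth=3):
    """Split reads into (high, low) by the median count of their k-mers."""
    counts = Counter(km for r in reads for km in kmers(r, k))
    high, low = [], []
    for r in reads:
        depths = [counts[km] for km in kmers(r, k)]
        # Reads shorter than k have no k-mers; treat them as low depth.
        depth = median(depths) if depths else 0
        (high if depth >= min_depth else low).append(r)
    return high, low


if __name__ == "__main__":
    # Five copies of a "viral" read and one unrelated low-depth read.
    reads = ["ACGTACGTAC"] * 5 + ["TTTTGGGGCC"]
    high, low = partition_by_depth(reads, k=5, min_depth=3)
    print(len(high), "high-depth reads;", len(low), "low-depth reads")
```

Only the high-depth set would then be passed to the assembler; a sensible threshold depends on how far the viral depth sits above the host background.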

I've already assembled the reads with Tadpole; the number of contigs decreased, but it is still very high. What is X in mindepth=X?

written 14 months ago by Dr.Animo

Brian Bushnell: Is the mindepth= flag new? I don't see it in the in-line help for tadpole.sh.

written 13 months ago by GenoMax

I did not see any major difference after applying mindepth.

modified 13 months ago • written 13 months ago by Dr.Animo

What value did you use? 100 or more?

written 13 months ago by GenoMax

What about the V-GAP assembly pipeline? https://www.sciencedirect.com/science/article/pii/S0378111915012378

I could not find the pipeline that the authors propose in the article. If someone has it, please share it.

written 13 months ago by Dr.Animo
Mensur Dlakic (USA) wrote (14 months ago):

1) Have you compared the contigs to the host genome? Just because you removed reads by mapping does not mean that all host DNA is gone.

2) Have you tried assembling without removing any reads? You may be unintentionally removing some viral reads, and host contigs can always be removed afterwards.

3) Consider using virus-specific assemblers that account for the higher mutation rates of viruses.

https://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-13-475

https://academic.oup.com/bib/article/20/1/15/4055921

https://microbiomejournal.biomedcentral.com/articles/10.1186/s40168-019-0626-5

written 14 months ago by Mensur Dlakic
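
As a rough illustration of points 1) and 2), one way to remove host contigs after assembly is to align the contigs against the host genome and flag those that are mostly covered by host hits. The sketch below assumes the alignments are available in BLAST tabular format (`-outfmt 6`, where columns 7 and 8 are the query start and end) with contigs as queries; the function name, the input variables, and any cutoff applied to the result are hypothetical.

```python
from collections import defaultdict


def host_covered_fraction(blast_lines, contig_lengths):
    """Return {contig: fraction of its length covered by host hits}.

    blast_lines: lines in BLAST tabular format (-outfmt 6), contigs as queries.
    contig_lengths: dict of contig name -> length in bp.
    """
    intervals = defaultdict(list)
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        qseqid, qstart, qend = fields[0], int(fields[6]), int(fields[7])
        intervals[qseqid].append((min(qstart, qend), max(qstart, qend)))

    fractions = {}
    for contig, length in contig_lengths.items():
        covered, last_end = 0, 0
        for start, end in sorted(intervals.get(contig, [])):
            start = max(start, last_end + 1)  # skip bases already counted
            if end >= start:
                covered += end - start + 1
                last_end = end
        fractions[contig] = covered / length
    return fractions
```

Contigs whose fraction exceeds a chosen cutoff could be set aside as likely host. Contigs with only partial host coverage deserve a closer look rather than automatic removal, since viruses and hosts can share sequence.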