viral genome assembly
4.4 years ago
Dr.Animo ▴ 130

Dear all, I am trying to assemble a phage (virus) genome. I checked the quality of the reads and mapped them to the host genome to remove host-derived reads. I am now trying to assemble the unmapped reads, but I always get more than 1,000 contigs, whichever k-mer size or assembler I use. I have found chimeric reads, which I believe are the reason the number of contigs is so large. My question is: how can I identify these chimeric reads and remove them?

assembly virus genome chimeric reads • 2.7k views

It is possible that you have far more coverage than necessary (the genome must be pretty small). You could look into normalizing the data to a lower coverage and/or using tadpole.sh from the BBMap suite, a k-mer-based assembler, instead.
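To get a feel for how much excess coverage there might be, here is a minimal sketch of the arithmetic. The read count, read length, genome size, and target coverage are all hypothetical numbers chosen for illustration, and the uniform subsampling fraction is only a simple stand-in for proper k-mer-based normalization.

```python
def estimated_coverage(num_reads, read_length, genome_size):
    """Average depth: total sequenced bases divided by genome length."""
    return num_reads * read_length / genome_size


def subsample_fraction(current_coverage, target_coverage):
    """Fraction of reads to keep to reach roughly the target coverage."""
    if current_coverage <= 0:
        raise ValueError("current_coverage must be positive")
    # Never keep more than 100% of the reads.
    return min(1.0, target_coverage / current_coverage)


# Hypothetical run: 2 million 150 bp reads from a 50 kb phage genome.
coverage = estimated_coverage(2_000_000, 150, 50_000)
fraction = subsample_fraction(coverage, 100)
print(f"estimated coverage: {coverage:.0f}x, keep {fraction:.2%} of reads")
```

With numbers like these the coverage is in the thousands, which is the situation where reducing depth before assembly is worth trying.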


Tadpole seems to do a better job of assembling viruses than SPAdes. I can't guarantee that, but it seems to be generally true.

Viruses and hosts can share sequence. If you remove all sequences shared between the virus and host, you will likely end up with holes in your assembly of the virus.

In this case, I'd suggest partitioning reads by depth, and assembling the high-depth reads, which will be viral. You can do that with Tadpole by using the mindepth=X flag.
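As a rough illustration of what partitioning by depth means, here is a toy sketch in plain Python. It is not how Tadpole implements it: the function names, the use of the median k-mer count per read, and the threshold are assumptions made for the example, and reverse complements are ignored for brevity.

```python
from collections import Counter
from statistics import median


def count_kmers(reads, k):
    """Count every k-mer across all reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def read_depth(read, counts, k):
    """Summarize a read's depth as the median count of its k-mers."""
    depths = [counts[read[i:i + k]] for i in range(len(read) - k + 1)]
    return median(depths) if depths else 0


def partition_by_depth(reads, k, min_depth):
    """Split reads into (high_depth, low_depth) lists."""
    counts = count_kmers(reads, k)
    high, low = [], []
    for read in reads:
        target = high if read_depth(read, counts, k) >= min_depth else low
        target.append(read)
    return high, low


# Toy data: one sequence seen five times, another seen once.
reads = ["ACGTACGGTCA"] * 5 + ["TTGCAAGCTTAG"]
high, low = partition_by_depth(reads, k=5, min_depth=3)
print(len(high), "high-depth reads;", len(low), "low-depth reads")
```

On real data the threshold would have to be chosen from the observed depth distribution, since the right value depends on how deeply the virus and the host were sequenced.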


I've already assembled the reads with Tadpole. The number of contigs decreased, but it is still very high. In mindepth=X, what is X?


Brian Bushnell: Is the mindepth= flag new? I don't see it in the in-line help for tadpole.sh.


I did not see any major difference after applying mindepth.


What value did you use? 100 or more?


What about the V-GAP assembly pipeline? https://www.sciencedirect.com/science/article/pii/S0378111915012378

I could not find the pipeline that the authors propose in the article. If someone has it, please share it.

4.4 years ago
Mensur Dlakic ★ 27k

1) Have you compared the contigs to the host genome? Just because you removed the reads by mapping does not mean that all of the host DNA is gone.

2) Have you tried assembling without removing any reads? You may be unintentionally removing some viral reads, and host contigs can always be removed afterwards.

3) Consider using virus-specific assemblers that account for the higher mutation rates of viruses.

https://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-13-475

https://academic.oup.com/bib/article/20/1/15/4055921

https://microbiomejournal.biomedcentral.com/articles/10.1186/s40168-019-0626-5
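For point 1, a quick way to see the idea is to measure how many of a contig's k-mers also occur in the host sequence. The sketch below is a toy version under stated assumptions: the function names and the default k are made up for illustration, both sequences are short strings held in memory, and in practice an aligner would be the usual tool for this comparison.

```python
def kmers(seq, k):
    """Return the set of k-mers in a sequence (uppercased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def revcomp(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def host_fraction(contig, host, k=21):
    """Fraction of the contig's k-mers found in the host on either strand."""
    host_kmers = kmers(host, k) | kmers(revcomp(host), k)
    contig_kmers = kmers(contig, k)
    if not contig_kmers:
        return 0.0
    return len(contig_kmers & host_kmers) / len(contig_kmers)


# Toy sequences: one contig cut from the host, one unrelated.
host = "ACGTTGCAAGCTTAGGCTA"
from_host = host[2:14]
unrelated = "GGGGGGGGGG"
print(host_fraction(from_host, host, k=5))
print(host_fraction(unrelated, host, k=5))
```

Contigs with a fraction near 1 are likely host-derived, while intermediate values may point to shared sequence or a chimeric contig and deserve a closer look.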
