Question: How to properly assemble a virus genome using a map-to-reference strategy
8 weeks ago, xatabadich wrote:

Hello everyone,

I am trying to build a SARS-CoV-2 consensus genome using a map-to-reference strategy. The problem is that the reference genome is approximately 30,000 bp long, while my assembly is only 1,500 bp. When mapping to the reference with bwa mem, 93% of the reads mapped, so I was expecting an almost complete genome.

I used the following commands:

1) Indexing file - bwa index reference\ /SARS_cov_2.fasta

2) Align and convert to BAM - bwa mem reference\ /SARS_cov_2.fasta ERR4082713_1.fastq.gz ERR4082713_2.fastq.gz | samtools sort -o aln_sars_cov.bam

3) Keep all mapped sequences - samtools view -c -f 4 aln_sars_cov.bam

4) Retrieve reads from bam file - bam2fastq --aligned --force --strict -o mapped#.fq aln_sars_cov.bam

5) Assemble using SPAdes - -k 127 -1 mapped_1.fq -2 mapped_2.fq --careful -o output/

Am I using the wrong method to assemble it, or is it a problem with pre-processing the reads? What is the best strategy (pipeline) to get a consensus?

modified 5 weeks ago by Biostar • written 8 weeks ago by xatabadich

Why would you need to assemble? Just align to the reference and extract the consensus. See e.g. "Generating consensus sequence from bam file".
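The align-then-consensus route suggested here could be sketched roughly as follows. This is only an illustration, assuming bwa, samtools, and bcftools are installed and on the PATH; the function name build_consensus and the output file names are made up for the example, and the variant-calling settings are plain defaults rather than anything tuned for SARS-CoV-2. Note also that step 3 in the question uses -c -f 4, which prints a count of the unmapped reads rather than keeping the mapped ones.

```shell
# Sketch only: align reads to the reference and derive a consensus FASTA.
# build_consensus and all output file names are illustrative assumptions.
build_consensus() (
    set -eu
    ref=$1; r1=$2; r2=$3; prefix=$4

    # Index the reference, then align and coordinate-sort the paired reads.
    bwa index "$ref"
    bwa mem "$ref" "$r1" "$r2" | samtools sort -o "$prefix.bam" -
    samtools index "$prefix.bam"

    # Sanity check: -c counts reads, -F 4 excludes unmapped ones
    # (-f 4 would instead select only the unmapped reads).
    samtools view -c -F 4 "$prefix.bam"

    # Call variants against the reference (haploid, as for a virus).
    bcftools mpileup -Ou -f "$ref" "$prefix.bam" |
        bcftools call -mv --ploidy 1 -Oz -o "$prefix.vcf.gz"
    bcftools index "$prefix.vcf.gz"

    # Apply the called variants to the reference to get the consensus.
    bcftools consensus -f "$ref" -o "$prefix.consensus.fa" "$prefix.vcf.gz"
)
```

With the files from the question, a call might look like: build_consensus "reference /SARS_cov_2.fasta" ERR4082713_1.fastq.gz ERR4082713_2.fastq.gz sars_cov. One caveat: bcftools consensus keeps the reference base wherever no variant was called, so positions with no read coverage will look like the reference unless they are masked separately.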

written 8 weeks ago by 5heikki

I will try this method, thanks a lot!

written 8 weeks ago by xatabadich