How to properly assemble the virus genome using map to reference strategy
0
0
Entering edit mode
11 months ago
xatabadich • 0

Hello everyone,

I am trying to assemble a virus sars-cov2 in consensus using map to reference strategy. The problem is that the reference genome is approx. 30.000 in length and my assembly is only 1500 bp. While map to reference method using bwa mem software I got 93% mapped reads, so I am expecting almost circular genome.

I used the following commands:

1) Indexing file - bwa index reference\ /SARS_cov_2.fasta

2) Allign and convert to bam - bwa mem reference\ /SARS_cov_2.fasta ERR4082713_1.fastq.gz ERR4082713_2.fastq.gz | samtools sort -o aln_sars_cov.bam

3) Keep all mapped sequences - samtools view -c -f 4 aln_sars_cov.bam

4) Retrieve reads from bam file - bam2fastq --aligned --force --strict -o mapped#.fq aln_sars_cov.bam

5) Assemble using SPAdes - spades.py -k 127 -1 mapped_1.fq -2 mapped_2.fq --careful -o output/

Am I using the wrong method to assemble it or is it a problem with pre-processing the reads ? What is the best strategy (pipeline) to get a consensus?

genome Assembly map to reference rna-seq • 254 views
ADD COMMENT
2
Entering edit mode

Why would you need to assemble? Just align to reference and extract consensus. See e.g. Generating consensus sequence from bam file

ADD REPLY
0
Entering edit mode

I will try this method, thanks a lot!

ADD REPLY

Login before adding your answer.

Traffic: 1711 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6