Generating consensus sequence from bam file
6
1
Entering edit mode
2.3 years ago
chparada ▴ 10

Hi,

I am trying to generate consensus sequence from a bam file obtained after mapping SRA reads to a reference genome.

I used the following commands:

bwa mem ref.fasta SRR_1.fastq SRR_2.fastq > bwa.sam
samtools view -b -F 4 bwa.sam > bwa_aligned.bam
samtools index bwa_aligned.bam

I am not sure how to generate the consensus sequence that I have in mind. In case I don't explain this well. I made a diagram:

===========================================================>ref.fasta
- -- ---- ----      ----- --- --- -      --    ------ ----- 
------ --- ---     -------- --- --       ---- -      --- -->SRR_reads_mapping



==============  +  ================  +   ==================> consensus_sequence.fasta

Please let me know if you have any advice on this.

Cheers!!!

genome samtools bwa fasta • 9.2k views
ADD COMMENT
0
Entering edit mode

What do you mean by consensus sequence? The most frequent nucleotide in each position? The variants? There are plenty of tools that can give you the reading in each position, try bcftools mpileup for instance.

ADD REPLY
0
Entering edit mode

Thank you for your answers.

I meant obtaining the most frequent nucleotide in each position. From the mapped reads, I want to been able to obtain a consensus sequence in single fasta file. I am going to try bcftools mpileup.

ADD REPLY
0
Entering edit mode

see below my answer thanks

ADD REPLY
6
Entering edit mode
17 months ago
vsingh ▴ 60

1) Map short reads against reference gene sequence

# Create bowtie2 database
bowtie2-build REFERENCE.fasta REF_DB

# bowtie2 mapping
bowtie2 -x REF_DB -U SAMPLE.fastq --no-unal -S SAMPLE.sam

# samtools:  sort .sam file and convert to .bam file
samtools view -bS SAMPLE.sam | samtools sort - -o SAMPLE.bam

2) Get consensus sequence from .bam file

# Get consensus fastq file
samtools mpileup -uf REFERENCE.fasta SAMPLE.bam | bcftools call -c | vcfutils.pl vcf2fq > SAMPLE_cns.fastq

  # vcfutils.pl is part of bcftools

# Convert .fastq to .fasta and set bases of quality lower than 20 to N
seqtk seq -aQ64 -q20 -n N SAMPLE_cns.fastq > SAMPLE_cns.fasta
ADD COMMENT
3
Entering edit mode
2.3 years ago
nsmi8446 ▴ 130

You could call variants (using whatever variant calling software you like, GATK, freebayes etc.) from your .bam file and then use vcf-consensus (http://vcftools.sourceforge.net/perl_module.html#vcf-consensus) to build your consensus sequence. The code below should work:

cat ref.fa | vcf-consensus file.vcf.gz > out.fa

ADD COMMENT
1
Entering edit mode

I agree with nsmi8446. This is a nice concise way to solve your problem.

ADD REPLY
0
Entering edit mode
2.1 years ago
waldeyr • 0
samtools mpileup -uf my_reference.fna my_file.bam | bcftools view -cg - | vcfutils.pl vcf2fq > my_consensus.fq
ADD COMMENT
0
Entering edit mode
2.1 years ago
trausch ★ 1.6k

Alfred has a consensus mode that extracts all reads at a given alignment position and then runs a multiple sequence alignment computation with consensus generation. It's primarily for long reads but I think it also works for short reads.

alfred consensus -t ill -f bam -p chr4:500500 input.bam
ADD COMMENT
0
Entering edit mode
2.1 years ago
chen ★ 2.1k

try gencore: https://github.com/OpenGene/gencore

Generate consensus reads to reduce sequencing noises and remove duplications

ADD COMMENT
0
Entering edit mode
2.1 years ago
guillaume.rbt ▴ 880

you can use GATK FastaAlternateReferenceMaker to generate the consensus sequence based on a SNP calling : https://software.broadinstitute.org/gatk/documentation/tooldocs/3.8-0/org_broadinstitute_gatk_tools_walkers_fasta_FastaAlternateReferenceMaker.php

ADD COMMENT

Login before adding your answer.

Traffic: 2584 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6