Tutorial: Generating consensus sequence from bam file
gravatar for finswimmer
6 weeks ago by
finswimmer11k wrote:

One of the recurring questions on biostars is "How can I create a consensus sequence from my bam file?" A variation of these question is "How to get fasta out of bam file?".

The rational behind this question is usually, to get the sequence for a given region of interest of my sample, so that one can do a multiple alignment between different samples/species.

The answer to this question is a simple two-step workflow:

  1. Call variants
  2. Include the variants into the reference sequence

Do the variant calling step with your favorite program/workflow, e.g. with bcftools :

$ bcftools mpileup -Ou -f ref.fa input.bam | bcftools call -Oz -mv -o output.vcf.gz

In the end, one needs a valid, sorted vcf file which is compressed with bgzip and indexed with tabix.

If your vcf isn't already compressed you can do this with:

$ bgzip -c output.vcf > output.vcf.gz

And indexing is done by:

$ tabix output.vcf.gz

Now we are ready to create our consensus sequence. The tool of choice is bcftools consensus.

If one like to create a complete new reference genome, based on the called variants, this is done by:

$ bcftools consensus -f ref.fa output.vcf.gz > out.fa

If you are interested in only a given region:

$ samtools faidx ref.fa 8:11870-11890 | bcftools consensus output.vcf.gz -o out.fa

You can also create consensus sequences for multiple regions, if you provide a bed file to samtools faidx:

$ samtools faidx -r regions.bed ref.fa  | bcftools consensus output.vcf.gz -o out.fa

Have fun :)

fin swimmer

bam tutorial consensus fasta • 223 views
ADD COMMENTlink modified 6 weeks ago by WouterDeCoster38k • written 6 weeks ago by finswimmer11k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 841 users visited in the last hour