Question: Obtaining The Consensus Sequence From A Bam File In Fasta
gravatar for Anima Mundi
6.5 years ago by
Anima Mundi2.4k
Anima Mundi2.4k wrote:

Hello everybody,

I am dealing with BAM files for the first time, and I had some problems to handle them. My goal is to obtain the whole consensus sequence (between sequencing folds of a genomic range from the same sample) in FASTA format. My first approach has been to find a "ready to go" script/one line command (i. e. I had a look here Convert bam file to fasta file ) to make the conversion, but I failed. This is what I got trying to use Samtools:

[main_samview] fail to open "filename.bam" for reading.
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[main_samview] fail to read the header from "R_2:164876791-164886791.bam".

I tried to figure out how a BAM file is structured, but having a look with a text editor I saw that the only explicit information are the non-assembled reads, and it would take a while to obtain the consensus starting from there (and, I guess, there would be no point anymore to use a BAM file).

As a second choice, I tried to use some more graphically oriented tools, such Artemis. Until now, I had problems even to make one of this program work (but this are hopefully physiological issues which I can probably solve myself); furthermore, I did not find a way to extract a consensus from one of the samples pointed here

What would you suggest? Is there an easy way to obtain what I need, while I get more confident with the format? I hope the question is clear enough.

fasta consensus bam msa genome • 4.8k views
ADD COMMENTlink modified 5.8 years ago by Biostar ♦♦ 20 • written 6.5 years ago by Anima Mundi2.4k
gravatar for swbarnes2
6.5 years ago by
United States
swbarnes26.5k wrote:

The post you linked to is about someone doing something else. They aren't trying to make a single consensus file; they are trying to make a multi-fasta of every individual read.

There is no plug-and-play simple script that will take any .bam and make a perfect consensus fasta, because doing that requires making judgement calls about all the reads that don't perfectly match your consensus, of which there will be a lot.

But you can use samtools/bcftools to make an all-points vcf from your .bam, and then the vcf2fq command in vcfutils will try to make a consensus taking into account the called SNPs.

ADD COMMENTlink written 6.5 years ago by swbarnes26.5k
gravatar for Istvan Albert
6.5 years ago by
Istvan Albert ♦♦ 81k
University Park, USA
Istvan Albert ♦♦ 81k wrote:

You are using a file in the wrong format.

Make sure that you understand the difference between text formats like SAM and binary formats like BAM. Once you have a correct file the advice that you got can be applied directly.

ADD COMMENTlink written 6.5 years ago by Istvan Albert ♦♦ 81k

So, if I understood well, my data are currently in SAM format. In fact, using Editra to display them shows tab-delimited fields which may correspond to the description fields of each read. Also, using the file command gives me this output: "ASCII text, with very long lines", so I am not dealing with binary files. Still, I don't see any header line and, by the way, trying to convert my files to BAM (samtools view -b -S myfile.sorted.bam) leads to this error: "[samopen] no @SQ lines in the header. [sam_read1] missing header? Abort!".

ADD REPLYlink written 6.5 years ago by Anima Mundi2.4k

This was recently discussed right here How to extract unaligned sequences from BAM files obtainend from BWA

ADD REPLYlink written 6.5 years ago by Istvan Albert ♦♦ 81k

Ok, thank you. I think this particular issue is solved; still I have some troubles, I will open a new question in case I am not able to solve them out.

ADD REPLYlink written 6.5 years ago by Anima Mundi2.4k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1838 users visited in the last hour