BWA (mem, sampe, or aln) used in bam file
5.9 years ago
rborges ▴ 50

I'm trying to figure out which bwa algorithm (sampe, mem, or aln) was used to produce three aligned BAM files. Is there a way to determine this? I can't find this information at the source I downloaded the samples from.

Looking at the header, they say the following:

samtools view -H file1.bam | grep "bwa"
@PG ID:bwa  PN:bwa  VN:0.5.9-r16

samtools view -H file2.bam | grep "bwa"
@PG ID:bwa  PN:bwa  VN:0.6.1-r104-tpx

samtools view -H file3.bam | grep "bwa"
# (no output)

But as far as I understand, this only gives the version of bwa, not which algorithm was used.

Is there something in the format of the output that might indicate which algorithm was used?

EDIT: I've also grepped for "CL" and "PG", but the information is not in the header for these files.

Thank you

bwa bam sequencing • 2.0k views

EDIT: I've also grepped for "CL" and "PG", but the information is not in the header for these files.

Then you may be out of luck with the last file. The first two files were aligned with fairly old versions of bwa, which may not have recorded the command line in the header at that time.

You could always recreate the fastq file and realign the data with current bwa.
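For anyone wanting to try the realignment route, here is a minimal sketch of what that could look like. It assumes paired-end reads, a reference already indexed with `bwa index`, and a reasonably recent samtools; the function name `realign_bam` and the output file names are made up for illustration.

```shell
# Sketch only: convert an aligned paired-end BAM back to FASTQ, then realign.
# realign_bam and the output naming scheme are hypothetical, not a standard tool.
realign_bam() {
    local bam="$1" ref="$2" prefix="$3"
    if [ ! -r "$bam" ] || [ ! -r "$ref" ]; then
        echo "usage: realign_bam in.bam ref.fa out_prefix" >&2
        return 1
    fi
    # Group mates together, then write read 1 and read 2 to separate files.
    # Singletons and unflagged reads are discarded via /dev/null.
    samtools collate -u -O "$bam" \
        | samtools fastq -n -1 "${prefix}_1.fastq.gz" -2 "${prefix}_2.fastq.gz" \
              -0 /dev/null -s /dev/null - || return 1
    # Realign with bwa mem and coordinate-sort the result.
    bwa mem "$ref" "${prefix}_1.fastq.gz" "${prefix}_2.fastq.gz" \
        | samtools sort -o "${prefix}.realigned.bam" -
}
```

It would be called as `realign_bam file3.bam ref.fa file3`. For single-end data the `samtools fastq` options would need to change.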


Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.

5.9 years ago
Benn 8.3k

It should be on the same @PG line, in the CL: field, right after the VN: field (which is the version of bwa). So the information is not in your files.

Here is an example from a file I used:

samtools view -H SRR1291026.bwa.bam  | grep "bwa"

@PG     ID:bwa  PN:bwa  VN:0.7.12-r1039 CL:bwa mem -t 20 /home/Reference/hg38/hg38.fa SRR1291026_1.fastq.gz SRR1291026_2.fastq.gz
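The check above can be automated with a small helper. This is a rough sketch, not an established tool: `bwa_subcommand` is a made-up name, and it assumes the header is tab-delimited (as SAM headers are) and that the subcommand is the second word of the CL: field.

```shell
# Reads SAM header text on stdin. For each bwa @PG line, prints the
# subcommand taken from the CL: field (e.g. mem, aln, sampe), or "unknown"
# when the line carries no CL: field. bwa_subcommand is a hypothetical helper.
bwa_subcommand() {
    awk -F'\t' '
        /^@PG/ {
            is_bwa = 0; cl = ""
            for (i = 2; i <= NF; i++) {
                if ($i == "PN:bwa" || $i ~ /^ID:bwa/) is_bwa = 1
                if ($i ~ /^CL:/) cl = substr($i, 4)
            }
            if (!is_bwa) next
            if (cl == "") { print "unknown"; next }
            # First word is the program (possibly a path), second the subcommand.
            n = split(cl, w, " ")
            print (n >= 2 ? w[2] : "unknown")
        }'
}
```

Usage would be `samtools view -H file1.bam | bwa_subcommand`. For headers like the ones in the question, which have VN: but no CL:, it prints `unknown`.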
5.9 years ago

Try grepping the header for "CL:":

$ samtools view -H SRR1172709.sam | grep "CL:"
@PG ID:bwa  PN:bwa  VN:0.7.16a-r1181    CL:bwa mem -t 10 -R @RG\tID:FLOWCELL1.LANE1\tPL:ILLUMINA\tLB:SINDIA\tSM:SRR1172709 ../../MTB_DATA//Ref_H37rv/h37rv.fa ../SRR1172709_1.fastq ../SRR1172709_2.fastq

If the CL: field is missing, check this thread for related suggestions: Predict/Estimate/Find Bwa Parameters From Bam Or Sam File

5.9 years ago
h.mon 35k

You are looking for the CL field of an @PG header line. Try grepping for '@PG'; there should be just one or a few such lines.
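To see at a glance which @PG lines carry a CL field, something along these lines could help. It is only a sketch: `pg_summary` is a hypothetical helper name, and it assumes a tab-delimited header as produced by `samtools view -H`.

```shell
# Reads SAM header text on stdin and prints one line per @PG record:
# ID, PN, VN (or "-" if absent) and whether a CL: field is present.
# pg_summary is a hypothetical helper, not part of samtools.
pg_summary() {
    awk -F'\t' '
        /^@PG/ {
            id = pn = vn = "-"; cl = "no"
            for (i = 2; i <= NF; i++) {
                tag = substr($i, 1, 3); val = substr($i, 4)
                if (tag == "ID:") id = val
                else if (tag == "PN:") pn = val
                else if (tag == "VN:") vn = val
                else if (tag == "CL:") cl = "yes"
            }
            printf "%s\t%s\t%s\tCL=%s\n", id, pn, vn, cl
        }'
}
```

Run it as `samtools view -H file2.bam | pg_summary`. Any record reported with `CL=no` did not store its command line.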
