Question: Malformed BAM file produced by BWA
0
gravatar for joreamayarom
2.2 years ago by
joreamayarom110
USA/Cambridge
joreamayarom110 wrote:

I'm aligning reads to a genome to later do some variant calling. I have mapped some reads to a genome using bwa and created the corresponding dictionary

bwa index -p ref.fa ref.fa
bwa mem ref.fa forward.fastq.gz reverse.fastq.gz -t 16 > mapped_file.sam
samtools view -bS mapped_file.sam -o mapped_file.bam
java -jar picard.jar CreateSequenceDictionary \
R=ref.fa \
O=ref.dict

samtools faidx ref.fa

And then proceeded to sort and remove duplicates

java -jar picard.jar SortSam \
I=mapped_file.bam \
O=sorted_file.bam \
SORT_ORDER=coordinate

java -jar picard.jar MarkDuplicates \
REMOVE_DUPLICATES=true \
ASSUME_SORT_ORDER=coordinate \
I=sorted_file.bam \
O=clean_file.bam \
M=metric_file.txt

And finally tried to realign the sequences using GATK, but the programmed complained that my input bam file is malformed.

java -jar GenomeAnalysisTK.jar \
    -T RealignerTargetCreator \
    -R ref.fa \
    -I clean_file.bam \
    -o clean_file.intervals

I ran ValidateSamFile, as GATK recommends, and got this error

## HISTOGRAM    java.lang.String 
Error Type  Count 
ERROR:MISSING_READ_GROUP    1
WARNING:RECORD_MISSING_READ_GROUP   324404

To solve this problem I tried running AddOrReplaceReadGroups and FixMateInformation on my clean_file.bam, as the GATK webpage recommends (http://gatkforums.broadinstitute.org/gatk/discussion/7571/errors-in-sam-bam-files-can-be-diagnosed-with-validatesamfile), to no avail. I also ran ValidateSamFile directly in my Bra output and got a similar error message. Is there an explanation for this problem? Has my BWA output been corrupted in any stage?

samtools picard gatk • 1.2k views
ADD COMMENTlink modified 2.2 years ago by Pierre Lindenbaum121k • written 2.2 years ago by joreamayarom110

The RG tag has to be added manually, so it's not really a "corrupt" BAM... you just haven't added RG metadata and GATK needs that. AddOrReplaceReadGroups is the answer. Why didn't it, er, "avail"? :)

ADD REPLYlink written 2.2 years ago by John12k
2
gravatar for Pierre Lindenbaum
2.2 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum121k wrote:

1) one step (don't save as SAM but bam, add your read groups and sort)

bwa mem -R '@RG\tID:S1\tSM:SAMPLE1'  ref.fa forward.fastq.gz reverse.fastq.gz -t 16 |  samtools sort -O BAM -o out.bam -T tmp -

2) when using the picard tools use the option

VALIDATION_STRINGENCY=LENIENT
ADD COMMENTlink modified 2.2 years ago • written 2.2 years ago by Pierre Lindenbaum121k
1

I'm checking GATK documentation right now. It is mentioned that the -M flag is essential for Picard compatibility. Is that missing from your answer or you have an specific reason to leave it out?

ADD REPLYlink modified 2.2 years ago • written 2.2 years ago by joreamayarom110

GATK/-M option ? what are you talking about ? VALIDATION_STRINGENCY is a general option for picard when handling bam.

ADD REPLYlink written 2.2 years ago by Pierre Lindenbaum121k

I'm talking about recommendation 1. In GATK webpage (http://gatkforums.broadinstitute.org/gatk/discussion/2799) they recommend the following command with an -M before -R.

bwa mem -M -R ’<read group info>’ -p reference.fa raw_reads.fq > aligned_reads.sam

ADD REPLYlink written 2.2 years ago by joreamayarom110

1) the option -M has nothing to do with your problem. Check the documentation

2) the order of the options doesn't matter.

ADD REPLYlink written 2.2 years ago by Pierre Lindenbaum121k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1445 users visited in the last hour