Different BAM files generated using different commands in bash - which one is right?
2
0
Entering edit mode
7.1 years ago
lamteva.vera ▴ 220

Dear friends,

please help me to understand what I am doing wrong.

I'm doing my best to figure out how to use Linux for the purposes of NGS analysis. Now I am trying to use a for loop to iterate over fastq files named like this: LK_S2_L001_R1_001.clean.fq (constant part is bold). To test it, I just specified the directory containing only one pair of files corresponding to one sample.

for f in `sudo sh -c "ls /path/to/*fq" | rev | cut -c 22- | rev | sort -u` 
do
bwa mem -t 2 ucsc_hg19.fasta -R "@RG\tID:${f}\tSM:${f}\tPL:ILLUMINA\tLB:${f}" ${f}_L001_R1_001.clean.fq ${f}_L001_R2_001.clean.fq | samtools sort -@ 2 -O BAM -o ${f}.bam
done

It generated a 188 668 kb BAM file, while the BAM for the same sample generated using

bwa mem -t 2 ucsc_hg19.fasta -R "@RG\tID:sample\tSM:sample\tPL:ILLUMINA\tLB:sample" /path/to/sample_forward.fq /path/to/sample_reverse.fq | samtools sort -@ 2 -O BAM -o sample.bam

is 188 531 kb in size.

Here is a screenshot from IGV: enter image description here They are different! Any ideas why is it so?

BAM bash • 2.1k views
ADD COMMENT
1
Entering edit mode

I don't think there is a good reason to use sudo here, avoid it as much as possible.

ADD REPLY
0
Entering edit mode

Thank you for your comment. I tried without sudo:

for f in `ls /media/sf_Desktop/170303_M04607_0037_000000000-AR7YC/*fq | rev | cut -c 22- | rev | sort -u` 
do
bwa mem -t 2 ucsc_hg19.fasta -R "@RG\tID:${f}\tSM:${f}\tPL:ILLUMINA\tLB:${f}" ${f}_L001_R1_001.clean.fq ${f}_L001_R2_001.clean.fq | samtools sort -@ 2 -O BAM -o ${f}.bam
done

It generated BAM of the same size as it was without sudo: 188 668 kb.

ADD REPLY
1
Entering edit mode

So you don't need sudo, and you should only use sudo if you absolutely have to (get prompted to do so) and are absolutely sure about what you are doing.

ADD REPLY
0
Entering edit mode

But, you know, they still look different in IGV: enter image description here

I need to use a more accurate way to compare them. Right now trying to find out the way...

ADD REPLY
1
Entering edit mode

But, you know, they still look different in IGV

Did you not read @Heng's answer I had linked below?

Even if they "look" different what are you trying to do with the data? Call SNP's? The end result of that analysis should not be different, even if the IGV view looks different. If you need the files to look identical then you need to use an aligner that supports deterministic mode. bowtie2 is another aligner that I know which supports this mode.

ADD REPLY
0
Entering edit mode

Yes, I've read this, thank you. I got it. I'm going to call variants - OK, let BAMs look different in IGV if variant calls are going to be the same... I just want to know if I can use the aformentioned for loop to process FASTQ files in pairs, and different sized BAMs make me worry.

ADD REPLY
3
Entering edit mode
7.1 years ago
GenoMax 146k

Likely because aligners are non-deterministic. See Heng Li's answer here: BWA mem output inconsistent on same but re-ordered FASTQ input

As an aside BBMap can be run in deterministic mode with deterministic=t option.

ADD COMMENT
0
Entering edit mode
7.1 years ago
aditi.qamra ▴ 270

sample_forward.fq and sample_reverse.fq are hard coded. Are they the same files as ${f}_L001_R2_001.clean.fq ? Write out both the commands on a text file and then check if your variables ($f) are being encoded correctly and finally you have the same script. p.s. why do you need sudo sh -c here ?

ADD COMMENT

Login before adding your answer.

Traffic: 1111 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6