Question: Different BAM files generated using different commands in bash - which one is right?
0
gravatar for lamteva.vera
10 days ago by
lamteva.vera10
Ukraine, Kyiv
lamteva.vera10 wrote:

Dear friends,

please help me to understand what I am doing wrong.

I'm doing my best to figure out how to use Linux for the purposes of NGS analysis. Now I am trying to use a for loop to iterate over fastq files named like this: LK_S2_L001_R1_001.clean.fq (constant part is bold). To test it, I just specified the directory containing only one pair of files corresponding to one sample.

for f in `sudo sh -c "ls /path/to/*fq" | rev | cut -c 22- | rev | sort -u` 
do
bwa mem -t 2 ucsc_hg19.fasta -R "@RG\tID:${f}\tSM:${f}\tPL:ILLUMINA\tLB:${f}" ${f}_L001_R1_001.clean.fq ${f}_L001_R2_001.clean.fq | samtools sort -@ 2 -O BAM -o ${f}.bam
done

It generated a 188 668 kb BAM file, while the BAM for the same sample generated using

bwa mem -t 2 ucsc_hg19.fasta -R "@RG\tID:sample\tSM:sample\tPL:ILLUMINA\tLB:sample" /path/to/sample_forward.fq /path/to/sample_reverse.fq | samtools sort -@ 2 -O BAM -o sample.bam

is 188 531 kb in size.

Here is a screenshot from IGV: enter image description here They are different! Any ideas why is it so?

bash bam • 134 views
ADD COMMENTlink modified 10 days ago • written 10 days ago by lamteva.vera10
1

I don't think there is a good reason to use sudo here, avoid it as much as possible.

ADD REPLYlink written 10 days ago by WouterDeCoster22k

Thank you for your comment. I tried without sudo:

for f in `ls /media/sf_Desktop/170303_M04607_0037_000000000-AR7YC/*fq | rev | cut -c 22- | rev | sort -u` 
do
bwa mem -t 2 ucsc_hg19.fasta -R "@RG\tID:${f}\tSM:${f}\tPL:ILLUMINA\tLB:${f}" ${f}_L001_R1_001.clean.fq ${f}_L001_R2_001.clean.fq | samtools sort -@ 2 -O BAM -o ${f}.bam
done

It generated BAM of the same size as it was without sudo: 188 668 kb.

ADD REPLYlink written 10 days ago by lamteva.vera10
1

So you don't need sudo, and you should only use sudo if you absolutely have to (get prompted to do so) and are absolutely sure about what you are doing.

ADD REPLYlink written 10 days ago by WouterDeCoster22k

But, you know, they still look different in IGV: enter image description here

I need to use a more accurate way to compare them. Right now trying to find out the way...

ADD REPLYlink written 10 days ago by lamteva.vera10
1

But, you know, they still look different in IGV

Did you not read @Heng's answer I had linked below?

Even if they "look" different what are you trying to do with the data? Call SNP's? The end result of that analysis should not be different, even if the IGV view looks different. If you need the files to look identical then you need to use an aligner that supports deterministic mode. bowtie2 is another aligner that I know which supports this mode.

ADD REPLYlink modified 10 days ago • written 10 days ago by genomax34k

Yes, I've read this, thank you. I got it. I'm going to call variants - OK, let BAMs look different in IGV if variant calls are going to be the same... I just want to know if I can use the aformentioned for loop to process FASTQ files in pairs, and different sized BAMs make me worry.

ADD REPLYlink written 10 days ago by lamteva.vera10
3
gravatar for genomax
10 days ago by
genomax34k
United States
genomax34k wrote:

Likely because aligners are non-deterministic. See Heng Li's answer here: BWA mem output inconsistent on same but re-ordered FASTQ input

As an aside BBMap can be run in deterministic mode with deterministic=t option.

ADD COMMENTlink written 10 days ago by genomax34k
0
gravatar for aditi.qamra
10 days ago by
aditi.qamra230
Singapore
aditi.qamra230 wrote:

sample_forward.fq and sample_reverse.fq are hard coded. Are they the same files as ${f}_L001_R2_001.clean.fq ? Write out both the commands on a text file and then check if your variables ($f) are being encoded correctly and finally you have the same script. p.s. why do you need sudo sh -c here ?

ADD COMMENTlink written 10 days ago by aditi.qamra230
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 962 users visited in the last hour