Question: BWA mem on multiple samples
1
gravatar for ruhirai
4.6 years ago by
ruhirai20
United States
ruhirai20 wrote:

I have multiple samples with R1 and R2 reads in fastq.gz format (these files are complementary to each other) I want to run BWA mem paired end parallel on all the files once finished each R1 and R2 complementary file should produce one sam file. Right now I am making two sam file from the two reads

This is what I have come up with but it’s not doing what I need it to do

for i in find -maxdepth 2 -iname *fastq.gz -type f; do echo "bwa mem -t 12 /H.Sapiens/ucsc.hg19.fasta ${i}_R1_001.fastq.gz ${i}_R2_001.fastq.gz > ${i}_R1_R2.sam"; done

when it runs it looks like this

bwa mem -t 12 /H.Sapiens/ucsc.hg19.fasta ./Sample_0747/0747_CGG_L001_R2_001.fastq.gz_R1_001.fastq.gz ./Sample_0747/0747_CGG_L001_R2_001.fastq.gz_R2_001.fastq.gz > ./Sample_0747/0747_CGG_L001_R2_001.fastq.gz_R1_R2.sam

bwa mem -t 12 H.Sapiens/ucsc.hg19.fasta ./Sample_0748/0748_CCA_L001_R1_001.fastq.gz_R1_001.fastq.gz ./Sample_0748/0748_CCA_L001_R1_001.fastq.gz_R2_001.fastq.gz > ./Sample_0748/0748_CCA_L001_R1_001.fastq.gz_R1_R2.sam -bash-4.1$ I understand the problem is in iname but how do I fixit? Thank you so much

alignment • 10k views
ADD COMMENTlink modified 4.6 years ago by Pierre Lindenbaum123k • written 4.6 years ago by ruhirai20
6
gravatar for Pierre Lindenbaum
4.6 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum123k wrote:

I would still process each fastq

 

$ find test -name "*.fastq.gz"
test/fastq/sample_1_02_R1.fastq.gz
test/fastq/sample_2_01_R2.fastq.gz
test/fastq/sample_1_01_R2.fastq.gz
test/fastq/sample_1_02_R2.fastq.gz
test/fastq/sample_2_01_R1.fastq.gz
test/fastq/sample_1_01_R1.fastq.gz

 

process each pair in paralllel using a Makefile (option -j) or GNU parallel:

$ find ./  -name "*.fastq.gz" | grep -v _R2.fastq.gz | sed 's/_R1.fastq.gz//' | parallel bwa mem test/ref/ref.fa {}_R1.fastq.gz {}_R2.fastq.gz '>' '{}'.sam

 

$ find test -name "*.sam"
test/fastq/sample_1_01.sam
test/fastq/sample_1_02.sam
test/fastq/sample_2_01.sam

and then merge all sam using picard

$ java -jarpicard.jar MergeSamFiles I=test/fastq/sample_1_01.sam I=test/fastq/sample_1_02.sam SO=coordinate O=test/fastq/sample_1.bam AS=false (...) INFO 2015-03-13 23:52:31 MergeSamFiles Sorting input files using temp directory [/tmp/lindenb] INFO 2015-03-13 23:52:32 MergeSamFiles Finished reading inputs. [Fri Mar 13 23:52:32 CET 2015] picard.sam.MergeSamFiles done. Elapsed time: 0.01 minutes.

 

 

ADD COMMENTlink written 4.6 years ago by Pierre Lindenbaum123k

Hey Dr. Lindenbaum,

I have the same issue as the original question. I tried your pipeline, and I now have my sam files for my samples. When I try to merge the sam files together using picard, I get an error " Cannot add sequence that already exists in SAMSequenceDictionary: ", The sam files I am trying to merge are 4 lanes of the same sample, and each lane is paired end. For example the following lanes have a R1 and R2 read.

Sample 1 - Lane_01
Sample 1 - Lane_02
Sample 1 - Lane_03
Sample 1 - Lane_04

I ran BWA mem as you suggested, and I got Lane_01 - Lane_04 sam files. Since they are of the same initial sample, they should contain the same/very similar sequences. Picard won't allow me to merge these files because they share the same sequences. How do you recommend I advance from this issue?

ADD REPLYlink written 4.3 years ago by satshil.r50

I have 145 SAM files. How could I avoid to type the SAM file names as input to MergeSamFiles? 

ADD REPLYlink written 4.1 years ago by Ric280
1

Google "file globbing".

ADD REPLYlink written 4.1 years ago by Devon Ryan92k

and picard MergeSamFiles can read a list of files

I=my.list.of.145.files.txt

 

ADD REPLYlink written 4.1 years ago by Pierre Lindenbaum123k
2
gravatar for Devon Ryan
4.6 years ago by
Devon Ryan92k
Freiburg, Germany
Devon Ryan92k wrote:
for i in find -maxdepth 2 -iname *_R1_001.fastq.gz
do
    i=${i%%_R1_001.fastq.gz}
    ...
done

or something along those lines. Given that there are probably 001.fastq.gz, 002.fastq.gz, etc. files, it's probably best to simply process all of $i in a single go (e.g., with process substitution in bash and <(cat $i_R1_???.fastq.gz)).

ADD COMMENTlink modified 4.6 years ago • written 4.6 years ago by Devon Ryan92k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 773 users visited in the last hour