Question: How to run BWA or the other aligner for paired .fastq in a bash loop and pipeline?
0
gravatar for shark29
2.9 years ago by
shark290
shark290 wrote:

How to run BWA or the other aligner for paired .fastq in a bash loop and pipeline?

files look like SRR1174825_R1.fastq and SRR1174825_R2.fastq How to rename them just to R1 and R2? thank you

bash shell bwa wgs • 3.7k views
ADD COMMENTlink modified 2.9 years ago • written 2.9 years ago by shark290

What have you tried ?

ADD REPLYlink written 2.9 years ago by Pierre Lindenbaum124k

files look like SRR1174825_R1.fastq and SRR1174825_R2.fastq How to rename them just to R1 and R2? thank you

ADD REPLYlink written 2.9 years ago by shark290
1

That would not be wise. R1/R2 is too generic. You would trip over this kind of file names once you start going beyond two files.

ADD REPLYlink written 2.9 years ago by genomax74k
4
gravatar for WouterDeCoster
2.9 years ago by
Belgium
WouterDeCoster42k wrote:

Depending on PE or SE reads, the loop is a bit more difficult.

For SE reads, assuming the following filenames:

sample1.fq.gz
sample2.fq.gz
sample3.fq.gz

You could use the following loop:

for fastq in *.fq.gz
do
bwa mem ref.fasta $fastq > ${fastq%.*}.bam stuff etc etc #your command for bwa
done

Or using gnu-parallel which is awesome: (directly piped into samtools for conversion to bam and sorting)

ls *.fq.gz | parallel -j 2 'bwa mem ref.fasta {} | samtools sort -o {.}.bam'
ADD COMMENTlink modified 12 months ago • written 2.9 years ago by WouterDeCoster42k

Just jumping in here, I was given a for loop for using BWA very similar to what you have written above, but it doesn't work. I am not great with code, but if anyone on this thread has tips, I would love to hear them. When I run this loop I get a bunch of empty xxx.fastq.sam files and not the transition from xxx.fastq to xxx.sam as desired.

for fname in ./*.fastq;

do

bwa mem MPB_male $fname.fastq > $fname.sam;

done
ADD REPLYlink modified 11 months ago • written 11 months ago by BeetleCheers0

Are you sure? have you inspected the files or are you just going off the extensions?

If its the latter, you should know that file extensions are just convention and don't actually have any meaning, it will simply be appending .sam to all of the filenames listed in your *.fastq glob

If you want the extension to just be .sam you can change the command to:

bwa mem ......  > "${fname%.*}".sam
ADD REPLYlink modified 11 months ago • written 11 months ago by Joe15k

Yes, using head xxx.sam shows nothing and the files show up as 0kb. Your patch does give me .sam files as desired, but they still have nothing in them. I can confirm that using bwa mem MPB_male xxx.fastq > xxx.sam gives me a working sam file. I just don't want to individually do this for 1000 samples!

ADD REPLYlink written 11 months ago by BeetleCheers0

Add echo in front of bwa in your loop (add quotes) and see if the printed commands look alright

ADD REPLYlink written 11 months ago by WouterDeCoster42k
3
gravatar for shenwei356
2.9 years ago by
shenwei3564.9k
China
shenwei3564.9k wrote:

For some files:

a_1.fq.gz  a_2.fq.gz  b_1.fq.gz  b_2.fq.gz

Use glob for PE read1, and replace the suffix to get the sample name.

for f in  *_1.fq.gz; do
   r1=$f;
   r2=${f/_1.fq.gz/}._2.fq.gz;

   echo $r1 $r2; # do somethings
done

Result:

a_1.fq.gz a._2.fq.gz
b_1.fq.gz b._2.fq.gz
ADD COMMENTlink modified 2.9 years ago • written 2.9 years ago by shenwei3564.9k
1

Given your example, I find the following more intuitive and verbose:

for f in `ls *.fq.gz | cut -f1 -d'_'` #get constant part
do
stuff ${f}_1.fq.gz ${f}_2.fq.gz > ${f}.bam #whatever
done

Also easy setting of output name

ADD REPLYlink written 2.9 years ago by WouterDeCoster42k

that helped me. but i wanted to ask what exactly cut -f1 -d'_'` means. kindly explain -f1 and -d parameters.

ADD REPLYlink written 12 months ago by hafiz.talhamalik210
3

Don't be so lazy. All of that is easy to find if you read the documentation of cut. Alternatively you can look at this website to explain shell code.

ADD REPLYlink modified 12 months ago • written 12 months ago by WouterDeCoster42k
1

$ man cut

ADD REPLYlink written 12 months ago by Joe15k

files look like SRR1174825_R1.fastq and SRR1174825_R2.fastq How to rename them just to R1 and R2? thank you

ADD REPLYlink written 2.9 years ago by shark290

You don't need to. ${f} will expand to SRR1174825.

ADD REPLYlink modified 2.9 years ago • written 2.9 years ago by genomax74k
0
gravatar for Brian Bushnell
2.9 years ago by
Walnut Creek, USA
Brian Bushnell16k wrote:

It sounds like your main question is actually how to rename files...?

You do that like this:

mv SRR1174825_R1.fastq R1.fq
mv SRR1174825_R2.fastq R2.fq

Alternately, via symlinks:

ln -s SRR1174825_R1.fastq R1.fq
ln -s SRR1174825_R2.fastq R2.fq

The second version will not alter the original files, so I find it preferable.

This is not really a bioinformatics-related question, so it's better addressed on a site like stackoverflow. Also, I suggest you read about Linux and bash to get a basic understanding of how the command line works before trying to use bioinformatics tools. Once you have a basic understand of bash, you will realize that renaming is neither necessary nor productive in this case (in other words, that question is an X/Y problem).

ADD COMMENTlink modified 2.9 years ago • written 2.9 years ago by Brian Bushnell16k

need to rename all coming .fastq files starting as SRR***_R1 and SRR***_R2 searching a replacement regular expression

ADD REPLYlink written 2.9 years ago by shark290
1

why do you need to rename them?

ADD REPLYlink written 2.9 years ago by Joe15k

rename the files as what?

ADD REPLYlink written 2.9 years ago by genomax74k

Please rewrite your post with correct syntax and grammar. Also, if you want it to be answered, you need to be more explicit.

ADD REPLYlink written 2.9 years ago by Brian Bushnell16k
0
gravatar for shark29
2.9 years ago by
shark290
shark290 wrote:

found solution for renaming http://stackoverflow.com/questions/7793950/regex-to-remove-all-text-before-a-character then put R1 and R2 as arguments

ADD COMMENTlink written 2.9 years ago by shark290

That solves the question in the body of your original post.

If the question in the title has been answered by one of the answers then you should accept them to provide closure to this thread.

ADD REPLYlink written 2.9 years ago by genomax74k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1743 users visited in the last hour