How to run BWA or the other aligner for paired .fastq in a bash loop and pipeline?
4
0
Entering edit mode
5.4 years ago
shark29 • 0

How to run BWA or the other aligner for paired .fastq in a bash loop and pipeline?

files look like SRR1174825_R1.fastq and SRR1174825_R2.fastq How to rename them just to R1 and R2? thank you

bwa wgs bash shell • 7.1k views
0
Entering edit mode

What have you tried ?

0
Entering edit mode

files look like SRR1174825_R1.fastq and SRR1174825_R2.fastq How to rename them just to R1 and R2? thank you

1
Entering edit mode

That would not be wise. R1/R2 is too generic. You would trip over this kind of file names once you start going beyond two files.

6
Entering edit mode
5.4 years ago

For some files:

a_1.fq.gz  a_2.fq.gz  b_1.fq.gz  b_2.fq.gz


Use glob for PE read1, and replace the suffix to get the sample name.

for f in  *_1.fq.gz; do
r1=$f; r2=${f/_1.fq.gz/}._2.fq.gz;

echo $r1$r2; # do somethings
done


Result:

a_1.fq.gz a._2.fq.gz
b_1.fq.gz b._2.fq.gz

3
Entering edit mode

Given your example, I find the following more intuitive and verbose:

for f in ls *.fq.gz | cut -f1 -d'_' #get constant part
do
stuff ${f}_1.fq.gz${f}_2.fq.gz > ${f}.bam #whatever done  Also easy setting of output name ADD REPLY 0 Entering edit mode that helped me. but i wanted to ask what exactly cut -f1 -d'_' means. kindly explain -f1 and -d parameters. ADD REPLY 3 Entering edit mode Don't be so lazy. All of that is easy to find if you read the documentation of cut. Alternatively you can look at this website to explain shell code. ADD REPLY 1 Entering edit mode $ man cut

0
Entering edit mode

files look like SRR1174825_R1.fastq and SRR1174825_R2.fastq How to rename them just to R1 and R2? thank you

0
Entering edit mode

You don't need to. ${f} will expand to SRR1174825. ADD REPLY 4 Entering edit mode 5.4 years ago Depending on PE or SE reads, the loop is a bit more difficult. For SE reads, assuming the following filenames: sample1.fq.gz sample2.fq.gz sample3.fq.gz  You could use the following loop: for fastq in *.fq.gz do bwa mem ref.fasta$fastq > ${fastq%.*}.bam stuff etc etc #your command for bwa done  Or using gnu-parallel which is awesome: (directly piped into samtools for conversion to bam and sorting) ls *.fq.gz | parallel -j 2 'bwa mem ref.fasta {} | samtools sort -o {.}.bam'  ADD COMMENT 0 Entering edit mode Just jumping in here, I was given a for loop for using BWA very similar to what you have written above, but it doesn't work. I am not great with code, but if anyone on this thread has tips, I would love to hear them. When I run this loop I get a bunch of empty xxx.fastq.sam files and not the transition from xxx.fastq to xxx.sam as desired. for fname in ./*.fastq; do bwa mem MPB_male$fname.fastq > $fname.sam; done  ADD REPLY 0 Entering edit mode Are you sure? have you inspected the files or are you just going off the extensions? If its the latter, you should know that file extensions are just convention and don't actually have any meaning, it will simply be appending .sam to all of the filenames listed in your *.fastq glob If you want the extension to just be .sam you can change the command to: bwa mem ...... > "${fname%.*}".sam

0
Entering edit mode

Yes, using head xxx.sam shows nothing and the files show up as 0kb. Your patch does give me .sam files as desired, but they still have nothing in them. I can confirm that using bwa mem MPB_male xxx.fastq > xxx.sam gives me a working sam file. I just don't want to individually do this for 1000 samples!

0
Entering edit mode

Add echo in front of bwa in your loop (add quotes) and see if the printed commands look alright

1
Entering edit mode
5.4 years ago

It sounds like your main question is actually how to rename files...?

You do that like this:

mv SRR1174825_R1.fastq R1.fq
mv SRR1174825_R2.fastq R2.fq


ln -s SRR1174825_R1.fastq R1.fq
ln -s SRR1174825_R2.fastq R2.fq
`

The second version will not alter the original files, so I find it preferable.

This is not really a bioinformatics-related question, so it's better addressed on a site like stackoverflow. Also, I suggest you read about Linux and bash to get a basic understanding of how the command line works before trying to use bioinformatics tools. Once you have a basic understand of bash, you will realize that renaming is neither necessary nor productive in this case (in other words, that question is an X/Y problem).

0
Entering edit mode

need to rename all coming .fastq files starting as SRR***_R1 and SRR***_R2 searching a replacement regular expression

1
Entering edit mode

why do you need to rename them?

0
Entering edit mode

rename the files as what?

0
Entering edit mode

Please rewrite your post with correct syntax and grammar. Also, if you want it to be answered, you need to be more explicit.

0
Entering edit mode
5.4 years ago
shark29 • 0

found solution for renaming http://stackoverflow.com/questions/7793950/regex-to-remove-all-text-before-a-character then put R1 and R2 as arguments

0
Entering edit mode

That solves the question in the body of your original post.

If the question in the title has been answered by one of the answers then you should accept them to provide closure to this thread.