Question: Bulk quality interleaving with bbmap reformat command
9 months ago by
Longshotx 20 wrote:

I have several fastq.gz files that look like this:

Sample1_R1.fastq.gz Sample1_R2.fastq.gz

Sample2_R1.fastq.gz Sample2_R2.fastq.gz

etc...

The output needs to be: Sample1_interleaved.fastq.gz, Sample2_interleaved.fastq.gz

I am still learning how to loop commands, so bear with me. This is what I tried to run:

for i in `ls -1 *_R#.fastq.gz | sed 's/_R#.fastq.gz//‘`
do
reformat.sh in=$i\_R#.fastq.gz out=$i\_interleaved.fastq.gz 
done

However, this did not work. Can someone help me? Many thanks!

7 months ago by
ross_whetten10 wrote:

The # symbol in ls -1 *_R#.fastq.gz won't match 1 or 2. The character you want in that position is ?, which matches any single character in the file names. Similarly, sed doesn't treat # as a wildcard, but . matches any single character, or you can restrict the match to 1 or 2 with [12]. An alternative to sed for removing the unwanted remainder of the filename is the basename command, which also removes leading directory names. For example:

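# Loop over the R1 files only, so each sample is processed once
for i in /path/to/files/*_R1.fastq.gz; do name=$(basename "$i" _R1.fastq.gz);
reformat.sh in="$(dirname "$i")/${name}_R#.fastq.gz" out="${name}_interleaved.fastq.gz"; done

Note that there is a space between "$i" and _R1.fastq.gz within the basename command. basename removes that suffix as a literal string, so a pattern such as _R[12].fastq.gz would not be stripped. Because basename also drops the directory, the path is added back for the input with dirname.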
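
A dry run can confirm the loop before any reads are processed. The sketch below is only an illustration under stated assumptions: it creates empty placeholder files in a temporary directory (the sample names are taken from the question), counts what the ? wildcard matches, loops over the R1 files, and collects each reformat.sh command as text instead of running it.

```shell
#!/bin/sh
# Dry run: build and print the reformat.sh commands without executing them.
# The temporary directory and empty placeholder files are for illustration only.
set -eu

workdir=$(mktemp -d)
for s in Sample1 Sample2; do
    touch "$workdir/${s}_R1.fastq.gz" "$workdir/${s}_R2.fastq.gz"
done

# ? matches exactly one character, so both the R1 and R2 files are counted here
set -- "$workdir"/*_R?.fastq.gz
matched=$#

commands=""
for r1 in "$workdir"/*_R1.fastq.gz; do
    # basename strips the directory and the literal suffix _R1.fastq.gz
    name=$(basename "$r1" _R1.fastq.gz)
    commands="${commands}reformat.sh in=${name}_R#.fastq.gz out=${name}_interleaved.fastq.gz
"
done

echo "files matched by *_R?.fastq.gz: $matched"
printf '%s' "$commands"
rm -r "$workdir"
```

Once the printed commands look right, the same loop can call reformat.sh directly on the real directory.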

In addition to using these instructions to fix problems with the loop, infenit101, you should explicitly provide two inputs to the reformat.sh command to get the interleaving.

Edit: As @ross points out below, using the # shortcut should indeed work.
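
For comparison, the explicit two-input form could look like the sketch below. It assumes that in2= is the reformat.sh parameter for the second read file (worth checking against the tool's help output) and uses the sample names from the question. The helper interleave_cmd is hypothetical and only prints the command.

```shell
#!/bin/sh
# Print (not run) a reformat.sh command that names both read files explicitly.
# Assumption: in2= is the parameter for the second input file.
set -eu

# interleave_cmd is a hypothetical helper; $1 is the sample prefix, e.g. Sample1
interleave_cmd() {
    printf 'reformat.sh in=%s_R1.fastq.gz in2=%s_R2.fastq.gz out=%s_interleaved.fastq.gz\n' "$1" "$1" "$1"
}

for name in Sample1 Sample2; do
    interleave_cmd "$name"
done
```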

modified 7 months ago • written 7 months ago by genomax 71k

@genomax - Based on the Reformat User Guide, I think the <name>_R#.fastq.gz syntax would work, although the escape before the underscore (<name>\_R#) will cause problems that I didn't mention.

written 7 months ago by ross_whetten 10