Question: How can I add Sample Identifier to paired fastq file names
Tawny (United States) wrote, 5 weeks ago:

I have over 500 paired fastq files. They have been received from a source where the Sample Identifier (S1, S2, S3) is no longer in the file names. I need to add a Sample Identifier to my paired file names for processing using QIIME2.

Here are some example file names:

Tube211-16S_L001_R1_001.fastq
Tube211-16S_L001_R2_001.fastq
Tube212-16S_L001_R1_001.fastq
Tube212-16S_L001_R2_001.fastq
Tube213-16S_L001_R1_001.fastq
Tube213-16S_L001_R2_001.fastq

I would like to add sequential Sample Identifiers to these so that they would look like this when finished:

Tube211-16S_S1_L001_R1_001.fastq
Tube211-16S_S1_L001_R2_001.fastq
Tube212-16S_S2_L001_R1_001.fastq
Tube212-16S_S2_L001_R2_001.fastq
Tube213-16S_S3_L001_R1_001.fastq
Tube213-16S_S3_L001_R2_001.fastq

I tried the following loop, but all it does is add S1 to every R1 file name:

for((k=1;k<=516;k++)); do for i in *.fastq; do mv "$i" "`echo $i | sed "s/_16S_L001_R1/-16S_S${k}_L001_R1/"`"; done; done

I need to add the same Sample Identifier to the R1 and R2 paired file names.

How can this be done?

Tags: fastq
Written 5 weeks ago by Tawny (modified 5 weeks ago)
$ ls
Tube211-16S_L001_R1_001.fastq  Tube212-16S_L001_R1_001.fastq  Tube213-16S_L001_R1_001.fastq
Tube211-16S_L001_R2_001.fastq  Tube212-16S_L001_R2_001.fastq  Tube213-16S_L001_R2_001.fastq

$ ls *R2*.fastq | sort | nl -nln | sed 's/^/S/;s/\s\+/_/' | rename -n 's/(.*)_(.*)-(.*)_(.*)/$2_$1_$3_$4/'

rename(S1_Tube211-16S_L001_R2_001.fastq, Tube211_S1_16S_L001_R2_001.fastq)
rename(S2_Tube212-16S_L001_R2_001.fastq, Tube212_S2_16S_L001_R2_001.fastq)
rename(S3_Tube213-16S_L001_R2_001.fastq, Tube213_S3_16S_L001_R2_001.fastq)
Reply written 4 weeks ago by cpad0112
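
The `-n` flag in the reply above makes `rename` print the planned renames without touching any file, and the preview shows names of the form `Tube211_S1_16S_...` rather than the `Tube211-16S_S1_...` asked for in the question. Below is a minimal sketch of the same preview idea using only standard shell tools. It assumes every R1 file has an R2 mate and that names end in `_L001_R1_001.fastq`, as in the question; the scratch directory and empty demo files are purely illustrative.

```shell
#!/bin/sh
# Sketch: preview paired renames without executing them.
set -eu

# Illustrative setup: empty demo files in a scratch directory.
demo_dir=$(mktemp -d)
cd "$demo_dir"
for tube in Tube211 Tube212 Tube213; do
    touch "${tube}-16S_L001_R1_001.fastq" "${tube}-16S_L001_R2_001.fastq"
done

# Number the R1 files, then print one mv command per mate.
plan=$(
    printf '%s\n' *_L001_R1_001.fastq | sort | nl -nln |
    while read -r n r1; do
        prefix=${r1%_L001_R1_001.fastq}   # e.g. Tube211-16S
        for mate in R1 R2; do
            printf 'mv %s %s\n' \
                "${prefix}_L001_${mate}_001.fastq" \
                "${prefix}_S${n}_L001_${mate}_001.fastq"
        done
    done
)
printf '%s\n' "$plan"
```

Once the printed commands look right, they could be run by piping them to `sh`. File names containing spaces would need quoting first.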
Tawny (United States) wrote, 5 weeks ago:

I ended up making a slight change to colin.kern's answer. It needed arithmetic expansion to properly increment the variable k. Here is the command that ended up working for me:

k=1; for i in *.fastq; do mv "$i" "`echo $i | sed "s/_16S_L001_R1/-16S_S${k}_L001_R1/"`"; k=$((k+1)); done
Written 5 weeks ago by Tawny
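
Note that the command above only matches `_16S_L001_R1`, so R2 files keep their old names, and `k` advances once per file in `*.fastq` rather than once per sample. Below is a minimal sketch of a variant that gives both mates of a pair the same identifier. It assumes names end in `_L001_R1_001.fastq` / `_L001_R2_001.fastq` as in the question; the scratch directory and empty demo files are only there to make the example runnable.

```shell
#!/bin/sh
# Sketch: rename both mates of each pair with one shared S-number.
set -eu

# Illustrative setup: empty demo files in a scratch directory.
demo_dir=$(mktemp -d)
cd "$demo_dir"
for tube in Tube211 Tube212 Tube213; do
    touch "${tube}-16S_L001_R1_001.fastq" "${tube}-16S_L001_R2_001.fastq"
done

k=1
for r1 in *_L001_R1_001.fastq; do
    prefix=${r1%_L001_R1_001.fastq}        # e.g. Tube211-16S
    r2=${prefix}_L001_R2_001.fastq         # expected mate
    if [ ! -e "$r2" ]; then
        echo "skipping $r1: mate $r2 not found" >&2
        continue
    fi
    mv -- "$r1" "${prefix}_S${k}_L001_R1_001.fastq"
    mv -- "$r2" "${prefix}_S${k}_L001_R2_001.fastq"
    k=$((k + 1))                           # advance once per pair, not per file
done
ls
```

The shell expands the glob in lexical order, so names whose numeric parts have different digit counts will not be numbered in numeric order. Testing on a copy of the data first is a sensible precaution.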
colin.kern (United States) wrote, 5 weeks ago:

It doesn't work because the inner loop (over all the fastq files) runs to completion during the first iteration of the outer loop (when k=1), so every matching file has already been renamed with S1 before k changes. You can just increment k yourself in the loop:

k=1; for i in *.fastq; do mv "$i" "`echo $i | sed "s/_16S_L001_R1/-16S_S${k}_L001_R1/"`"; k=$k+1; done

Also, you can actually do search and replace on variables directly in bash:

k=1; for i in *.fastq; do mv "$i" "${i/_16S_L001_R1/-16S_S${k}_L001_R1}"; k=$k+1; done
Written 5 weeks ago by colin.kern (modified 5 weeks ago)
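
In the two commands above, `k=$k+1` is a plain string assignment, so `k` becomes the text `1+1`, then `1+1+1`, and so on (unless `k` has been declared as an integer). A short sketch of the difference between that and arithmetic expansion follows; the variable names `concat` and `sum` are only for illustration.

```shell
#!/bin/sh
# Sketch: string append versus arithmetic expansion.
k=1
k=$k+1            # plain assignment: the value is the text "1+1"
concat=$k

k=1
k=$((k + 1))      # arithmetic expansion: the value is 2
sum=$k

echo "string append: $concat"   # prints: string append: 1+1
echo "arithmetic:    $sum"      # prints: arithmetic:    2
```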

@colin.kern thank you for offering these two solutions. I did have to modify them slightly (see my answer below) by using arithmetic expansion.

Reply written 5 weeks ago by Tawny

Tawny: it is only fair to acknowledge their help by at least upvoting this answer.

Reply written 5 weeks ago by genomax
shenwei356 (China) wrote, 4 weeks ago:

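Try brename, once for the R1 files and once for the R2 files (-d requests a dry run, so nothing is renamed yet):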

# read1
$ brename -f 'R1.+fastq$' -p _L -r '_S{nr}_L' -d 
[INFO] checking: [ ok ] 'Tube211-16S_L001_R1_001.fastq' -> 'Tube211-16S_S1_L001_R1_001.fastq'
[INFO] checking: [ ok ] 'Tube212-16S_L001_R1_001.fastq' -> 'Tube212-16S_S2_L001_R1_001.fastq'
[INFO] checking: [ ok ] 'Tube213-16S_L001_R1_001.fastq' -> 'Tube213-16S_S3_L001_R1_001.fastq'
[INFO] 3 path(s) to be renamed

# read2
$ brename -f 'R2.+fastq$' -p _L -r '_S{nr}_L' -d 
Written 4 weeks ago by shenwei356 (modified 4 weeks ago)