Remove part of the name of multiple files in Linux
7.3 years ago

I have several fastq.gz files in a directory, and I want to delete part of each file name. Here are the file names:

RES_1448_001_S289_R1_001.fastq.gz
RES_1448_001_S289_R2_001.fastq.gz
RES_1448_012_S300_R1_001.fastq.gz
RES_1448_012_S300_R2_001.fastq.gz

I want to remove the S and the three digits after it, along with one of the surrounding underscores. This is what I expect after renaming:

RES_1448_001_R1_001.fastq.gz
RES_1448_001_R2_001.fastq.gz
RES_1448_012_R1_001.fastq.gz
RES_1448_012_R2_001.fastq.gz

Thank you.

next-gen • 9.3k views
7.3 years ago
Steven Lakin ★ 1.8k

This is how I usually do these:

for f in ./*.fastq.gz; do newname=$( printf '%s\n' "$f" | sed -E 's/_S[0-9]{3}_/_/' ); mv "$f" "$newname"; done

In general, regex is a good "language" to learn:

_S[0-9]{3}_

matches an underscore, an S, exactly three digits 0-9, and another underscore. sed is a command-line find-and-replace tool that operates line by line; without the g flag it replaces only the first match on each line. Note that every matching file name gets renamed, so no other .fastq.gz file in the directory should contain that pattern unless you want it modified too.
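Before renaming anything, it can help to preview what the substitution does to a couple of names. The following is a minimal sketch, not part of the answer above: strip_sample_tag is a hypothetical helper name, and sed -E is assumed to be available (it is on GNU and BSD sed).

```shell
# strip_sample_tag is a hypothetical helper: it prints the given name
# with the first _S<3 digits>_ tag collapsed to a single underscore.
strip_sample_tag() {
    printf '%s\n' "$1" | sed -E 's/_S[0-9]{3}_/_/'
}

# Preview only: print old and new names without touching any file.
for f in RES_1448_001_S289_R1_001.fastq.gz RES_1448_012_S300_R2_001.fastq.gz; do
    printf '%s -> %s\n' "$f" "$(strip_sample_tag "$f")"
done
```

Names that do not contain the pattern come back unchanged, which makes it easy to spot files the loop would leave alone.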

If anyone has a better way, I'm all ears.

7.3 years ago
Charles Plessy ★ 2.9k
rename 's/S..._//' RES_1448_0*

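This is the Perl rename script, available from https://metacpan.org/release/File-Rename or, on Debian and its derivatives, from the rename package. Its -n option prints the planned renames without performing them. Note that the rename shipped with util-linux on some other distributions uses a different syntax and does not accept Perl expressions.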

7.3 years ago

Here's another option and how I would do it:

for x in RES_1448_001_S289_R1_001.fastq.gz \
         RES_1448_001_S289_R2_001.fastq.gz \
         RES_1448_012_S300_R1_001.fastq.gz \
         RES_1448_012_S300_R2_001.fastq.gz
 do
    mv "$x" "${x/S[0-9][0-9][0-9]_/}"
done

Which will execute:

mv RES_1448_001_S289_R1_001.fastq.gz RES_1448_001_R1_001.fastq.gz
mv RES_1448_001_S289_R2_001.fastq.gz RES_1448_001_R2_001.fastq.gz
mv RES_1448_012_S300_R1_001.fastq.gz RES_1448_012_R1_001.fastq.gz
mv RES_1448_012_S300_R2_001.fastq.gz RES_1448_012_R2_001.fastq.gz

Note that if you use \ to split a command across multiple lines, as I did, there must be no space or any other character after the \. You don't need to list the files one by one, of course; you can use a glob such as *.fastq.gz instead.
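The glob variant mentioned above might look like the sketch below. It assumes Bash (the ${var/pattern/replacement} expansion is not POSIX sh), and rename_strip_tag is a hypothetical function name introduced here for illustration. It also matches the leading underscore, skips names without the tag, and uses mv -n so an existing file is never overwritten.

```shell
#!/usr/bin/env bash
# rename_strip_tag is a hypothetical helper: it renames every *.fastq.gz
# in the given directory (default: current directory).
rename_strip_tag() {
    local dir=${1:-.} x base newbase
    for x in "$dir"/*.fastq.gz; do
        [ -e "$x" ] || continue                     # glob matched nothing
        base=${x##*/}                               # file name without the directory
        newbase=${base/_S[0-9][0-9][0-9]_/_}        # collapse the first _Sddd_ tag
        [ "$base" = "$newbase" ] && continue        # no tag in this name
        mv -n -- "$x" "$dir/$newbase"               # -n: do not overwrite
    done
}
```

Calling rename_strip_tag with no argument works on the current directory; passing a path, for example rename_strip_tag /path/to/fastq, works on that directory instead.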
