How to rename multiple fastq files
23 months ago
ENK • 0

Hello geeks, I want to rename multiple files.

The original names are as follows:

GEN191010_N_NBS0_lib94256_1700_1_R1.fastq
GEN191010_N_NBS0_lib94256_1700_1_R2.fastq
GEN191010_N_NBXBS10_lib94257_1700_1_R1.fastq
GEN191010_N_NBXBS10_lib94257_1700_1_R2.fastq

I want the final names to look like this:

NBS0_1_R1.fastq
NBS0_1_R2.fastq
NBXBS10_1_R1.fastq
NBXBS10_1_R2.fastq

Your help will be very much appreciated.

next-gen • 1.8k views

Hello, newbie. What have you tried?


We need to know more about your files. Are the strings you want removed the same in all files, as this example suggests?

If not, do they follow a regular pattern?
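One quick way to answer that is to list every .fastq name that does not fit the layout the four example names share. A minimal sketch: the hypothetical function check_names and its regular expression are assumptions inferred from those four names only (GEN<digits>_N_<sample>_lib<digits>_<digits>_<digits>_R1/R2.fastq) and would need adjusting to the real data.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every .fastq file in the current directory whose
# name does NOT match the layout assumed from the four example names.
# Returns 0 if all names match, 1 otherwise.
# Usage: check_names && echo "all names follow the pattern"
check_names() {
  local re='^GEN[0-9]+_N_[A-Za-z0-9]+_lib[0-9]+_[0-9]+_[0-9]+_R[12]\.fastq$'
  local f bad=0
  for f in *.fastq; do
    [[ -e $f ]] || continue           # no .fastq files: the glob stays literal
    if [[ ! $f =~ $re ]]; then
      printf 'unexpected name: %s\n' "$f"
      bad=1
    fi
  done
  return "$bad"
}
```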


Thank you all for your suggestions. I used Mensur's solution and it works just fine. Much appreciated.

23 months ago

Many of the other solutions work, but I'd like to recommend a safer tool of mine, brename, in case you accidentally overwrite some files with others, which is a common mistake when batch-renaming files with regular expressions.

brename checks all operations before execution for safety.

$ brename --include-filters  '.fastq$' --ignore-ext  \
 -p 'GEN191010_N_(.+?_).+(R[12])' -r '$1$2' --dry-run
[INFO] main options:
[INFO]   ignore case: false
[INFO]   search pattern: GEN191010_N_(.+?_).+(R[12])
[INFO]   include filters: .fastq$
[INFO]   search paths: ./
[INFO] 
[INFO] checking: [ ok ] 'GEN191010_N_NBS0_lib94256_1700_1_R1.fastq' -> 'NBS0_R1.fastq'
[INFO] checking: [ ok ] 'GEN191010_N_NBS0_lib94256_1700_1_R2.fastq' -> 'NBS0_R2.fastq'
[INFO] checking: [ ok ] 'GEN191010_N_NBXBS10_lib94257_1700_1_R1.fastq' -> 'NBXBS10_R1.fastq'
[INFO] checking: [ ok ] 'GEN191010_N_NBXBS10_lib94257_1700_1_R2.fastq' -> 'NBXBS10_R2.fastq'
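[INFO] 4 path(s) to be renamed

As the dry run shows, this pattern drops the lane field (the _1 before R1/R2). To get exactly the requested names, capture that field as well, e.g. -p 'GEN191010_N_([^_]+_).+_(\d+_R[12])' -r '$1$2', and check the result with --dry-run again before renaming.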
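If brename is not installed, the check-before-renaming idea can be sketched in plain bash. This is an illustration, not brename's implementation: the hypothetical function plan_rename computes every new name first with a sed -E expression, refuses to rename anything if two files would get the same name or a target already exists, and with a first argument of 1 only prints the plan. The sed expression in the usage comment is an assumption tailored to the example names.

```shell
#!/usr/bin/env bash
# Hypothetical helper sketching "check every rename first, then rename" (bash >= 4).
# Usage: plan_rename <dry_run: 1 or 0> <sed -E expression> <file>...
#   plan_rename 1 's/^GEN[0-9]+_N_([^_]+)_lib[0-9]+_[0-9]+_/\1_/' *.fastq   # preview only
#   plan_rename 0 's/^GEN[0-9]+_N_([^_]+)_lib[0-9]+_[0-9]+_/\1_/' *.fastq   # rename
plan_rename() {
  local dry_run=$1 expr=$2
  shift 2
  local -a src=("$@") dst=()
  local -A seen=()                    # new name -> old name, used to spot clashes
  local i ok=1
  for i in "${!src[@]}"; do
    dst[i]=$(printf '%s\n' "${src[i]}" | sed -E "$expr")
    if [[ -z ${dst[i]} ]]; then
      printf 'empty new name for: %s\n' "${src[i]}" >&2
      ok=0
      continue
    fi
    if [[ -n ${seen[${dst[i]}]:-} ]]; then
      printf 'conflict: %s and %s -> %s\n' "${seen[${dst[i]}]}" "${src[i]}" "${dst[i]}" >&2
      ok=0
    elif [[ ${dst[i]} != "${src[i]}" && -e ${dst[i]} ]]; then
      printf 'target already exists: %s\n' "${dst[i]}" >&2
      ok=0
    fi
    seen[${dst[i]}]=${src[i]}
  done
  (( ok )) || return 1                # any failed check: rename nothing
  for i in "${!src[@]}"; do
    [[ ${src[i]} == "${dst[i]}" ]] && continue
    if (( dry_run )); then
      printf '%s -> %s\n' "${src[i]}" "${dst[i]}"
    else
      mv -- "${src[i]}" "${dst[i]}"
    fi
  done
}
```

Planning all names before touching any file is what makes the clash check possible; renaming inside the first loop would already have overwritten something by the time a clash is noticed.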
23 months ago
ATpoint 57k

This uses awk's field splitting and assumes that all files are named in the same format. The advantage is that you do not need a regex that edits the name directly (like deleting digits or characters): you simply split the name on the common delimiter _ and then pick the fields you want for the new file name.

for i in *.fastq; do
  # fields after splitting on _: 1=GEN191010 2=N 3=NBS0 4=lib94256 5=1700 6=1 7=R1.fastq
  mv -- "$i" "$(echo "$i" | awk '{split($0, a, "_"); print a[3] "_" a[6] "_" a[7]}')"
done
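The same field-splitting idea also works without starting an awk process per file, using bash's read with IFS set to the underscore. A sketch, assuming every name has the seven underscore-separated fields of the examples; split_rename is a hypothetical name, and because bash arrays start at 0, awk's a[3], a[6], a[7] become f[2], f[5], f[6].

```shell
#!/usr/bin/env bash
# Hypothetical helper: rename each given file by splitting its name on "_" and
# keeping fields 3, 6 and 7 (f[2], f[5], f[6]; bash arrays start at 0).
# Assumes names like GEN191010_N_NBS0_lib94256_1700_1_R1.fastq (7 fields).
# Usage: split_rename *.fastq
split_rename() {
  local i new
  local -a f
  for i in "$@"; do
    IFS=_ read -r -a f <<< "$i"
    if (( ${#f[@]} != 7 )); then
      printf 'skipping %s: expected 7 fields, got %d\n' "$i" "${#f[@]}" >&2
      continue
    fi
    new="${f[2]}_${f[5]}_${f[6]}"
    mv -n -- "$i" "$new"              # -n: do not overwrite a file that already exists
  done
}
```

The field-count check means already-renamed files (three fields) are skipped if the function is run a second time.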
23 months ago
Mensur Dlakic ★ 15k

Below is a shell script that replaces defined strings in the names of all files with a given extension. It is probably overkill in your case, since you could simply type 4 mv commands instead of the 3 needed with this script. First save the script as fix-name.com and make it executable (chmod +x fix-name.com). You also need (t)csh installed, which I guess is not a given these days, and agrep, which the script uses to select the files. I am sure someone will come up with a better bash script in no time.

In your case, enter:

fix-name.com fastq GEN191010_N_ ""
fix-name.com fastq _lib94256_1700 ""
fix-name.com fastq _lib94257_1700 ""

The script:

#!/bin/tcsh
if ( "$1" == "" ) then
    echo ""
    echo " This script renames all files with a given extension by"
    echo " replacing part of their names with user specified strings."
    echo ""
    echo " The correct syntax is:"
    echo ""
    echo " fix-name.com <file extension> <replace what> <replace with>"
    echo ""
    echo " For example, to rename all *junk.txt files so that junk"
    echo " is removed from their names, use this command:"
    echo " "
    echo " fix-name.com txt junk ''"
    echo " "
    echo " First argument (file extension without .) has to be entered."
    echo " The defaults are junk and an empty string, which means"
    echo " removing junk from file names."
    echo ""
    exit 9
endif

if ( "$2" == "" ) then
    setenv STR1 "junk"
else
    setenv STR1 "$2"
endif

if ( "$3" == "" ) then
    setenv STR2 ""
else
    setenv STR2 "$3"
endif

find . -maxdepth 1 -name "*.$1" -print | agrep "$STR1" | sort > tmp-list1
cp tmp-list1 tmp-list2
perl -pi -e 's/\.\///g' tmp-list2
perl -pi -e 's/$ENV{"STR1"}/$ENV{"STR2"}/g' tmp-list2
perl -pi -e 's/\.\//mv /g' tmp-list1
paste -d" " tmp-list1 tmp-list2 > tmp-list
source tmp-list >& /dev/null
rm tmp-list tmp-list1 tmp-list2
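For readers without tcsh or agrep, the same "delete a fixed string from every name" operation can be sketched in bash with the ${var//pattern/} expansion. This swaps in a different technique rather than reproducing the script above; strip_strings is a hypothetical name, and the strings to remove are the ones from the example names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove each given literal string from the names of all
# *.<ext> files in the current directory (one pass instead of one call per string).
# Usage: strip_strings fastq GEN191010_N_ _lib94256_1700 _lib94257_1700
strip_strings() {
  local ext=$1 f new s
  shift
  for f in *."$ext"; do
    [[ -e $f ]] || continue           # no files with that extension
    new=$f
    for s in "$@"; do
      new=${new//"$s"/}               # quoted: $s is matched literally, not as a glob
    done
    [[ $new == "$f" ]] && continue    # nothing to remove from this name
    mv -n -- "$f" "$new"              # -n: do not overwrite an existing file
  done
}
```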
23 months ago
Joe 19k

Making assumptions about the consistency of your files:

for file in /path/to/*.fastq ; do
    mv -- "$file" "$(echo "$file" | sed -e 's/GEN191010_N_//g' -e 's/lib[0-9]\{5\}_[0-9]\{4\}_//g')"
done

NB: untested code.
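Since the loop above is untested, one cheap way to try a rename loop is on empty placeholder files in a throwaway directory, then look at what comes out. A sketch with the four names from the question; try_rename is a hypothetical name, and the sed expressions are those from the loop above, written without the GNU-only i flag, which these names do not need.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run the sed-based rename on empty copies of the example
# names in a scratch directory, print the resulting names, then clean up.
try_rename() {
  local dir f
  dir=$(mktemp -d) || return 1
  (
    cd "$dir" || exit 1
    touch GEN191010_N_NBS0_lib94256_1700_1_R1.fastq \
          GEN191010_N_NBS0_lib94256_1700_1_R2.fastq \
          GEN191010_N_NBXBS10_lib94257_1700_1_R1.fastq \
          GEN191010_N_NBXBS10_lib94257_1700_1_R2.fastq
    for f in *.fastq; do
      mv -- "$f" "$(printf '%s\n' "$f" | sed -e 's/GEN191010_N_//g' -e 's/lib[0-9]\{5\}_[0-9]\{4\}_//g')"
    done
    ls                                # list what the loop produced
  )
  rm -r -- "$dir"
}
```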

23 months ago
Joe 19k

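An alternative (and the best) approach, using the Perl-based rename utility (installed as rename, prename or file-rename depending on the system; the util-linux rename shipped by some distributions takes different arguments):

rename -nv 's/GEN191010_N_(.*_)lib[0-9]{5}_[0-9]{4}_/$1/gi' *.fastq

Drop the -n once you are happy with the substitutions, and it will actually perform the renaming.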
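The $1 in that expression refers to the first captured group; bash's own [[ =~ ]] test exposes captured groups the same way through BASH_REMATCH. A rough pure-bash analogue of the -n preview, assuming the same name layout; preview_rename is a hypothetical name, and the underscore before lib is kept inside the first group so it is not lost from the new name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "old -> new" for each matching name (like rename -n),
# using bash's regex match. BASH_REMATCH[1] is the sample name plus its trailing
# underscore, BASH_REMATCH[2] everything after the library fields.
preview_rename() {
  local re='^GEN[0-9]+_N_(.*_)lib[0-9]{5}_[0-9]{4}_(.*)$'
  local f
  for f in "$@"; do
    if [[ $f =~ $re ]]; then
      printf '%s -> %s\n' "$f" "${BASH_REMATCH[1]}${BASH_REMATCH[2]}"
    fi
  done
}
```

Usage: preview_rename *.fastq. Names that do not match the pattern are silently skipped.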

23 months ago
Malcolm.Cook ★ 1.3k

GNU Parallel lets you harness the power of Perl regular expressions:

parallel --dry-run mv '{=Q($_)=}' '{=Q(s/GEN\d+_N_(\w+)_lib\d+_\d+_(\d+_R\d)/$1_$2/)=}' ::: *.fastq

Note: run it once with --dry-run to make sure it does what you want, then run it again without --dry-run to do the deed.

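A simpler version; the {=...=} expression has to be single-quoted, otherwise the shell interprets the parentheses, backslashes and $1/$2 itself:

parallel --dry-run mv {} '{=s/GEN\d+_N_(\w+)_lib\d+_\d+_(\d+_R\d)/$1_$2/=}' ::: *.fastq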

Hi @Ole, I thought the Q() would make the recipe work even if the filenames had whitespace in them, but I guess that was unneeded over-protection... yes?
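The underlying concern, shell-quoting generated commands so that names with spaces survive, has a bash counterpart in printf %q. A sketch, not GNU Parallel's code: the hypothetical function make_mv_script writes one quoted mv line per file so the plan can be reviewed before running it with bash; the sed expression in the usage line is an assumption tailored to the example names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one shell-quoted "mv -- old new" line per file.
# Usage: make_mv_script 's/^GEN[0-9]+_N_(.*_)lib[0-9]+_[0-9]+_/\1/' *.fastq > rename.sh
#        (inspect rename.sh, then: bash rename.sh)
make_mv_script() {
  local expr=$1 f new
  shift
  for f in "$@"; do
    new=$(printf '%s\n' "$f" | sed -E "$expr")
    [[ $new == "$f" ]] && continue    # name unchanged: nothing to do
    printf 'mv -- %q %q\n' "$f" "$new"   # %q escapes spaces and other shell metacharacters
  done
}
```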
