Rename filenames based on list
2.5 years ago

Hi everyone, I have a list of filenames like the following in file1.txt:

5275_AA_run719_GAGATTCC_S520_L004_R1_001.fastq.gz
5275_A_run720_ATTACTCG_S84_L001_R1_001.fastq.gz
5275_AB_run719_GAGATTCC_S521_L004_R1_001.fastq.gz
5275_B_run720_ATTACTCG_S85_L002_R1_001.fastq.gz

I would like to replace the first two fields (separated by _) of each filename according to the mapping in correspondence.txt:

5275_A  MDF3
5275_B  MDF6
5275_AA MCO6
5275_AB MCO7
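For illustration (this is not part of the original thread), the intended mapping can be written as an exact lookup on the first two _-separated fields, which cannot confuse 5275_A with 5275_AA. Below is a minimal sketch that assumes a POSIX awk. It writes the result to a new file instead of editing file1.txt in place, and file1.renamed.txt is a hypothetical name:

```shell
#!/usr/bin/env bash
# Sketch: exact-key lookup on the first two _-separated fields (assumes POSIX awk).
set -euo pipefail
dir=$(mktemp -d)   # scratch directory, only for this illustration
cd "$dir"

printf '%s\n' \
  5275_AA_run719_GAGATTCC_S520_L004_R1_001.fastq.gz \
  5275_A_run720_ATTACTCG_S84_L001_R1_001.fastq.gz \
  5275_AB_run719_GAGATTCC_S521_L004_R1_001.fastq.gz \
  5275_B_run720_ATTACTCG_S85_L002_R1_001.fastq.gz > file1.txt
printf '%s\n' '5275_A  MDF3' '5275_B  MDF6' '5275_AA MCO6' '5275_AB MCO7' > correspondence.txt

# First file (NR == FNR): remember the new prefix for each old ID, e.g. map["5275_AA"] = "MCO6".
# Second file: split each name on _, rebuild the key from fields 1 and 2, and if the
# key is known, print the new prefix followed by the remaining fields.
awk 'NR == FNR { map[$1] = $2; next }
     {
       n = split($0, f, "_")
       key = f[1] "_" f[2]
       if (key in map) {
         out = map[key]
         for (i = 3; i <= n; i++) out = out "_" f[i]
         print out
       } else {
         print   # unknown ID: leave the name unchanged
       }
     }' correspondence.txt file1.txt > file1.renamed.txt

cat file1.renamed.txt
```

Because the whole key must match, the order of the lines in correspondence.txt does not matter here.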

If I run

while read n k; do sed -i "s/$n/$k/g" file1.txt ; done < correspondence.txt

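the entries are renamed incorrectly. 5275_A is the first line of correspondence.txt and is also a prefix of 5275_AA, so the substitution s/5275_A/MDF3/g matches inside the longer ID as well. For example, the entry

5275_AA_run719_GAGATTCC_S520_L004_R1_001.fastq.gz

is renamed to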

MDF3A_run719_GAGATTCC_S520_L004_R1_001.fastq.gz

instead of

MCO6_run719_GAGATTCC_S520_L004_R1_001.fastq.gz
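The collision can be reproduced on a scratch copy of part of the data. This sketch assumes GNU sed, whose -i option takes no argument, and the temporary directory is only for illustration:

```shell
#!/usr/bin/env bash
# Reproduce the prefix collision with the loop from the question (assumes GNU sed).
set -euo pipefail
dir=$(mktemp -d)
cd "$dir"

printf '%s\n' \
  5275_AA_run719_GAGATTCC_S520_L004_R1_001.fastq.gz \
  5275_A_run720_ATTACTCG_S84_L001_R1_001.fastq.gz > file1.txt
printf '%s\n' '5275_A  MDF3' '5275_AA MCO6' > correspondence.txt

# Unanchored pattern, processed in file order: 5275_A is replaced first,
# so by the time 5275_AA is tried it no longer exists in file1.txt.
while read -r n k; do sed -i "s/$n/$k/g" file1.txt; done < correspondence.txt

cat file1.txt
```

The first line comes out as MDF3A_run719_GAGATTCC_S520_L004_R1_001.fastq.gz.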

Is there a way to fix the above code so that each ID is replaced only by its own mapping?

Thank you.


Perhaps try this:

while read n k; do sed -i "s/${n}_/${k}_/g" file1.txt ; done < correspondence.txt

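(Appending _ to both the pattern and the replacement makes the match more specific: 5275_A_ no longer matches inside 5275_AA_, because the character after 5275_A there is A, not _.)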
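Here is a self-contained version on the sample data, as a sketch assuming GNU sed. It also adds a ^ anchor, which is not in the answer above, so that an ID can only match at the start of a name. The /g flag is dropped because each name has at most one leading ID:

```shell
#!/usr/bin/env bash
# Anchored, _-terminated replacement (assumes GNU sed for -i without a suffix).
set -euo pipefail
dir=$(mktemp -d)
cd "$dir"

printf '%s\n' \
  5275_AA_run719_GAGATTCC_S520_L004_R1_001.fastq.gz \
  5275_A_run720_ATTACTCG_S84_L001_R1_001.fastq.gz \
  5275_AB_run719_GAGATTCC_S521_L004_R1_001.fastq.gz \
  5275_B_run720_ATTACTCG_S85_L002_R1_001.fastq.gz > file1.txt
printf '%s\n' '5275_A  MDF3' '5275_B  MDF6' '5275_AA MCO6' '5275_AB MCO7' > correspondence.txt

# ^ pins the ID to the start of the line; the trailing _ forces the match to end
# at the field separator, so 5275_A_ cannot match the start of 5275_AA_.
while read -r n k; do
  sed -i "s/^${n}_/${k}_/" file1.txt
done < correspondence.txt

cat file1.txt
```

The IDs here contain only letters, digits and _, so they are safe to use as sed patterns. IDs containing regex metacharacters such as . or / would need escaping first.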


If that doesn't suffice, you can additionally sort correspondence.txt by the length of the first column, putting the longer patterns first:

awk '{ print length($1), $0 | "sort -n -r" }' < correspondence.txt

7 5275_AB MCO7
7 5275_AA MCO6
6 5275_B  MDF6
6 5275_A  MDF3

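If you save this output to a file (for example by appending > sorted.txt to the command above) and read that file with

while read m n k; do ...

where m takes the length column, the loop processes the longest and thus hopefully most specific patterns first, so those are already replaced before the more generic patterns are processed.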
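Putting the two suggestions together gives the sketch below, which assumes GNU sed and awk. sorted.txt is a hypothetical filename. Here the pipe to sort is done in the shell rather than inside awk, which gives the same ordering:

```shell
#!/usr/bin/env bash
# Longest-pattern-first replacement (assumes GNU sed and awk).
set -euo pipefail
dir=$(mktemp -d)
cd "$dir"

printf '%s\n' \
  5275_AA_run719_GAGATTCC_S520_L004_R1_001.fastq.gz \
  5275_A_run720_ATTACTCG_S84_L001_R1_001.fastq.gz \
  5275_AB_run719_GAGATTCC_S521_L004_R1_001.fastq.gz \
  5275_B_run720_ATTACTCG_S85_L002_R1_001.fastq.gz > file1.txt
printf '%s\n' '5275_A  MDF3' '5275_B  MDF6' '5275_AA MCO6' '5275_AB MCO7' > correspondence.txt

# Prefix each mapping with the length of its old ID, then sort longest first.
awk '{ print length($1), $0 }' correspondence.txt | sort -n -r > sorted.txt

# m absorbs the length column; the _ suffix from the first reply is kept as well.
while read -r m n k; do
  sed -i "s/${n}_/${k}_/g" file1.txt
done < sorted.txt

cat file1.txt
```

With this ordering, 5275_AA and 5275_AB are rewritten to MCO6 and MCO7 before 5275_A and 5275_B are tried, so the shorter patterns no longer find them.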
