Question: (Closed) How to rename multiple files by removing different extensions from the command line or with a Perl script
1
20 months ago by
adeena_hassan40 wrote:

I have multiple files like the ones below on Linux:

> Aardvark_GENES_D.fa.1
> Aardvark_GENES_D.fa.2 
> Aardvark_GENES_D.fa.3
> Aardvark_GENES_D.fa.4

I want to rename them by removing the last extension and editing the string, like below:

Aardvark_ACMSD_D.fa
Aardvark_ARID1B_D.fa
Aardvark_CRYM_D.fa
Aardvark_SMO_D.fa

Could you kindly help me do this?

ADD COMMENTlink modified 20 months ago by st.ph.n2.4k • written 20 months ago by adeena_hassan40

How do you know which ones to rename to CRYM or ACMSD?

ADD REPLYlink written 20 months ago by geek_y9.4k

The first one is for ACMSD, the second for ARID1B, the third for CRYM; the genes will always be in the same order.

ADD REPLYlink written 20 months ago by adeena_hassan40
1

Your fourth file is named SMO... You need to clearly define the rule you want to use to rename your files.

ADD REPLYlink written 20 months ago by James Ashmore2.6k

Do the numbers at the end of your file names correspond to the order of the gene names in a separate list?

ADD REPLYlink written 20 months ago by cpad011211k

Hello adeenahassan77!

This is not a bioinformatics question, and multiple solutions have already been posted.

For this reason we have closed your question. This allows us to keep the site focused on the topics that the community can help with.

If you disagree please tell us why in a reply below, we'll be happy to talk about it.

Cheers!

ADD REPLYlink modified 20 months ago • written 20 months ago by genomax65k
3
20 months ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum119k wrote:

Assuming list.txt contains the gene names in the correct order:

paste  <(ls  *.fa.* | sort -t '.' -k3,3n ) list.txt  | awk  '{split($1,a,/_/); printf("mv %s %s_%s_%s\n",$1,a[1],$2,a[3]);}'  | sed 's/\.[0-9]*$//'

mv Aardvark_GENES_D.fa.1 Aardvark_ACMSD_D.fa
mv Aardvark_GENES_D.fa.2 Aardvark_ARID1B_D.fa
mv Aardvark_GENES_D.fa.3 Aardvark_CRYM_D.fa
mv Aardvark_GENES_D.fa.4 Aardvark_SMO_D.fa

when you're happy with the result, pipe it into 'bash'
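
For readers who prefer Python, the same pairing can be sketched roughly as below. This is only a sketch under the thread's assumptions: file names look like `<prefix>_GENES_<tag>.fa.<n>` with exactly three underscore-separated parts, and the gene list is in the order of the numeric suffixes. The function name `plan_renames` is made up for illustration. Like the one-liner, it only builds the `mv` lines and renames nothing.

```python
import re

def plan_renames(file_names, genes):
    """Pair files (sorted by numeric suffix) with genes and return mv command lines."""
    # Sort numerically on the trailing number so that .10 comes after .9
    ordered = sorted(file_names, key=lambda name: int(name.rsplit(".", 1)[1]))
    commands = []
    for name, gene in zip(ordered, genes):
        prefix, _, rest = name.split("_")      # e.g. 'Aardvark', 'GENES', 'D.fa.1'
        # Build the new name, then drop the trailing numeric extension
        target = re.sub(r"\.\d+$", "", f"{prefix}_{gene}_{rest}")
        commands.append(f"mv {name} {target}")
    return commands

files = ["Aardvark_GENES_D.fa.2", "Aardvark_GENES_D.fa.1",
         "Aardvark_GENES_D.fa.4", "Aardvark_GENES_D.fa.3"]
for line in plan_renames(files, ["ACMSD", "ARID1B", "CRYM", "SMO"]):
    print(line)
```

The printed lines can be reviewed first and then saved to a file or piped to a shell, exactly as with the awk version.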

ADD COMMENTlink written 20 months ago by Pierre Lindenbaum119k

Hi, I have 100 folders, and each folder contains multiple files like those mentioned above (the prefix Aardvark changes from folder to folder). The code above works well for Aardvark but not for folders whose prefix contains multiple words. For example, for African_savana_elephant it puts the gene name after African and leaves no extension at the end, like below:

> African_ACMSD_Savana
> African_ARID1B_Savana
> African_CRYM_Savana
> African_SMO_Savana

whereas it should be like below:

African_savana_elephant_ACMSD_D.fa 
African_savana_elephant_ARID1B_D.fa 
African_savana_elephant_SMO_D.fa
ADD REPLYlink written 20 months ago by adeena_hassan40
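
One way around the multi-word prefix problem is to stop counting underscore-separated fields and replace the placeholder itself. The Python sketch below assumes that the files in every folder contain the literal token `GENES` between underscores (for example `African_savana_elephant_GENES_D.fa.1`); the thread does not confirm this, and the function name `new_name` is hypothetical.

```python
import re

def new_name(file_name, gene, placeholder="GENES"):
    """Swap the placeholder for the gene and drop the trailing numeric extension."""
    stem = re.sub(r"\.\d+$", "", file_name)    # remove '.1', '.12', ...
    # Replace only the placeholder token between underscores, whatever the prefix length
    pattern = "_" + re.escape(placeholder) + "_"
    return re.sub(pattern, lambda match: f"_{gene}_", stem, count=1)

print(new_name("African_savana_elephant_GENES_D.fa.1", "ACMSD"))
print(new_name("Aardvark_GENES_D.fa.3", "CRYM"))
```

Because the prefix is never split, it can contain any number of words.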
4
20 months ago by
ptinto190
ptinto190 wrote:

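On many Linux distributions (Debian and Ubuntu, for example), the rename program is a Perl script that uses Perl regular expressions to rename files; on others, rename is the util-linux tool, which does not take Perl expressions. Either way, depending on how complicated it is to select the genes, this can be difficult to do with a one-liner.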

If you need a quick and dirty solution, the simplest one involving the least coding is as follows:

[update] I thought your gene mapping was different. If you have only one gene per FASTA file and you have two lists, of files and of genes, in the same order:

$ cat genes
ACMSD
ARID1B
CRYM
SMO

$ cat files.fof
Aardvark_GENES_D.fa.1
Aardvark_GENES_D.fa.2
Aardvark_GENES_D.fa.3
Aardvark_GENES_D.fa.4

Put both lists side by side, one column each (no Excel needed):

$ paste files.fof genes
Aardvark_GENES_D.fa.1   ACMSD
Aardvark_GENES_D.fa.2   ARID1B
Aardvark_GENES_D.fa.3   CRYM
Aardvark_GENES_D.fa.4   SMO

Now it is as simple as substituting GENES with the gene in the second column:

$ paste files.fof genes | perl -lane '$gene=$F[1]; $file=$F[0]; ($new_name=$file)=~s/\.\d+$//; $new_name=~s/GENES/$gene/; print "mv $file $new_name"'

mv Aardvark_GENES_D.fa.1 Aardvark_ACMSD_D.fa
mv Aardvark_GENES_D.fa.2 Aardvark_ARID1B_D.fa
mv Aardvark_GENES_D.fa.3 Aardvark_CRYM_D.fa
mv Aardvark_GENES_D.fa.4 Aardvark_SMO_D.fa

For reproducibility and accountability, I like to have these commands explicitly printed to a file and then run them with sh. Sometimes you don't get your script right on the first attempt, and it is also good to keep a record of the commands that were run, for later review in case something goes wrong later in the pipeline.
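
The same habit can be kept when the rename plan is built in Python. The sketch below is one possible way to do it, not part of the original answer: the function name `write_rename_script` and the duplicate-target check are additions for illustration. It writes one `mv` line per pair to a script file that can be read before it is run with `sh`.

```python
import os
import shlex
import tempfile

def write_rename_script(pairs, script_path):
    """Write one 'mv source target' line per pair; refuse duplicate targets."""
    targets = [target for _, target in pairs]
    if len(set(targets)) != len(targets):
        raise ValueError("two files would be renamed to the same target")
    with open(script_path, "w") as handle:
        for source, target in pairs:
            # shlex.quote protects names with spaces or shell metacharacters
            handle.write(f"mv {shlex.quote(source)} {shlex.quote(target)}\n")

pairs = [("Aardvark_GENES_D.fa.1", "Aardvark_ACMSD_D.fa"),
         ("Aardvark_GENES_D.fa.2", "Aardvark_ARID1B_D.fa")]
# A temporary directory is used here only so the example leaves no files behind
script = os.path.join(tempfile.mkdtemp(), "rename_files.sh")
write_rename_script(pairs, script)
with open(script) as handle:
    print(handle.read())
```

The duplicate check would catch a plan in which several files map to one target and overwrite each other.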

=============

[old code left here just for historic purposes]

Assuming you have 8 files (two sets of 4 genes to rename)

$ perl -lane 'BEGIN{@x=qw(ACMSD ARID1B CRYM SMO)x2}; $gene=$x[$.-1];($new_name=$_)=~s/GENES/$gene/;print "mv $_ $new_name"' <(ls *.fa*) > rename_files.sh

Check the generated script (better safe than sorry)

$ cat rename_files.sh
mv Aardvark_GENES_D.fa.1 Aardvark_ACMSD_D.fa.1
mv Aardvark_GENES_D.fa.2 Aardvark_ARID1B_D.fa.2
mv Aardvark_GENES_D.fa.3 Aardvark_CRYM_D.fa.3
mv Aardvark_GENES_D.fa.4 Aardvark_SMO_D.fa.4
mv Aardvark_GENES_D.fa.5 Aardvark_ACMSD_D.fa.5
mv Aardvark_GENES_D.fa.6 Aardvark_ARID1B_D.fa.6 
mv Aardvark_GENES_D.fa.7 Aardvark_CRYM_D.fa.7
mv Aardvark_GENES_D.fa.8 Aardvark_SMO_D.fa.8

And execute it

$ sh rename_files.sh
$ ls
Aardvark_ACMSD_D.fa.1
Aardvark_ACMSD_D.fa.5
Aardvark_ARID1B_D.fa.2
Aardvark_ARID1B_D.fa.6
Aardvark_CRYM_D.fa.3
Aardvark_CRYM_D.fa.7
Aardvark_SMO_D.fa.4
Aardvark_SMO_D.fa.8

I have left the number at the end to show that it works as intended.

The real script to remove the number should use the regexp s/GENES(.+)\.\d+$/$gene$1/

## $1 contains the text captured by the (.+) capturing block in the first part of the substitution regexp

$ perl -lane 'BEGIN{@x=qw(ACMSD ARID1B CRYM SMO)x2}; $gene=$x[$.-1]; ($new_name=$_)=~s/GENES(.+)\.\d+$/$gene$1/;print "mv $_ $new_name"' <(ls *.fa.*)
mv Aardvark_GENES_D.fa.1 Aardvark_ACMSD_D.fa
mv Aardvark_GENES_D.fa.2 Aardvark_ARID1B_D.fa
mv Aardvark_GENES_D.fa.3 Aardvark_CRYM_D.fa
mv Aardvark_GENES_D.fa.4 Aardvark_SMO_D.fa
mv Aardvark_GENES_D.fa.5 Aardvark_ACMSD_D.fa
mv Aardvark_GENES_D.fa.6 Aardvark_ARID1B_D.fa
mv Aardvark_GENES_D.fa.7 Aardvark_CRYM_D.fa
mv Aardvark_GENES_D.fa.8 Aardvark_SMO_D.fa
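
A note on the pattern that strips the numeric suffix: in a regular expression an unescaped `.` matches any character and `\d` matches a single digit, so suffixes with two or more digits need `\.\d+$`. The small Python check below illustrates the difference (Python's `re` module treats these constructs the same way Perl does); the file name with a `.12` suffix is made up for the example.

```python
import re

name = "Aardvark_GENES_D.fa.12"

# Unescaped dot and a single \d: only '12' is consumed, the real dot is kept
unescaped = re.sub(r"GENES(.+).\d$", r"ACMSD\1", name)
# Escaped dot and \d+: the whole '.12' suffix is removed
escaped = re.sub(r"GENES(.+)\.\d+$", r"ACMSD\1", name)

print(unescaped)   # Aardvark_ACMSD_D.fa.
print(escaped)     # Aardvark_ACMSD_D.fa
```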


==Working Example==

# simulate a list of files
$ touch Aardvark_GENES_D.fa.1 Aardvark_GENES_D.fa.2 Aardvark_GENES_D.fa.3 Aardvark_GENES_D.fa.4 Aardvark_GENES_D.fa.5 Aardvark_GENES_D.fa.6

# create a 'file of files' (.fof) to iterate over them programmatically (it is better to do this with 'find', but this case is simple enough to use ls)

$ ls *.fa.* > files.fof

# how to create a mapping from file to genes
# - create your gene list and multiply (x operator) as many times as blocks of fa you have 
#   (you said that they were in blocks of 4 always in the same order)
# - use $. (the line number) to index into the array of repeated gene names
# - create the command mv
$ perl -lane 'BEGIN{@x=qw(ACMSD ARID1B CRYM)x2}; $gene=$x[$.-1]; print "$_ -> $gene"' files.fof
Aardvark_GENES_D.fa.1 -> ACMSD
Aardvark_GENES_D.fa.2 -> ARID1B
Aardvark_GENES_D.fa.3 -> CRYM
Aardvark_GENES_D.fa.4 -> ACMSD
Aardvark_GENES_D.fa.5 -> ARID1B
Aardvark_GENES_D.fa.6 -> CRYM
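
The repeating gene list can also be sketched in Python, where `itertools.cycle` plays the role of the `x` repetition operator and no repeat count is needed. The function name `assign_genes` is hypothetical.

```python
from itertools import cycle

def assign_genes(file_names, genes):
    """Pair each file with a gene, restarting the gene list when it runs out."""
    # zip stops at the end of file_names, so the endless cycle is safe here
    return list(zip(file_names, cycle(genes)))

files = [f"Aardvark_GENES_D.fa.{n}" for n in range(1, 7)]
for name, gene in assign_genes(files, ["ACMSD", "ARID1B", "CRYM"]):
    print(f"{name} -> {gene}")
```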


# substitute the names. In Perl, to do a substitution and assign the result to a new variable, you first need to create the copy and then substitute in it. To do it all at once, put the assignment in parentheses, ($new_var=$ori_var), and then apply the substitution: `($new_var=$ori_var)=~s/a/b/`

$ perl -lane 'BEGIN{@x=qw(ACMSD ARID1B CRYM)x2}; $gene=$x[$.-1]; ($new_name=$_)=~s/GENES/$gene/;print "mv $_ $new_name"' files.fof > rename_files.sh
ADD COMMENTlink modified 20 months ago • written 20 months ago by ptinto190

touch can be combined with bash brace expansion (a shell feature, not a regex). To create the 5 files below:

Aardvark_GENES_D.fa.1
Aardvark_GENES_D.fa.2
Aardvark_GENES_D.fa.3
Aardvark_GENES_D.fa.4
Aardvark_GENES_D.fa.5

code:

$ touch Aardvark_GENES_D.fa.{1..5}
ADD REPLYlink modified 20 months ago • written 20 months ago by cpad011211k

thanks, I always forget the useful bash expansions.

ADD REPLYlink written 20 months ago by ptinto190
1
20 months ago by
cpad011211k
India
cpad011211k wrote:
$   cat genes.txt 
    ACMSD
    ARID1B
    CRYM
    SMO

Create 4 dummy files:

  $ touch Aardvark_GENES_D.fa.{1..4}

output from:

 $  ls Aardvark_GENES_D.fa.*
    Aardvark_GENES_D.fa.1  Aardvark_GENES_D.fa.2  Aardvark_GENES_D.fa.3  Aardvark_GENES_D.fa.4

Command:

 $ paste <(ls *.fa.*) <(ls *.fa.* | paste - genes.txt | awk '{sub(/\.[0-9]+$/,"",$1); sub("GENES",$2,$1); print $1}') | xargs -n2 cp

The files were copied rather than moved. Use mv instead of cp to rename them. Output from ls:

  $  ls Aardvark_*
    Aardvark_ACMSD_D.fa   Aardvark_CRYM_D.fa     Aardvark_GENES_D.fa.2  Aardvark_GENES_D.fa.4
    Aardvark_ARID1B_D.fa  Aardvark_GENES_D.fa.1  Aardvark_GENES_D.fa.3  Aardvark_SMO_D.fa
ADD COMMENTlink modified 20 months ago • written 20 months ago by cpad011211k
1
20 months ago by
st.ph.n2.4k
Philadelphia, PA
st.ph.n2.4k wrote:

genes.txt (tab-delimited):

1 ACMSD
2 ARID1B
3 CRYM
4 SMO

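#!/usr/bin/env python

import glob
import os

# Map each numeric suffix to its gene name, e.g. '1' -> 'ACMSD'
genes = {}
with open('genes.txt', 'r') as f:
    for line in f:
        if line.strip():
            number, gene = line.split()  # two whitespace-separated columns
            genes[number] = gene

for file in glob.glob('*.fa.*'):
    parts = file.split('_')              # e.g. ['Aardvark', 'GENES', 'D.fa.1']
    number = file.split('.')[-1]         # trailing number, e.g. '1'
    new_name = parts[0] + '_' + genes[number] + '_' + parts[2].split('.')[0] + '.fa'
    os.rename(file, new_name)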
ADD COMMENTlink written 20 months ago by st.ph.n2.4k
The thread is closed. No new answers may be added.