Question: Renaming All Files In A Directory With A Perl Script
Robert Sicko550 wrote:

Hoping someone can help with what is probably a trivial problem. I receive Ion Torrent data by subject, and the files I get have the barcode name in all of the filenames by default. I want to rename all of the files with a sampleID instead of this barcode ID... I'm new to Perl, so I'm struggling with a script to rename the BAM or VCF files. The start of my script is below, but I feel like I'm complicating things. My plan was to cd into the directory containing all the VCFs or BAMs and run this script with the appropriate flag, but I'm stuck at the point below:

#!/usr/bin/perl
# ion_rename.pl

# rename_script - Script to rename a batch of iontorrent files that contain IonXpress barcodes to their study ID
# will rename all vcf or all bam files and their associated index files

use strict; use warnings; use Getopt::Long; use Cwd;   # Cwd provides getcwd

my $VERSION = "1.0"; # it's a good idea to version your programs

my ($vcf,$bam,$help);
my $barcodefile = '';
my $dir = getcwd;

GetOptions( 'vcf'     => \$vcf,
        'bam' => \$bam,
        'barcodefile=s' => \$barcodefile,
        'help' => \$help);

my $usage = "
usage: ion_rename.pl [options] <arguments...>
options:
    --help
    --vcf Use this option for renaming ion vcf files
    --bam Use this option for renaming ion bam files
    --barcodefile <file containing link between barcode and sampleID>
";

my @files;

if ($help) {
    print "version ", $VERSION, "\n";
    print $usage; # it's common to provide a -h to give help
    exit;
}
elsif($barcodefile eq ""){
    print "Must specify barcode file!\n";
    print $usage; # it's common to provide a -h to give help
    exit;
}
else{
    if($vcf){
        @files = glob '*.vcf*';
    }
    elsif($bam){
        @files = glob '*.bam';
    }

    open(my $fh, "<", $barcodefile)    # read from the barcode map file
        or die "error opening $barcodefile for reading: $!";
    while (my $line = <$fh>) {    #read line by line
        chomp $line;            #remove newline at end
        my ($barcodename, $samplename) = split(/\t/, $line);
    }
}

My bam files are all named like this:

Run_1_011314.TAG_RG_N8FU5.IonXpress_054.bam
Run_1_011314.TAG_RG_N8FU5.IonXpress_082.bam

vcf files are all named like this:

TSVC_variants_IonXpress_082.vcf.gz
TSVC_variants_IonXpress_054.vcf.gz 
TSVC_variants_IonXpress_082.vcf.gz.tbi
TSVC_variants_IonXpress_054.vcf.gz.tbi

I've created a tab-delimited file to map barcode to sample ID like this:

IonXpress_082    A13901
IonXpress_054    A21064

and I want to rename the files like this:

A13901.vcf.gz
A13901.vcf.gz.tbi
A21064.vcf.gz
A21064.vcf.gz.tbi


Run_1_011314.A21064.bam
Run_1_011314.A13901.bam
perl linux • 9.2k views
ADD COMMENTlink modified 17 months ago by gradimir_sancanin0 • written 5.1 years ago by Robert Sicko550

I'd strongly recommend AGAINST renaming files. Instead, create soft links to the original files using the desired names.

ADD REPLYlink written 5.1 years ago by Sean Davis25k

I agree, or store the old and new names in a database or text file. You should not rename raw data files; what if you have a query in the future for whoever generated them? They will not know which file you mean.

I often find that people think they need to rename things, for readability, when in fact what they need is more logical code for analysis pipelines.
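
As a rough illustration of keeping the old and new names in a text file, here is a minimal Perl sketch. The sub names record_renames and original_name and the manifest file name are made up for this example and are not part of any existing tool; the manifest is assumed to be a plain tab-delimited file with one "old name, new name" pair per line.

```perl
use strict;
use warnings;

# Append "old<TAB>new" lines to a manifest so every rename can be traced back.
# Each pair is an array reference: [ $old_name, $new_name ].
sub record_renames {
    my ($manifest, @pairs) = @_;
    open(my $out, '>>', $manifest) or die "Cannot open $manifest: $!";
    print {$out} join("\t", @$_), "\n" for @pairs;
    close($out) or die "Cannot close $manifest: $!";
}

# Look up the original name of a renamed file.
# Returns undef if the new name was never recorded.
sub original_name {
    my ($manifest, $new_name) = @_;
    open(my $in, '<', $manifest) or die "Cannot open $manifest: $!";
    my $found;
    while (my $line = <$in>) {
        chomp $line;
        my ($old, $new) = split /\t/, $line;
        $found = $old if defined $new && $new eq $new_name;
    }
    close($in);
    return $found;
}

# Example with hypothetical file names:
# record_renames('rename_manifest.tsv',
#     ['TSVC_variants_IonXpress_082.vcf.gz', 'A13901.vcf.gz']);
# print original_name('rename_manifest.tsv', 'A13901.vcf.gz'), "\n";
```

With a record like this, whoever generated the data can still be asked about a file by its original name, even if the working copies were renamed.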

ADD REPLYlink written 5.1 years ago by Neilfws48k

I keep raw data files "as is" on our long-term storage server. And we keep appropriate databases for tracking data and individual sample IDs. It is somewhat easier to rename the local copies of initial data files for when they enter the analysis pipeline but it is more of a convenience thing at that point.

ADD REPLYlink written 5.1 years ago by Dan Gaston7.1k

Is there any reason other than being able to go back to the original data source with the correct name? I would keep a record (my barcode to sampleID file) that I could use to trace a sampleID back to its IonXpress ID.

ADD REPLYlink written 5.1 years ago by Robert Sicko550
Christian2.7k wrote:

No need for a custom script. Look up the Linux 'rename' command. It allows you to specify a regular expression for the bulk renaming of files.

ADD COMMENTlink modified 5.1 years ago • written 5.1 years ago by Christian2.7k

Agree. Have a look at the same question, with answers, on Cross Validated.

ADD REPLYlink modified 5.1 years ago • written 5.1 years ago by Irsan6.8k

Thanks... If I decide to rename instead of using links, I can probably hack together a bash script that reads the new file names and renames the files with this command.

ADD REPLYlink written 5.1 years ago by Robert Sicko550

By the way, on Debian systems (and probably Ubuntu too), Perl's rename command is being moved to the rename package. In future releases (Buster, ...) the perl package will no longer provide the rename command.

ADD REPLYlink written 17 months ago by Charles Plessy2.6k
Ryan D3.3k wrote:

For renaming to substitute one expression for another, I offer the following perl script.

But it just does simple renaming like so:

ryan@WZLINUX7:~/scripts$ perl renamer.pl

Old pattern: foo

New pattern: bar

File foo2 renamed to bar2

File foo3 renamed to bar3

File foo1 renamed to bar1

#!/usr/bin/perl -w

use strict;

my($dir, $oldpat, $newpat);
$dir=".";
print "Old pattern: ";
chomp($oldpat=<STDIN>);
print "New pattern: ";
chomp($newpat=<STDIN>);

opendir(DH, $dir) || die "Can not open $dir: $!";
my @files=readdir DH;
closedir(DH);   # directory handles are closed with closedir, not close

my $oldname;
foreach(@files){
   $oldname=$_;

   s/$oldpat/$newpat/; # change $_ to new pattern

   next if(-e "$dir/$_");
   if(! rename "$dir/$oldname", "$dir/$_"){
      warn "Could not rename $oldname to $_: $!";
   } else {
      print "File $oldname renamed to $_\n";
   }
}
ADD COMMENTlink written 5.1 years ago by Ryan D3.3k

Thanks. Saved this for future use.

ADD REPLYlink written 5.1 years ago by Robert Sicko550
Dan Gaston7.1k wrote:

Instead of putting barcodename and samplename in variables that are overwritten on every pass through the barcode map file, you should store them as key/value pairs in a hash, with the barcode as the key.

Then you can loop through your files and match the barcode to the filename and rename accordingly using the value from the hash.

ADD COMMENTlink written 5.1 years ago by Dan Gaston7.1k

Thanks, interesting idea. I think this would make the logic of the program easier... I might be able to pull that off in C++ but in Perl it'd take me some doing.

ADD REPLYlink written 5.1 years ago by Robert Sicko550

It is even easier to do in Perl than in C++. Declare the hash and in the loop you currently have add:

$hash{$barcodename} = $samplename;

After storing that mapping data, loop through the file array. It looks easy enough to parse the filename if you know whether it is a bam or a vcf, as they appear to follow regular patterns.
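
A minimal sketch of that approach might look like the following. It assumes the file name patterns shown in the question; the sub names load_barcode_map and new_name_for and the --apply flag are invented for this example. Without --apply it only prints what it would do.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read a tab-delimited map (barcode <TAB> sampleID) into a hash keyed by barcode.
sub load_barcode_map {
    my ($path) = @_;
    my %sample_for;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    while (my $line = <$fh>) {
        chomp $line;
        next unless $line =~ /\S/;    # skip blank lines
        my ($barcode, $sample) = split /\t/, $line;
        $sample_for{$barcode} = $sample;
    }
    close($fh);
    return \%sample_for;
}

# Work out the new name for one file; returns undef when the file has no
# barcode, the barcode is not in the map, or the name fits neither pattern.
sub new_name_for {
    my ($file, $sample_for) = @_;
    my ($barcode) = $file =~ /(IonXpress_\d+)/ or return;
    my $sample = $sample_for->{$barcode} or return;
    # vcf: TSVC_variants_IonXpress_082.vcf.gz -> A13901.vcf.gz
    return "$sample$1" if $file =~ /^TSVC_variants_\Q$barcode\E(\..+)$/;
    # bam: Run_1_011314.TAG_RG_N8FU5.IonXpress_054.bam -> Run_1_011314.A21064.bam
    return "$1.$sample$2" if $file =~ /^([^.]+)\..*\Q$barcode\E(\..+)$/;
    return;
}

# Usage (hypothetical): perl ion_rename_sketch.pl barcode_map.txt [--apply]
my ($map_path, $mode) = @ARGV;
if (defined $map_path) {
    my $sample_for = load_barcode_map($map_path);
    for my $file (glob '*IonXpress_*') {
        my $new = new_name_for($file, $sample_for) or next;
        if (defined $mode && $mode eq '--apply') {
            rename($file, $new) or warn "Could not rename $file to $new: $!";
        } else {
            print "$file -> $new\n";    # dry run: show the planned rename only
        }
    }
}
```

Because the hash lookup is keyed on the barcode pulled out of each file name, the index files (.tbi) are handled by the same code path as the files they belong to.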

ADD REPLYlink written 5.1 years ago by Dan Gaston7.1k

Didn't mean to imply it'd be harder in Perl for someone who knows both languages... I just have far more experience with C++, so Perl is still a struggle for me. With that said, I'm trying to force myself to use Perl as I see the utility of it. Thanks, this will help!

ADD REPLYlink written 5.1 years ago by Robert Sicko550

No problem. The syntax for doing this in Perl is pretty straightforward. I tend to do all of these things in Python now myself.

ADD REPLYlink written 5.1 years ago by Dan Gaston7.1k

Just out of curiosity, is there a reason you prefer Python now or just personal preference?

ADD REPLYlink written 5.1 years ago by Robert Sicko550

Partially personal preference; I find the coding style cleaner. I have also found that in the genomics end of bioinformatics there are far more GOOD tools and libraries for Python than for Perl. This includes pybedtools, bx-python (ClusterTrees, IntervalTrees), and others. The built-in tools for handling common file types (delimited file formats, etc.) are also much better.

ADD REPLYlink written 5.1 years ago by Dan Gaston7.1k
raphael.poujol30 wrote:

I didn't test this, but it could be the way to do it, if mapfile is the file with the correspondence.

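mapfile=barcode_map.txt   # placeholder name: the tab-delimited file mapping barcode to sample

while IFS=$'\t' read -r barcode sample; do
   for f in *"$barcode"*; do
      [ -e "$f" ] || continue   # nothing matched this barcode
      case "$f" in
         TSVC_variants_*) new="$sample${f#*$barcode}" ;;            # vcf: A13901.vcf.gz
         *)               new="${f%%.*}.$sample${f#*$barcode}" ;;   # bam: Run_1_011314.A21064.bam
      esac
      echo mv "$f" "$new"
   done
done < "$mapfile"

If you test it, it will print all the commands; when you are happy with them, remove the echo in front of mv.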

ADD COMMENTlink modified 5.1 years ago • written 5.1 years ago by raphael.poujol30

Clever... this will help if I go with renaming the files, thanks.

ADD REPLYlink written 5.1 years ago by Robert Sicko550
Kenosis1.2k wrote:

In case you're interested in creating renamed symbolic links to the original files, perhaps the following will be helpful:

use strict;
use warnings;
use File::Basename;

eval { symlink( "", "" ); 1 } or die "Symbolic links not supported.\n";

my ( @cols, %replacements );

while (<>) {
    $replacements{ $cols[0] } = $cols[1] if @cols = split;
}

for my $oldFile ( grep /bam$|vcf\.gz/, <./originals/*> ) {
    my $newFile = basename $oldFile;

    if ( my ( $tag, $ion ) = $oldFile =~ /\/Run.+(TAG.+)(IonXpress_\d+)/ ) {
        $newFile =~ s/$tag$ion/$replacements{$ion}/ if $replacements{$ion};
    }

    if ( my ($ion) = $oldFile =~ /\/TSVC_variants_(IonXpress_\d+)/ ) {
        $newFile =~ s/.+$ion/$replacements{$ion}/ if $replacements{$ion};
    }

    symlink $oldFile, $newFile
      or warn "Unable to create a symbolic link for '$oldFile': $!";
}

Usage: perl script.pl mapFile

Just place both script.pl and mapFile in a directory where there's a subdirectory called originals that contains all of your original files. The script will create symbolic links--named as you've specified--to those original files.

ADD COMMENTlink written 5.1 years ago by Kenosis1.2k

Thanks... If I go the link instead of the rename route, this will help!

ADD REPLYlink written 5.1 years ago by Robert Sicko550
gradimir_sancanin0 wrote:

This code can work, but you will probably have problems when you try to rename other types of files. A better way is to use the Batch Rename Files Tool software, which you can easily find at BatchRenameFiles.org.

ADD COMMENTlink modified 16 months ago • written 17 months ago by gradimir_sancanin0