Hoping someone can help with what is a trivial problem. I receive Ion Torrent data by subject, and the files I get have the barcode name in all of the filenames by default. I want to rename all of the files with a sample ID instead of this barcode ID. I'm new to Perl, so I'm struggling with a script to rename the BAM or VCF files. The start of my script is below, but I feel like I'm complicating things. My plan was to cd into the directory containing all the VCFs or BAMs and run this script with the appropriate flag, but I'm stuck at the point below:
#!/usr/bin/perl
# ion_rename.pl
# rename_script - Script to rename a batch of Ion Torrent files that contain IonXpress barcodes to their study ID
# will rename all vcf or all bam files and their associated index files
use strict;
use warnings;
use Getopt::Long;
use Cwd; # provides getcwd
my $VERSION = "1.0"; # it's a good idea to version your programs
my ($vcf, $bam, $help);
my $barcodefile = '';
my $dir = getcwd;
GetOptions( 'vcf' => \$vcf,
            'bam' => \$bam,
            'barcodefile=s' => \$barcodefile,
            'help' => \$help);
my $usage = "
usage: ion_rename.pl [options] <arguments...>
options:
--help
--vcf Use this option for renaming ion vcf files
--bam Use this option for renaming ion bam files
--barcodefile <file containing link between barcode and sampleID>
";
my @files;
my %sample_for; # barcode name => sample name, filled from the barcode file
if ($help) {
    print "version ", $VERSION, "\n";
    print $usage; # it's common to provide a -h to give help
    exit;
}
elsif ($barcodefile eq "") {
    print "Must specify barcode file!\n";
    print $usage; # remind the user of the expected options
    exit 1;
}
else {
    if ($vcf) {
        @files = glob '*.vcf*';
    }
    elsif ($bam) {
        @files = glob '*.bam';
    }
    open(my $fh, "<", $barcodefile) # read from the file given by --barcodefile
        or die "error opening $barcodefile for reading: $!";
    while (<$fh>) { # read line by line
        chomp; # remove newline at end
        my ($barcodename, $samplename) = split /\t/;
        $sample_for{$barcodename} = $samplename;
    }
    close $fh;
}
My bam files are all named like this:
Run_1_011314.TAG_RG_N8FU5.IonXpress_054.bam
Run_1_011314.TAG_RG_N8FU5.IonXpress_082.bam
vcf files are all named like this:
TSVC_variants_IonXpress_082.vcf.gz
TSVC_variants_IonXpress_054.vcf.gz
TSVC_variants_IonXpress_082.vcf.gz.tbi
TSVC_variants_IonXpress_054.vcf.gz.tbi
I've created a tab-delimited file to map barcode to sample ID like this:
IonXpress_082 A13901
IonXpress_054 A21064
and I want to rename the files like this:
A13901.vcf.gz
A13901.vcf.gz.tbi
A21064.vcf.gz
A21064.vcf.gz.tbi
Run_1_011314.A21064.bam
Run_1_011314.A13901.bam
I'd strongly recommend AGAINST renaming files. Instead, create soft links to the original files using the desired names.
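For what it's worth, here is a rough Perl sketch of the soft-link approach. It assumes the file-name patterns shown in the question and a tab-delimited barcode file; the sub names (read_barcode_map, new_name_for, link_with_sample_names) are made up for illustration, so treat it as a starting point rather than a finished tool. The loop at the bottom is only a dry run on two of the example names and does not touch any files.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read the tab-delimited barcode file into a hash: barcode => sample ID.
sub read_barcode_map {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    my %sample_for;
    while (my $line = <$fh>) {
        chomp $line;
        next unless $line =~ /\S/;    # skip blank lines
        my ($barcode, $sample) = split /\t/, $line;
        $sample_for{$barcode} = $sample;
    }
    close $fh;
    return \%sample_for;
}

# Work out the sample-based name for one file. Returns nothing if the
# file name has no known barcode or does not match either pattern.
sub new_name_for {
    my ($file, $sample_for) = @_;
    my ($barcode) = $file =~ /(IonXpress_\d+)/ or return;
    my $sample = $sample_for->{$barcode} or return;
    # VCF: TSVC_variants_IonXpress_082.vcf.gz[.tbi] -> A13901.vcf.gz[.tbi]
    return "$sample$1" if $file =~ /^TSVC_variants_\Q$barcode\E(\..+)$/;
    # BAM: Run_1_011314.TAG_RG_N8FU5.IonXpress_054.bam -> Run_1_011314.A21064.bam
    return "$1.$sample$2" if $file =~ /^([^.]+)\..*\.\Q$barcode\E(\.bam.*)$/;
    return;
}

# Create a soft link under the new name; the original file is left as is.
sub link_with_sample_names {
    my ($sample_for, @files) = @_;
    for my $file (@files) {
        my $new = new_name_for($file, $sample_for);
        next unless defined $new;
        if (-e $new || -l $new) {
            warn "Skipping $file: $new already exists\n";
            next;
        }
        symlink($file, $new) or warn "Could not link $file -> $new: $!\n";
    }
}

# Dry run on example names from the question (no files are touched).
my %example = (IonXpress_082 => 'A13901', IonXpress_054 => 'A21064');
for my $file ('TSVC_variants_IonXpress_082.vcf.gz.tbi',
              'Run_1_011314.TAG_RG_N8FU5.IonXpress_054.bam') {
    print "$file -> ", new_name_for($file, \%example), "\n";
}
```

To create the links for real, you would call something like link_with_sample_names(read_barcode_map('barcodes.txt'), glob('*.vcf.gz*')) from inside the data directory, where barcodes.txt is a placeholder for your own mapping file.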
I agree, or store the old and new names in a database or text file. You should not rename raw data files; what if you have a query in the future for whoever generated them? They will not know which file you mean.
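If you go the text-file route, a minimal Perl sketch could look like the one below. It assumes a simple tab-delimited manifest with the new name in the first column and the original name in the second; the sub names record_names and original_name are hypothetical, and the demo writes to a temporary file so nothing in your data directory is changed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Append one "new name <TAB> original name" line per file to the manifest.
sub record_names {
    my ($manifest, %original_for) = @_;
    open(my $out, '>>', $manifest) or die "Cannot open $manifest: $!";
    for my $new (sort keys %original_for) {
        print {$out} "$new\t$original_for{$new}\n";
    }
    close($out) or die "Cannot close $manifest: $!";
}

# Look up the original name recorded for a renamed (or linked) file.
# Returns nothing if the name is not in the manifest.
sub original_name {
    my ($manifest, $new) = @_;
    open(my $in, '<', $manifest) or die "Cannot open $manifest: $!";
    while (my $line = <$in>) {
        chomp $line;
        my ($name, $original) = split /\t/, $line;
        return $original if defined $name && $name eq $new;
    }
    return;
}

# Demo with names from the question, written to a throwaway temp file.
my (undef, $demo_manifest) = tempfile(UNLINK => 1);
record_names($demo_manifest,
    'A13901.vcf.gz'           => 'TSVC_variants_IonXpress_082.vcf.gz',
    'Run_1_011314.A21064.bam' => 'Run_1_011314.TAG_RG_N8FU5.IonXpress_054.bam',
);
print "A13901.vcf.gz came from ", original_name($demo_manifest, 'A13901.vcf.gz'), "\n";
```

Appending rather than overwriting means the manifest keeps growing across runs, so whoever generated the data can still be asked about a file by its original name.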
I often find that people think they need to rename things, for readability, when in fact what they need is more logical code for analysis pipelines.
I keep raw data files "as is" on our long-term storage server. And we keep appropriate databases for tracking data and individual sample IDs. It is somewhat easier to rename the local copies of initial data files for when they enter the analysis pipeline but it is more of a convenience thing at that point.
Is there any reason other than being able to go back to the original data source with the correct name? I would keep a record (my barcode to sampleID file) that I could use to trace a sampleID back to its IonXpress ID.