Tool: UUIDtoBarcode Renaming of Folders
16 months ago
Kaia ▴ 10

I've recently been involved in a research project that uses datasets from the National Cancer Institute's GDC Data Portal. The Methylation and RNA-seq dataset folders all have UUID names, and we needed TCGA barcodes instead. I wrote some code that goes through and renames all of the sub-folders. I've seen people with similar problems, so I've attached the code in case it helps others. I can share the whole R file if needed with libraries, packages, and comments, just let me know. Hope this helps!

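library(TCGAutils)  # provides UUIDtoBarcode()

# Sub-folders of ./RNA-Seq, each named after a GDC file UUID
old_dirs <- list.dirs(path = "./RNA-Seq", full.names = TRUE, recursive = FALSE)

for (old_dir in old_dirs) {
  uuid <- basename(old_dir)
  # Look up the barcode for this file UUID
  barcode <- UUIDtoBarcode(uuid, from_type = "file_id")
  # The second column holds the barcode; use the first row if several come back
  new_dir <- file.path("./RNA-Seq", barcode[[2]][1])
  file.rename(from = old_dir, to = new_dir)
}

# Confirm the new folder names
list.files(path = "./RNA-Seq", all.files = FALSE, full.names = FALSE)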
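
To try the renaming logic without touching real data or querying GDC, here is a minimal sketch that runs on a throwaway directory. The folder names, the `lookup` vector, and the `uuid_to_barcode.csv` file name are all made up for illustration; `lookup` stands in for the result of `UUIDtoBarcode`. It also saves the UUID-to-barcode mapping before renaming, on the assumption that you may want to get back to the original UUIDs later.

```r
# Build a throwaway directory tree that mimics a GDC download
root <- file.path(tempdir(), "RNA-Seq-demo")
unlink(root, recursive = TRUE)
dir.create(root)

# Hypothetical UUID -> barcode lookup (stand-in for UUIDtoBarcode output)
lookup <- c("uuid-1" = "SAMPLE-A", "uuid-2" = "SAMPLE-B")
for (u in names(lookup)) dir.create(file.path(root, u))

# Collect the sub-folders and pair each one with its barcode
old_dirs <- list.dirs(root, full.names = TRUE, recursive = FALSE)
mapping <- data.frame(
  uuid = basename(old_dirs),
  barcode = unname(lookup[basename(old_dirs)]),
  stringsAsFactors = FALSE
)

# Save the mapping first so the rename can be undone later
write.csv(mapping, file.path(root, "uuid_to_barcode.csv"), row.names = FALSE)

# file.rename is vectorised: one call renames every folder
renamed <- file.rename(from = old_dirs, to = file.path(root, mapping$barcode))
print(list.dirs(root, full.names = FALSE, recursive = FALSE))
```

`renamed` is a logical vector with one entry per folder, so `all(renamed)` is a quick way to confirm that nothing was skipped.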
UUIDtoBarcode R • 704 views

I can share the whole R file if needed with libraries, packages, and comments, just let me know.

May want to consider putting that up on GitHub or in a gist and then pasting the link in your post. Biostars will automatically parse gist links.

16 months ago
Zhenyu Zhang ★ 1.2k

Please be careful with these conversions. These sample barcodes are not unique across GDC; they are unique only within a particular project. That is fine for TCGA, but you might find duplicates among some other projects. That's why GDC uses UUIDs instead of barcodes as identifiers. Similarly, it is suggested to carry out the analysis with Ensembl gene IDs and to switch to gene names only at the final presentation stage.
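
One way to act on this warning is to check the barcodes for duplicates before renaming anything. The sketch below uses only base R; the `mapping` data frame and its values are invented for illustration, and `find_duplicate_barcodes` is a hypothetical helper, not part of any package.

```r
# Hypothetical mapping of folder UUIDs to barcodes (made-up values)
mapping <- data.frame(
  uuid = c("uuid-a", "uuid-b", "uuid-c"),
  barcode = c("BARCODE-01", "BARCODE-02", "BARCODE-01"),
  stringsAsFactors = FALSE
)

# Return every row whose barcode occurs more than once
find_duplicate_barcodes <- function(mapping) {
  dup <- duplicated(mapping$barcode) |
    duplicated(mapping$barcode, fromLast = TRUE)
  mapping[dup, , drop = FALSE]
}

clashes <- find_duplicate_barcodes(mapping)
if (nrow(clashes) > 0) {
  message("Barcodes shared by several UUIDs; renaming would collide:")
  print(clashes)
}
```

If `clashes` is empty, the rename is safe with respect to name collisions; otherwise the affected folders could keep their UUIDs or get the UUID appended to the barcode.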
