TCGA Filename Breakdown
3.3 years ago
stanley.ju • 0

What do the different components of a TCGA data filename mean?

For example, here's one file I was looking at: HORNS_p_TCGA_b110_113_SNP_N_GenomeWideSNP_6_C10_772388.grch38.seg.v2.txt

Some parts are self-explanatory. This comes from a genome-wide SNP array, and I assume the 6 refers to Affymetrix 6.0. But what does "HORNS" mean? And "b110"? Etc.

tcga copy number variation genome
Where did you get this file from? Also, b110 is incomplete, I think: it should be considered with the 113 that follows, b110_113.

I see, so b110_113 is some sort of sample marker?

This particular file name came from the TCGA Data Portal --> Uterine Corpus Endometrial Carcinoma --> Copy Number Variation. I downloaded all of the CNV data for endometrial carcinoma straight from the web (the files are pretty small), and this was just the first file in the archive.

This paper might be of help, but I don't know how useful it is for deciphering TCGA filenames.

HORNS could be code for an institute of sample origin or something like that; I wouldn't worry too much about it.
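If it helps to compare fields across many files, here is a minimal Python sketch that just splits a name like the one in the question into its pieces. It does not assign meanings to any field. The function names `split_filename` and `merge_b_tokens` are made up for illustration, and the only interpretation it encodes is the suggestion from this thread that a token like b110 should be read together with the number that follows it.

```python
import re

FILENAME = "HORNS_p_TCGA_b110_113_SNP_N_GenomeWideSNP_6_C10_772388.grch38.seg.v2.txt"

def split_filename(name):
    """Split into underscore-delimited stem tokens and dot-delimited suffixes."""
    # Everything before the first dot is the stem; the rest are suffixes.
    stem, *suffixes = name.split(".")
    return stem.split("_"), suffixes

def merge_b_tokens(tokens):
    """Join a 'b<digits>' token with the numeric token after it.

    Assumption taken from the thread: 'b110' and '113' belong together
    as 'b110_113'. What that field actually encodes is not confirmed here.
    """
    merged = []
    i = 0
    while i < len(tokens):
        next_is_number = i + 1 < len(tokens) and tokens[i + 1].isdigit()
        if re.fullmatch(r"b\d+", tokens[i]) and next_is_number:
            merged.append(tokens[i] + "_" + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

tokens, suffixes = split_filename(FILENAME)
fields = merge_b_tokens(tokens)
print(fields)
print(suffixes)
```

Running the same split over every file in the archive and looking at which positions vary from file to file (and which stay constant) is one way to guess which fields are per-sample and which are per-batch or per-study.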