How to get the number of samples and the sample TCGA IDs reported in TCGA for BRCA?
10.2 years ago
Jordan ★ 1.3k

Hi,

I would like to know the number of samples reported for a particular cancer type, in my case BRCA. I can get the number from the TCGA website, but I also need the sample IDs.

I tried downloading the biotab files from the Data Matrix, but there are too many files and I'm not sure which ones to use. Also, the number of samples in these files doesn't match the number on the TCGA website, except for a file named nationwidechildrens.org_biospecimen_brca_cqcf.txt. I'm not sure whether I can use this file, and I don't know what cqcf means beyond its full name (case quality control form).

I'm only looking for the sample names and don't care whether they are tumor or normal. The IDs I'm looking for look like this: TCGA-AP-A1DH

Can anyone help with this? Thanks!
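For reference, an ID such as TCGA-AP-A1DH is the participant (case) part of a longer TCGA barcode. Sample barcodes append a fourth field such as 01A, whose first two digits give the sample type (tumor types are 01-09, normal types 10-19, control samples 20-29). Below is a minimal POSIX shell sketch, assuming barcodes follow that layout. participant_id, sample_type_code and sample_class are hypothetical helper names, not part of any TCGA tool:

```shell
# Hypothetical helpers for splitting a TCGA barcode such as TCGA-AP-A1DH-01A.
# Assumes the standard layout: project-TSS-participant-sample/vial-...

# Participant (case) ID: the first three hyphen-separated fields.
participant_id() {
  printf '%s\n' "$1" | cut -d - -f 1-3
}

# Two-digit sample type code: first two characters of the fourth field.
# Prints an empty string if the barcode has no fourth field.
sample_type_code() {
  printf '%s\n' "$1" | cut -d - -f 4 | cut -c 1-2
}

# Coarse class from the sample type code ranges 01-09, 10-19, 20-29.
sample_class() {
  case "$(sample_type_code "$1")" in
    0[1-9]) echo tumor ;;
    1[0-9]) echo normal ;;
    2[0-9]) echo control ;;
    *)      echo other ;;
  esac
}
```

For a hypothetical barcode TCGA-AP-A1DH-11A, sample_class prints normal and participant_id maps it back to TCGA-AP-A1DH. Since only participant IDs are needed here, participant_id alone is enough; sample_class just shows where the tumor/normal distinction is encoded.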

tcga • 4.6k views
10.2 years ago
  1. Grab any of the clinical files from the BCR (Biospecimen Core Resource), for example:

    wget https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/brca/bcr/biotab/clin/biospecimen_sample_brca.txt
    
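  2. Extract the unique participant barcodes (the first three hyphen-separated fields of each sample barcode, e.g. TCGA-AP-A1DH) and count them. Drop the final "| wc -l" to print the IDs themselves instead of the count:

    tail -n +2 biospecimen_sample_brca.txt | cut -d - -f 1-3 | sort -u | wc -l
    1056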
    
  3. Keep in mind that this is the number of cases (participants) with clinical data currently available through the portal. It is likely to be higher than the number of cases with data for a particular data type, or for all data types. In many instances sample accrual is still ongoing, and it takes time for samples to go from the DNA processing centers to the centers that perform the assays, and then for the data to be generated, QCed, and released.
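The steps above can be sketched as a small POSIX shell function that keeps the list of IDs rather than only the count. It assumes, as the cut command in step 2 implies, that the file is tab-delimited with the sample barcode in the first column; the function name list_participants is hypothetical. Instead of skipping exactly one header line with tail, it keeps only lines whose first column starts with TCGA-. This is a swapped-in choice that also drops any additional header rows a biotab file might carry.

```shell
# List unique participant IDs (e.g. TCGA-AP-A1DH) from a BCR biotab file.
# Assumption: tab-delimited, with the sample barcode in the first column.
list_participants() {
  cut -f 1 "$1" |        # keep only the first (barcode) column
    grep '^TCGA-' |      # drop header rows, which do not start with TCGA-
    cut -d - -f 1-3 |    # participant ID = first three hyphen-separated fields
    sort -u              # one line per participant
}
```

For example, list_participants biospecimen_sample_brca.txt > brca_participants.txt writes one ID per line, and wc -l < brca_participants.txt gives the count. The function only needs a local copy of the file, however it was obtained.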


Thanks, that helped!
