Question: Cancer data download with normal sample
29 days ago, mhasa006 (United States) wrote:

I want to download a cancer dataset that also contains some normal (control) samples. Right now I can download data from the Broad Institute, but I believe the dataset only contains information from cancer patients. For example, I need a breast cancer methylation dataset that gives the methylation state of both cancer patients and normal samples. Can anyone point me in the right direction for downloading such data?

28 days ago, Kevin Blighe (Republic of Ireland) wrote:

There are definitely normal methylation samples for BRCA (and for other cancers in TCGA). The file that you likely want from HERE is called gdac.broadinstitute.org_BRCA.Methylation_Preprocess.Level_3.2016012800.0.0.tar.gz. It contains the normalised methylation beta (β) values for all BRCA samples. Within the downloaded tar.gz file, these normalised values are stored in the file BRCA.meth.by_mean.data.txt.

Here's what this contains:

Hybridization REF     TCGA-3C-AAAU-01   TCGA-3C-AALI-01 TCGA-3C-AALJ-01 TCGA-3C-AALK-01
Composite Element REF Beta_Value        Beta_Value      Beta_Value      Beta_Value
A1BG                  0,4837161197      0,6371912261    0,6560923982    0,6151944714
A1CF                  0,2958272035      0,4589729986    0,4897252896    0,6257652232
A2BP1                 0,1876998696      0,2405158477    0,2790878512    0,4888885105
A2LD1                 0,6295855132      0,6662722887    0,7556305       0,7457512129
A2M                   0,5596536616      0,6075048697    0,662360104     0,7279819032
A2ML1                 0,8354122558      0,8423905626    0,8290204696    0,8353647495
A4GALT                0,4848002856      0,5500468176    0,4761066997    0,5560164612
A4GNT                 0,6902167979      0,7498897004    0,6537560771    0,6520049818
AAA1                  0,8078049982      0,3952904662    0,7951015988    0,8164225961
AAAS                  0,1373555163      0,0561035293    0,0677840275    0,0630021045

[NB - I'm in a Latin locale, so my decimal separators are commas]

With regard to how to identify the normals, in this case it is easy. Look at the TCGA barcodes in this file:

head -1 BRCA.meth.by_mean.data.txt | sed 's/\t/\n/g' | head -10
Hybridization REF
TCGA-3C-AAAU-01
TCGA-3C-AALI-01
TCGA-3C-AALJ-01
TCGA-3C-AALK-01
TCGA-4H-AAAK-01
TCGA-5L-AAT0-01
TCGA-5L-AAT1-01
TCGA-5T-A9QA-01
TCGA-A1-A0SB-01

The final number in these [shortened] TCGA barcodes indicates the tissue/sample type:

  • 01 to 09 = tumour
  • 10 to 19 = normal
  • 20 to 29 = control

See here: Meaning letters in TCGA sample barcode field
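
As a rough illustration of the ranges above, here is a minimal shell sketch that maps a shortened barcode to its sample class. The function name sample_class and the TCGA-XX-XXXX-11 barcode are made up for this example; only the code ranges come from the list above.

```shell
# Hypothetical helper (not part of any TCGA tool): classify a shortened
# TCGA barcode by the two-digit sample-type code after the last hyphen.
sample_class() {
    code="${1##*-}"              # e.g. "01" from TCGA-3C-AAAU-01
    case "$code" in
        0[1-9]) echo "tumour" ;;   # 01 to 09
        1[0-9]) echo "normal" ;;   # 10 to 19
        2[0-9]) echo "control" ;;  # 20 to 29
        *)      echo "unknown" ;;  # anything else, e.g. the header label
    esac
}

sample_class TCGA-3C-AAAU-01   # barcode taken from the header shown above
sample_class TCGA-XX-XXXX-11   # made-up barcode, for illustration only
```

This only looks at the last field, so it assumes the barcodes are in the shortened form shown here, where the sample-type code is the final component.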

How many are there? Even just looking for the 11 sample code (solid tissue normal), we can see that there are at least 97 normal methylation samples in this dataset:

head -1 BRCA.meth.by_mean.data.txt | sed 's/\t/\n/g' | grep -e "11$"  | wc -l
97
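
If you want to go one step further than counting, something along these lines should work. It is only a sketch: the function names count_sample_types and normal_columns are my own, and it assumes the file is tab-delimited with the barcodes in the first row, as in the extract above.

```shell
# Hypothetical helper: count samples per two-digit sample-type code,
# using the barcodes in the header row of a tab-delimited file.
count_sample_types() {
    head -1 "$1" | tr '\t' '\n' | tail -n +2 \
        | awk -F'-' '{ n[$NF]++ } END { for (c in n) print c, n[c] }' \
        | sort
}

# Hypothetical helper: keep the first column plus every column whose
# barcode ends in a normal code (10 to 19).
normal_columns() {
    awk -F'\t' -v OFS='\t' '
        NR == 1 { for (i = 2; i <= NF; i++) if ($i ~ /-1[0-9]$/) keep[i] = 1 }
        { out = $1
          for (i = 2; i <= NF; i++) if (i in keep) out = out OFS $i
          print out }
    ' "$1"
}
```

You would then run, for example, count_sample_types BRCA.meth.by_mean.data.txt to get a per-code tally, and normal_columns BRCA.meth.by_mean.data.txt > BRCA.normals.txt (the output filename is arbitrary) to write out just the normal samples.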

Kevin
