Working with GTEX dataset
8.8 years ago
pbio ▴ 150

I work with RNA-seq data and have found a few differentially expressed genes in a particular tissue sample. Now I have been asked to work with GTEx data to look for differentially expressed genes across different tissue samples.

To get started with the GTEx data set, I first need to understand its sample codes. For example, which tissue does the following sample come from?

GTEX-N7MS-0007-SM-2D7W1

I searched for a description of the GTEx barcodes but have not found one. Could anyone give me some idea of how to decode the GTEx barcodes and how to perform this kind of analysis? I am sorry if this question is silly; I am completely new to the field of NGS.

GTEX RNA-Seq • 9.8k views

The sample ID for an RNA-Seq or genotype sample is made up of the following 3 components separated by a dash, as exemplified with the example GTEX-14753-1626-SM-5NQ9L:

  • GTEX-YYYYY (e.g., GTEX-14753) represents the GTEx donor ID. This ID should be used to link between the various RNA-Seq and genotype samples that come from the same donor.
  • YYYY (e.g., "1626") mostly refers to the tissue site, BUT we do not recommend using it for tissue site designation. Sometimes sample mix-ups occur, and will be corrected however this part of the ID will not change when that happens. The accurate tissue site designation for all samples can be obtained from the "Tissue Site Detail field" (encoded as "SMTSD") in the Sample Attributes file [Datasets->Download->GTEx_Data_V6_Annotations_SampleAttributesDS.txt].
  • SM-YYYYY (e.g., SM-5NQ9L) is the RNA or DNA aliquot ID used for sequencing.

Y stands for any number or capital letter.

SOURCE: https://sites.google.com/broadinstitute.org/gtex-faqs/home
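
For anyone who wants to split IDs programmatically, here is a minimal Python sketch of the three-component layout described above. The names `GtexSampleId` and `parse_sample_id` are made up for illustration and are not part of any GTEx tooling. The sketch assumes exactly the `GTEX-<donor>-<tissue code>-SM-<aliquot>` shape from the FAQ and raises an error for any ID that does not match it.

```python
from typing import NamedTuple


class GtexSampleId(NamedTuple):
    """Hypothetical container for the three parts named in the FAQ."""
    donor_id: str     # e.g. "GTEX-14753"; links samples from the same donor
    tissue_code: str  # e.g. "1626"; NOT reliable for tissue assignment
    aliquot_id: str   # e.g. "SM-5NQ9L"; RNA or DNA aliquot used for sequencing


def parse_sample_id(sample_id: str) -> GtexSampleId:
    """Split an ID shaped like GTEX-14753-1626-SM-5NQ9L into its parts.

    Assumption: the ID has exactly five dash-separated fields. IDs with a
    different layout are rejected rather than guessed at.
    """
    parts = sample_id.strip().split("-")
    if len(parts) != 5 or parts[0] != "GTEX" or parts[3] != "SM":
        raise ValueError(f"unexpected GTEx sample ID format: {sample_id!r}")
    return GtexSampleId(
        donor_id="-".join(parts[:2]),
        tissue_code=parts[2],
        aliquot_id="-".join(parts[3:]),
    )


parsed = parse_sample_id("GTEX-14753-1626-SM-5NQ9L")
print(parsed.donor_id)    # GTEX-14753
print(parsed.aliquot_id)  # SM-5NQ9L
```

As the FAQ says, the middle field should only be treated as a hint; the tissue itself should come from the SMTSD column of the Sample Attributes file.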

Dear all,

May I ask whether the file phs000424.v7.pht002742.v7.p2.c1.GTEx_Subject_Phenotypes.GRU.txt is the same as https://storage.googleapis.com/gtex_analysis_v7/annotations/GTEx_v7_Annotations_SubjectPhenotypesDS.txt? For the phenotypes, I cannot find a file with a phsXXX-style name. Or does that file belong to the protected data?

Thank you very much for your guidance!

8.8 years ago

In the download section, where it says "A de-identified, open access version of the sample annotations available in dbGaP.", you should find a file called GTEx_Data_V4_Annotations_SampleAttributesDS.txt, containing the annotation of each sample. For example GTEX-N7MS-0007-SM-2D7W1 is from Whole Blood.
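
A rough Python sketch of that lookup, in case it helps. It assumes the annotation file is tab-delimited and has a sample ID column named `SAMPID` next to the `SMTSD` tissue column, so please check the header of your copy first. The function name `load_tissue_by_sample` is made up, and the small in-memory table only stands in for the real file.

```python
import csv
import io


def load_tissue_by_sample(handle) -> dict:
    """Map each sample ID to its tissue site detail.

    Assumptions: tab-delimited input with header columns named SAMPID
    and SMTSD. Quoting is disabled so stray quote characters in free-text
    columns do not confuse the parser.
    """
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return {row["SAMPID"]: row["SMTSD"] for row in reader}


# Tiny stand-in for the annotation file; with the real file you would
# pass open("GTEx_Data_V4_Annotations_SampleAttributesDS.txt") instead.
example = "SAMPID\tSMTSD\nGTEX-N7MS-0007-SM-2D7W1\tWhole Blood\n"
tissue_by_sample = load_tissue_by_sample(io.StringIO(example))
print(tissue_by_sample["GTEX-N7MS-0007-SM-2D7W1"])  # Whole Blood
```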

Thank you.

I have a question. The file GTEx_Data_V4_Annotations_SampleAttributesDS.txt lists 1822 Whole Blood samples, but when I matched these samples against the "Gene RPKM" data from Datasets->Download, only 393 samples intersected. In addition, Datasets->Summary Statistics also shows a total of 393 Whole Blood samples.

Why does GTEx_Data_V4_Annotations_SampleAttributesDS.txt report more samples?
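
To make this kind of comparison reproducible, a small Python sketch like the one below could count, per tissue, how many annotated samples also appear among the expression matrix columns. It does not explain the gap; it only shows which annotated samples have no expression column. The function name `count_by_tissue` and the sample IDs in the example are hypothetical placeholders, not real GTEx data.

```python
from collections import Counter


def count_by_tissue(tissue_by_sample, expression_sample_ids):
    """Return {tissue: (annotated_count, count_with_expression)}.

    tissue_by_sample: dict of sample ID -> tissue label from the
    annotation file. expression_sample_ids: the sample column names of
    the expression matrix.
    """
    expressed = set(expression_sample_ids)
    annotated = Counter(tissue_by_sample.values())
    with_expression = Counter(
        tissue
        for sample, tissue in tissue_by_sample.items()
        if sample in expressed
    )
    return {
        tissue: (annotated[tissue], with_expression[tissue])
        for tissue in annotated
    }


# Hypothetical placeholder IDs, for illustration only.
annotations = {"S1": "Whole Blood", "S2": "Whole Blood", "S3": "Lung"}
expression_columns = ["S1", "S3"]
print(count_by_tissue(annotations, expression_columns))
# {'Whole Blood': (2, 1), 'Lung': (1, 1)}
```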

Did you ever figure out the solution to this?

I'm working with GTEx v7 Adipose - Subcutaneous and find that sample and individual attributes are missing for several samples:

  • Adipose - Subcutaneous=294 samples in the genotypes
  • Adipose - Subcutaneous=294 samples in the expression values
  • Adipose - Subcutaneous=138 samples in the Sample_Attributes file (phs000424.v7.pht002743.v7.p2.c1.GTEx_Sample_Attributes.GRU.txt)
  • Adipose - Subcutaneous=117 samples in the Subject_Phenotypes file (phs000424.v7.pht002742.v7.p2.c1.GTEx_Subject_Phenotypes.GRU.txt)

Thanks!
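
One way to narrow down mismatches like these is to list which donors have no row in the phenotypes table. The Python sketch below assumes the Subject_Phenotypes file is keyed by donor ID (commonly a column named `SUBJID`, but check your file) and that the donor ID is the first two dash-separated fields of the sample ID, as described in the GTEx FAQ. The function name `donors_missing_phenotypes` is made up for illustration.

```python
def donors_missing_phenotypes(sample_ids, phenotype_subject_ids):
    """List donors that have samples but no phenotype record.

    Assumption: the donor ID is the first two dash-separated fields of
    each sample ID, e.g. "GTEX-14753" for "GTEX-14753-1626-SM-5NQ9L".
    """
    known = set(phenotype_subject_ids)
    donors = {"-".join(sample_id.split("-")[:2]) for sample_id in sample_ids}
    return sorted(donors - known)


samples = ["GTEX-14753-1626-SM-5NQ9L", "GTEX-N7MS-0007-SM-2D7W1"]
subjects_with_phenotypes = ["GTEX-N7MS"]  # illustrative subset, not real file contents
print(donors_missing_phenotypes(samples, subjects_with_phenotypes))
# ['GTEX-14753']
```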

