Working with GTEX dataset
7.1 years ago
pbio

I work with RNA-seq data and have found a few differentially expressed genes in a particular tissue sample. Now I have been asked to use GTEx data to look at differentially expressed genes across different tissue samples.

To get started with the GTEx dataset, I first need to understand its sample codes. For example, which tissue does this one come from?

GTEX-N7MS-0007-SM-2D7W1

I tried searching for a description of the GTEx barcodes but haven't found one. Could anyone give me some idea of how to decode GTEx barcodes, and also how to perform this kind of analysis? I am sorry if this question is silly; I am a complete novice in the field of NGS.

The sample ID for an RNA-Seq or genotype sample is made up of the following 3 components, separated by dashes, as in the example "GTEX-14753-1626-SM-5NQ9L":

"GTEX-YYYYY" (e.g., GTEX-14753) is the GTEx donor ID. Use this ID to link the various RNA-Seq and genotype samples that come from the same donor.

"YYYY" (e.g., "1626") mostly refers to the tissue site, BUT we do not recommend using it for tissue site designation. Sample mix-ups sometimes occur and are corrected, but this part of the ID does not change when that happens. The accurate tissue site designation for every sample is in the "Tissue Site Detail" field (encoded as "SMTSD") of the Sample Attributes file [Datasets->Download->GTEx_Data_V6_Annotations_SampleAttributesDS.txt].

"SM-YYYYY" (e.g., SM-5NQ9L) is the RNA or DNA aliquot ID used for sequencing.

'Y' stands for any digit or capital letter.

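If you want to pull these components apart in a script, here is a minimal Python sketch. It assumes the donor ID is always the first two dash-separated fields and the aliquot ID is always the last two ("SM-" plus an identifier); anything in between is treated as the tissue-site code. The class and function names are made up for illustration. As stressed above, the tissue code is not authoritative, so take tissue assignments from SMTSD instead.

```python
from typing import NamedTuple


class GtexSampleId(NamedTuple):
    donor_id: str     # e.g. "GTEX-14753": links samples from the same donor
    tissue_code: str  # e.g. "1626": informal site code, NOT reliable for tissue
    aliquot_id: str   # e.g. "SM-5NQ9L": RNA or DNA aliquot used for sequencing


def parse_gtex_sample_id(sample_id: str) -> GtexSampleId:
    """Split a GTEx sample ID into donor ID, tissue-site code and aliquot ID.

    Assumption: donor ID = first two dash-separated fields, aliquot ID = last
    two fields ("SM-..."); whatever sits in between is the tissue-site code.
    """
    parts = sample_id.split("-")
    if len(parts) < 5 or parts[0] != "GTEX" or parts[-2] != "SM":
        raise ValueError(f"unexpected GTEx sample ID: {sample_id!r}")
    return GtexSampleId(
        donor_id="-".join(parts[:2]),
        tissue_code="-".join(parts[2:-2]),
        aliquot_id="-".join(parts[-2:]),
    )


if __name__ == "__main__":
    print(parse_gtex_sample_id("GTEX-14753-1626-SM-5NQ9L"))
    # GtexSampleId(donor_id='GTEX-14753', tissue_code='1626', aliquot_id='SM-5NQ9L')
```

The donor ID is the part you would use to join RNA-Seq samples with genotype data from the same person.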

Dear guys,

May I ask whether the file phs000424.v7.pht002742.v7.p2.c1.GTEx_Subject_Phenotypes.GRU.txt is the same as https://storage.googleapis.com/gtex_analysis_v7/annotations/GTEx_v7_Annotations_SubjectPhenotypesDS.txt ? Among the phenotype files there is no file with a name like phsXXX, so does this file belong to the protected data?

Thank you very much for your guidance! Best!

7.1 years ago

In the download section, under "A de-identified, open access version of the sample annotations available in dbGaP", you should find a file called GTEx_Data_V4_Annotations_SampleAttributesDS.txt containing the annotation of each sample. For example, GTEX-N7MS-0007-SM-2D7W1 is from Whole Blood.
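A minimal Python sketch of looking up the tissue for a sample, assuming the annotation file is tab-separated with a header row that includes a SAMPID (sample ID) column and the SMTSD (tissue site detail) column mentioned in the other answer. The function name is hypothetical, and the demo table is a made-up two-column stand-in for the real file.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable


def load_tissue_by_sample(handle: Iterable[str]) -> Dict[str, str]:
    """Map each sample ID to its detailed tissue site.

    Assumes a tab-separated file whose header contains the columns
    SAMPID (sample ID) and SMTSD (tissue site detail); other columns are ignored.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["SAMPID"]: row["SMTSD"] for row in reader}


if __name__ == "__main__":
    # Real use (file name depends on the release you downloaded):
    #   with open("GTEx_Data_V4_Annotations_SampleAttributesDS.txt", newline="") as fh:
    #       tissue_by_sample = load_tissue_by_sample(fh)
    # Tiny made-up stand-in with only the two columns used here:
    demo = io.StringIO("SAMPID\tSMTSD\nGTEX-N7MS-0007-SM-2D7W1\tWhole Blood\n")
    tissue_by_sample = load_tissue_by_sample(demo)
    print(tissue_by_sample["GTEX-N7MS-0007-SM-2D7W1"])  # Whole Blood
    print(Counter(tissue_by_sample.values()))            # number of samples per tissue
```

Counting the SMTSD values gives the number of annotated samples per tissue in that file.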

Thank you!

I have a question. The file "GTEx_Data_V4_Annotations_SampleAttributesDS.txt" lists 1822 Whole Blood samples, but when I matched these samples against the "Gene RPKM" data from Datasets -> Download, surprisingly only 393 samples intersected. In addition, Datasets -> Summary Statistics also shows 393 Whole Blood samples in total. Why does "GTEx_Data_V4_Annotations_SampleAttributesDS.txt" give a larger sample count?
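One way to see exactly which annotated samples are in the expression matrix is to intersect the IDs directly. The attributes file may list aliquots that are not part of the released RPKM matrix, so the set difference is worth inspecting. This is a minimal Python sketch assuming the RPKM download is a GCT 1.2 file (a "#1.2" line, a dimensions line, then a header "Name, Description, sample IDs..." separated by tabs); the function names and file name are hypothetical. If your file has a different header layout, adjust gct_sample_ids.

```python
import io
from typing import Dict, Iterable, List, Set


def gct_sample_ids(handle: Iterable[str]) -> List[str]:
    """Return the sample column names of a GCT 1.2 expression file."""
    lines = iter(handle)
    next(lines)  # skip the "#1.2" version line
    next(lines)  # skip the "<rows> <columns>" dimensions line
    header = next(lines).rstrip("\r\n").split("\t")
    return header[2:]  # drop the Name and Description columns


def compare_tissue_samples(
    tissue_by_sample: Dict[str, str], expression_ids: Iterable[str], tissue: str
) -> Dict[str, Set[str]]:
    """Split one tissue's annotated samples by presence in the expression matrix."""
    annotated = {s for s, t in tissue_by_sample.items() if t == tissue}
    in_matrix = annotated & set(expression_ids)
    return {"in_matrix": in_matrix, "annotation_only": annotated - in_matrix}


if __name__ == "__main__":
    # Real use (hypothetical file name; gzipped text opens with gzip.open(path, "rt")):
    #   with gzip.open("<gene RPKM file>.gct.gz", "rt") as fh:
    #       expression_ids = gct_sample_ids(fh)
    # Tiny made-up inputs with hypothetical sample IDs:
    gct = io.StringIO("#1.2\n1\t2\nName\tDescription\tS1\tS2\nG1\tg1\t0.5\t1.2\n")
    expression_ids = gct_sample_ids(gct)
    annotations = {"S1": "Whole Blood", "S3": "Whole Blood", "S2": "Lung"}
    result = compare_tissue_samples(annotations, expression_ids, "Whole Blood")
    print(len(result["in_matrix"]), len(result["annotation_only"]))  # 1 1
```

Looking at the other attribute columns for the "annotation_only" samples can then show what distinguishes them from the ones in the matrix.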

Did you ever figure out the solution to this? I'm working with GTEx v7 Adipose - Subcutaneous, and several samples are missing sample and individual attributes:

Adipose - Subcutaneous = 294 samples in the genotypes
Adipose - Subcutaneous = 294 samples in the expression values
Adipose - Subcutaneous = 138 samples in the Sample_Attributes file (phs000424.v7.pht002743.v7.p2.c1.GTEx_Sample_Attributes.GRU.txt)
Adipose - Subcutaneous = 117 samples in the Subject_Phenotypes file (phs000424.v7.pht002742.v7.p2.c1.GTEx_Subject_Phenotypes.GRU.txt)

Thanks!
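When comparing these files it helps to match IDs at the right level: the Subject_Phenotypes file is, as its name says, keyed by subject (donor), while the expression and Sample_Attributes files are keyed by sample. A minimal Python sketch that reduces expression sample IDs to donor IDs and lists donors with no phenotype row, assuming the phenotype file is tab-separated with a SUBJID column. Blank lines and lines starting with "#" are skipped in case your copy carries comment lines; that is an assumption, not a documented property of the dbGaP files. Function names and demo IDs are hypothetical.

```python
import csv
import io
from typing import Iterable, Set


def donor_of(sample_id: str) -> str:
    """Donor ID = first two dash-separated fields, e.g. GTEX-14753."""
    return "-".join(sample_id.split("-")[:2])


def missing_donors(expression_ids: Iterable[str], phenotype_handle: Iterable[str]) -> Set[str]:
    """Donors that have expression samples but no row in the subject phenotype table.

    Assumes a tab-separated table with a SUBJID column. Blank lines and lines
    starting with "#" anywhere in the file are skipped.
    """
    rows = (line for line in phenotype_handle
            if line.strip() and not line.startswith("#"))
    subjects = {row["SUBJID"] for row in csv.DictReader(rows, delimiter="\t")}
    donors = {donor_of(s) for s in expression_ids}
    return donors - subjects


if __name__ == "__main__":
    # Made-up example: two expression samples from two hypothetical donors,
    # but only one of those donors has a row in the phenotype table.
    pheno = io.StringIO("SUBJID\tSEX\nGTEX-AAAA\t1\n")
    ids = ["GTEX-AAAA-0001-SM-XXXX1", "GTEX-BBBB-0002-SM-XXXX2"]
    print(missing_donors(ids, pheno))  # {'GTEX-BBBB'}
```

The same set-difference idea works at the sample level by comparing expression sample IDs with the SAMPID column of the Sample_Attributes file.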
