Reproducing the GTEx box plot
6.0 years ago
peris ▴ 120

Hi

I need to regenerate the expression boxplot for five of my genes of interest. I can see the expression plot in the GTEx portal, but I can't export the image, so I downloaded the RNA-seq data from the GTEx portal. Is there any tool with which I can regenerate the boxplot? Thanks and regards

RNA-Seq GTex BoxPlot Gene expression R
6.0 years ago

To access GTEx data you need to register first on their website.

On the data download page, you need the following files:

  • GTEx_Data_V6_Annotations_SampleAttributesDS.txt -> table with tissue of origin and site of each sample
  • GTEx_Analysis_v6_RNA-seq_RNA-SeQCv1.1.8_gene_rpkm.gct.gz -> RPKM expression for every sample/gene (note: it's a big file)

For the boxplot, you will have to adapt the following code. It uses dplyr, tidyr and ggplot2. I hope you are familiar with these libraries :-)

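# preparing data
library(dplyr)
library(tidyr)
library(ggplot2)
# sample annotations: sample ID, primary tissue (SMTS) and detailed tissue (SMTSD)
samples = read.delim('GTEx_Data_V6_Annotations_SampleAttributesDS.txt', sep='\t', stringsAsFactors=FALSE) %>%
    select(SAMPID, primary.tissue=SMTS, tissue=SMTSD)
# expression matrix: skip the two GCT header lines; use the name of the file you actually downloaded,
# and make the number of 'numeric' columns (2921 here) match its number of samples
gtex = read.table('GTEx_Analysis_2014-01-17_RNA-seq_RNA-SeQCv1.1.8_gene_rpkm.gct', skip=2, sep='\t', colClasses=c('character', 'character', rep('numeric', 2921)), stringsAsFactors=FALSE, header=TRUE)  # go take a coffee!
# reshape to one row per gene/sample pair; read.table replaced the dashes in the
# sample IDs with dots, so convert them back before joining with the annotations
gtex.bysample = gtex %>%
   gather(SAMPID, expression, -Name, -Description) %>%
   mutate(SAMPID=gsub('\\.', '-', SAMPID)) %>%
   left_join(samples, by='SAMPID')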

At this point you will have a data frame with one row for every gene/sample pair:

> gtex.bysample
               Name Description                  SAMPID expression primary.tissue      tissue
1 ENSG00000223972.4     DDX11L1 GTEX-N7MS-0007-SM-2D7W1    0.00000          Blood Whole Blood
2 ENSG00000227232.4      WASH7P GTEX-N7MS-0007-SM-2D7W1    2.95098          Blood Whole Blood
3 ENSG00000243485.2  MIR1302-11 GTEX-N7MS-0007-SM-2D7W1    0.00000          Blood Whole Blood
4 ENSG00000237613.2     FAM138A GTEX-N7MS-0007-SM-2D7W1    0.00000          Blood Whole Blood
5 ENSG00000268020.2      OR4G4P GTEX-N7MS-0007-SM-2D7W1    0.00000          Blood Whole Blood
6 ENSG00000240361.1     OR4G11P GTEX-N7MS-0007-SM-2D7W1    0.00000          Blood Whole Blood
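The gsub('\\.', '-', SAMPID) step is needed because read.table() with header=TRUE passes column names through make.names(), which turns the dashes in GTEx sample IDs into dots. A minimal sketch on a made-up two-sample table (the second sample ID and all values are invented) showing that check.names=FALSE keeps the original IDs instead:

```r
# Made-up table shaped like the GCT body after skip=2
# (values and the second sample ID are invented for illustration)
txt <- paste(
  "Name\tDescription\tGTEX-N7MS-0007-SM-2D7W1\tGTEX-XXXX-0001-SM-XXXXX",
  "ENSG00000227232.4\tWASH7P\t2.95098\t1.5",
  "ENSG00000223972.4\tDDX11L1\t0\t0",
  sep = "\n"
)

# Default behaviour: make.names() replaces "-" with "." in the sample columns
mangled <- read.table(text = txt, sep = "\t", header = TRUE,
                      stringsAsFactors = FALSE)

# check.names = FALSE keeps the sample IDs exactly as written in the file
kept <- read.table(text = txt, sep = "\t", header = TRUE,
                   stringsAsFactors = FALSE, check.names = FALSE)

print(names(mangled))
print(names(kept))
```

If you read the GCT file with check.names=FALSE, the gathered sample IDs already match SAMPID in the annotation file and the gsub() step can be dropped; with the default, keep it.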

You can filter by your gene and tissues of interest:

runx3.blood_breast = gtex.bysample %>% filter(Description=='RUNX3', primary.tissue %in% c("Blood", "Breast"))
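Since the question is about five genes, here is a sketch of the same filter for several genes at once, written with base R's subset() on a made-up table with the same columns as gtex.bysample (gene list, sample IDs and values are illustrative only; "GENEX" is a deliberate placeholder). It also reports any requested gene that matched no rows, e.g. because of a misspelled symbol:

```r
# Made-up long table with the same columns as gtex.bysample
gtex.bysample <- data.frame(
  Description    = c("RUNX3", "RUNX3", "RUNX3", "TP53", "TP53", "WASH7P"),
  SAMPID         = c("S1", "S2", "S3", "S1", "S2", "S1"),
  expression     = c(10.2, 3.1, 8.7, 20.5, 18.0, 2.9),
  primary.tissue = c("Blood", "Breast", "Blood", "Blood", "Breast", "Blood"),
  stringsAsFactors = FALSE
)

genes   <- c("RUNX3", "TP53", "GENEX")  # your genes of interest; "GENEX" is a placeholder
tissues <- c("Blood", "Breast")

sel <- subset(gtex.bysample,
              Description %in% genes & primary.tissue %in% tissues)

# Requested genes that matched no rows (e.g. a misspelled or renamed symbol)
missing <- setdiff(genes, sel$Description)
print(missing)

# Number of samples per gene and tissue
tab <- table(sel$Description, sel$primary.tissue)
print(tab)
```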

You can plot it with ggplot2:

runx3.blood_breast %>%
    ggplot(aes(x=primary.tissue, y=expression)) +
        geom_boxplot()
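For several genes it can help to draw one box per tissue/gene combination and to write the figure straight to a file, which also solves the original export problem. A base-R sketch on made-up values; the log2(RPKM + 1) scale is an assumption about what reads best (RPKM values are often right-skewed, and the pseudocount of 1 avoids log(0)), not necessarily the scale the GTEx portal uses:

```r
# Made-up selection with the same columns as the filtered table above
sel <- data.frame(
  Description    = c("RUNX3", "RUNX3", "RUNX3", "TP53", "TP53"),
  expression     = c(10.2, 3.1, 8.7, 20.5, 18.0),
  primary.tissue = c("Blood", "Breast", "Blood", "Blood", "Breast"),
  stringsAsFactors = FALSE
)

# Open a PDF device so the plot is saved to a file in the working directory
pdf("gtex_boxplot.pdf", width = 8, height = 5)
b <- boxplot(log2(expression + 1) ~ primary.tissue + Description, data = sel,
             las = 2, xlab = "", ylab = "log2(RPKM + 1)")
dev.off()

print(b$names)  # one box per tissue/gene combination
print(b$n)      # number of samples in each box
```

png() can be used in place of pdf() in the same way if you prefer a bitmap image.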

Alternatively, in NCG we also provide a summary of gene expression for cancer genes.


Thanks Giovanni.

Is there a typo in

gtex.bysample = expdata %>% 
   gather(SAMPID, expression, -Name, -Description) %>% 
   mutate(SAMPID=gsub('\\.', '-', SAMPID)) %>% 
   left_join(samples)

where "expdata" should be "gtex"? Also, this step takes forever.
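Yes, expdata should be gtex. The step is most likely slow because gather() reshapes the whole genes × samples matrix into a very long table. One way around it is to drop the genes you don't need while the table is still wide; in the dplyr pipeline that means putting filter(Description %in% c(...)) before gather(). A dependency-free sketch of the same idea in base R, on a made-up table (gene IDs, sample IDs and values are invented):

```r
# Made-up wide table shaped like `gtex`: one row per gene, one column per
# sample (sample IDs already mangled by read.table, "-" replaced by ".")
gtex <- data.frame(
  Name           = c("ENSG_1", "ENSG_2", "ENSG_3"),
  Description    = c("RUNX3", "WASH7P", "DDX11L1"),
  GTEX.AAAA.0001 = c(5.1, 2.9, 0.0),
  GTEX.BBBB.0002 = c(7.3, 3.1, 0.0),
  stringsAsFactors = FALSE
)
genes <- c("RUNX3", "WASH7P")  # hypothetical genes of interest

# 1. Keep only the genes of interest while the table is still wide
small <- gtex[gtex$Description %in% genes, ]

# 2. Reshape the small table to one row per gene/sample pair.
#    unlist() stacks the sample columns one after another (column-major),
#    so gene labels repeat once per column and sample IDs repeat once per gene.
sample.cols <- setdiff(names(small), c("Name", "Description"))
long <- data.frame(
  Name        = rep(small$Name, times = length(sample.cols)),
  Description = rep(small$Description, times = length(sample.cols)),
  SAMPID      = gsub(".", "-", rep(sample.cols, each = nrow(small)), fixed = TRUE),
  expression  = unlist(small[sample.cols], use.names = FALSE),
  stringsAsFactors = FALSE
)
print(long)
```

The result can then be joined with the sample annotations exactly as before.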


@Giovanni M Dall'Olio

How can I generate genotype-specific plots for an rsID, say rs1998081 or rs2567619?

6.0 years ago
h.mon 33k

R can generate boxplots, and I am sure it can create a suitable one for you. For a more specific answer, though, you should provide more details, e.g. a small example of how your data is organized.

6.0 years ago
lkmklsmn ▴ 950
To generate a genotype-specific boxplot, you need both the expression and the genotype data. I think the GTEx genotype data is available through dbGaP.
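A minimal sketch of the plotting side, assuming you have already extracted one SNP's genotypes from the dbGaP files into a table with one row per donor and the alternative-allele count (0, 1 or 2). That genotype table, its column names and all values below are hypothetical. Genotypes are per donor while expression is per sample, so the donor ID (the first two dash-separated fields of SAMPID, e.g. GTEX-N7MS) is used as the join key:

```r
# Made-up expression values for one gene in one tissue, one row per sample
expr <- data.frame(
  SAMPID     = c("GTEX-AAAA-0001-SM-0001", "GTEX-BBBB-0001-SM-0002",
                 "GTEX-CCCC-0001-SM-0003", "GTEX-DDDD-0001-SM-0004"),
  expression = c(1.2, 3.4, 3.9, 6.1),
  stringsAsFactors = FALSE
)

# Hypothetical genotype table for one SNP: one row per donor,
# genotype = number of alternative alleles (0, 1 or 2)
geno <- data.frame(
  SUBJID   = c("GTEX-AAAA", "GTEX-BBBB", "GTEX-CCCC", "GTEX-DDDD"),
  genotype = c(0, 1, 1, 2),
  stringsAsFactors = FALSE
)

# Donor ID = first two dash-separated fields of the sample ID
expr$SUBJID <- sub("^([^-]+-[^-]+).*$", "\\1", expr$SAMPID)
d <- merge(expr, geno, by = "SUBJID")

# Keep all three genotype classes on the x axis, in dosage order
d$genotype <- factor(d$genotype, levels = c(0, 1, 2))

pdf("snp_genotype_boxplot.pdf")
b <- boxplot(expression ~ genotype, data = d,
             xlab = "alternative allele count", ylab = "expression (RPKM)")
dev.off()

print(b$n)  # samples per genotype class
```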
