deseq2 normalization for Gtex datasets
1
1
Entering edit mode
20 months ago
rheab1230 ▴ 140

Hello, I have gene count and sample attributes file from gtex v8 portal. I want to first extract brain tissue gene count data only by using the samID coming from sample attributes.txt file. Once, I have gene count coming from brain cortex tissue, I want to normallize it using deseq2 package. This is what I have done:

GTEx_Analysis_v8_Annotations_SampleAttributesDS <- read.delim("GTEx_Analysis_v8_Annotations_SampleAttributesDS.txt")
sample_attributes <- select(GTEx_Analysis_v8_Annotations_SampleAttributesDS,SAMPID,SMTS,SMTSD,SMAFRZE)

sample_attributes_braindata <- sample_attributes %>% filter(sample_attributes$SMTS == "Brain" & sample_attributes$SMAFRZE == "RNASEQ")
sample_attributes_braindata_cortex = sample_attributes_braindata[sample_attributes_braindata$SMTSD == "Brain - Cortex",]
dim(sample_attributes_braindata_cortex)
library(tidyverse)


GTEx_Analysis_gene_reads <- read_table2("GTEx_Analysis_2017-06-05_v8_RNASeQCv1.1.9_gene_reads.gct")
GTEx_Analysis_gene_reads <- data.frame(GTEx_Analysis_gene_reads)
(GTEx_Analysis_gene_reads)[1:5,1:5]
dim(GTEx_Analysis_gene_reads)

gtex_gene_name = GTEx_Analysis_gene_reads$Name
gtex_gene_description = GTEx_Analysis_gene_reads$Description

sample_attributes_braindata_cortex$SAMPID = gsub(pattern = "-", replacement = ".", x = sample_attributes_braindata_cortex$SAMPID, fixed = TRUE)

index= colnames(GTEx_Analysis_gene_reads) %in% sample_attributes_braindata_cortex$SAMPID
sum(index)
GTEx_Analysis_gene_reads_cortex = GTEx_Analysis_gene_reads[,index]
rownames(GTEx_Analysis_gene_reads_cortex) = gtex_gene_name

#write.table(GTEx_Analysis_gene_reads_cortex, file = "GTEx_Analysis_gene_reads_cortex_unnorm.txt", sep="\t", quote=FALSE, col.names=NA)

#GTEx_Analysis_gene_reads_cortex = read.delim("GTEx_Analysis_gene_reads_cortex_unnorm.txt",row.names = 1)

sample_info<-data.frame(condition=colnames(GTEx_Analysis_gene_reads_cortex), row.names=colnames(GTEx_Analysis_gene_reads_cortex))
library(DESeq2)
ds<-DESeqDataSetFromMatrix(countData=GTEx_Analysis_gene_reads_cortex, colData=sample_info, design= ~condition)
keep_genes<-rowSums(counts(ds))>0
ds<-ds[keep_genes,]
ds<-estimateSizeFactors(ds)
normalized_counts<-counts(ds, normalized=TRUE)


write.table(normalized_counts,file="GTEx_Analysis_gene_reads_cortex_normalized.txt", sep= "\t", quote = F, row.names = F)

But the normalized gene count that I get from this is not normalized. There is some error in this. I don't have any condition in this case. I have used condtion here as colnames which represent the sampleID or GTEX-1117F. Should I use condition as gene names to normalize it? I am not familiar with GTEx data that much and need some guidance. Thank you.

gene GTEX expression deseq2 tissue normalization brain • 941 views
ADD COMMENT
1
Entering edit mode
20 months ago
LChart 3.9k

I have used condtion here as colnames which represent the sampleID or GTEX-1117F.

This is probably the core problem problem; the condition should have fewer levels than the number of samples. In a pinch you can use ~ 1 as your design; but for GTEx I would strongly recommend including all of the covariates that are potential confounders (age, sex, ischemic time, RIN; see https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-1323-z ; figure 5)

ADD COMMENT
0
Entering edit mode

Hello, I do include covariates later. At first I just want to normalize the data and then I generate covariates file using peer tool and then regress it out from normalized gene count to generate residual expression

ADD REPLY
0
Entering edit mode

So i should remove sample_info line and then just use this?

ds<-DESeqDataSetFromMatrix(countData=GTEx_Analysis_gene_reads_cortex, colData=sample_info, design= ~)
ADD REPLY
0
Entering edit mode

That would give you library-normalized counts; yes.

ADD REPLY

Login before adding your answer.

Traffic: 1640 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6