Question: GTEx normalize counts by genotype
3.1 years ago by
Bioninbo0 wrote:


I want to analyze the GTEx dataset, but I am not interested in the impact of genotypes on phenotypes (RNA-Seq data).
In the latest release of the GTEx data, more than 170,000 cis-eQTLs have been detected. Therefore a large amount of the variation in expression can be accounted for by genetic loci.

I was wondering if anyone has considered normalizing transcript abundance by the genotype of the donors? My aim would be to obtain a matrix of gene counts normalized by genotype.

Thanks a lot for inputs!

eqtl gtex genotype • 1.2k views
3.1 years ago by
Hasso Plattner Institute, Potsdam, Germany
cindy.perscheid100 wrote:


Could you specify what you mean by "normalized by genotypes" here? As far as I understand, you have gene counts on the one hand, and a genotype (with variants, ergo non-numerical?) on the other hand. What would be your intention in doing this?



3.1 years ago by
Bioninbo0 wrote:

Hi Cindy,

Thanks for your answer! Let me clarify.

What I suggest is to use the eQTL information for a given tissue to normalize each sample individually.

For instance, if variant V is an eQTL that increases the expression of gene G, could we quantify the increase in expression of G for people bearing V (e.g. 50%)? This would be estimated across all samples (individuals) for a given tissue.

We would then use this number to normalize the count matrix of each separate sample (individual), during or after its creation. For instance, by dividing the total count of reads mapped to gene G by 1.5 for individuals who bear variant V. (As a further optimization, one could maybe try to estimate the number of different transcripts of each gene and normalize that number instead.) This is probably not the best way to normalize, but I hope it clarifies my thought. And my question is: does anyone know of studies/methods that tried to use such an approach?
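
To make the idea concrete, here is a minimal Python sketch of the division described above, for a single gene and a single variant. The function name `scale_counts_by_carrier_status`, the toy counts, and the carrier/non-carrier coding are all illustrative assumptions, and the counts are assumed to be already corrected for library size. It is a sketch of the proposal, not an established method.

```python
import numpy as np

def scale_counts_by_carrier_status(counts, carrier):
    """Divide carriers' counts for one gene by the carrier/non-carrier fold change.

    counts  : per-sample counts for gene G (assumed library-size corrected)
    carrier : per-sample booleans, True if the individual bears variant V
    """
    counts = np.asarray(counts, dtype=float)
    carrier = np.asarray(carrier, dtype=bool)
    # Estimate the eQTL effect as the ratio of mean expression between groups.
    fold = counts[carrier].mean() / counts[~carrier].mean()
    adjusted = counts.copy()
    # Only the carriers are rescaled; non-carriers are left untouched.
    adjusted[carrier] /= fold
    return adjusted, fold

# Toy data (hypothetical): three non-carriers followed by three carriers.
counts_G = np.array([100, 120, 80, 150, 180, 120])
carries_V = np.array([False, False, False, True, True, True])
adjusted, fold = scale_counts_by_carrier_status(counts_G, carries_V)
print(fold, adjusted)
```

With these toy numbers the estimated fold change is 1.5, so the carriers' counts are divided by 1.5 and both groups end up with the same mean.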

Best, Jerome


What do you plan to do with the normalized data? The notion of "correcting" an analysis for the effect of a specific genotype is not uncommon, and this can be done by adding the genotype as a covariate in your regression model. I'm sure it is technically possible to apply this correction / "normalization" upfront to the count matrix, as one may do for batch effects, but I am less sure about this.
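
As a rough illustration of the covariate idea, the sketch below fits a per-gene linear model of log expression on genotype dosage (0, 1 or 2 copies of the alternative allele) and subtracts only the fitted genotype term. The function name `regress_out_genotype`, the additive dosage coding and the toy values are assumptions for illustration; a real analysis would include the other covariates of the model as extra columns in the design matrix.

```python
import numpy as np

def regress_out_genotype(log_expr, dosage):
    """Remove the fitted linear genotype effect from one gene's log expression."""
    log_expr = np.asarray(log_expr, dtype=float)
    dosage = np.asarray(dosage, dtype=float)
    # Design matrix: intercept plus genotype dosage (assumed additive coding).
    X = np.column_stack([np.ones_like(dosage), dosage])
    beta, *_ = np.linalg.lstsq(X, log_expr, rcond=None)
    slope = beta[1]
    # Subtract only the genotype term so the baseline expression level is kept.
    corrected = log_expr - slope * dosage
    return corrected, slope

# Toy data (hypothetical): true effect of 0.5 log units per allele plus small noise.
dosage = np.array([0, 0, 1, 1, 2, 2])
noise = np.array([0.1, -0.1, 0.1, -0.1, 0.1, -0.1])
log_expr = 5.0 + 0.5 * dosage + noise
corrected, slope = regress_out_genotype(log_expr, dosage)
print(slope, corrected)
```

The corrected values no longer vary with dosage, which is the same outcome as including genotype as a covariate, only applied upfront to the expression matrix.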

written 3.1 years ago by christopher medway450

Hi mbyvcm and thanks for the input!

I checked the thread "What is the simple way to remove known batch effect from RNA-seq data ?" and I think I understand what you suggest. I want to explore the GTEx data, in particular the expression of genes/alleles in different tissues and for different types of individuals. I thought that correcting for the effect of eQTLs could improve the accuracy of all analyses (even though it means tools like DESeq2 would not be available for normalization afterwards, other tools such as YARN/qsmooth are better adapted to multi-tissue RNA-Seq normalization). This should be done upfront so that the subsequent analyses are correct. Maybe a new tool, similar to ComBat, could be created for that purpose? I guess that would be quite computationally intensive and would need some optimization.

modified 3.1 years ago • written 3.1 years ago by Bioninbo0