Question: Should zero expression values be removed in eQTL analysis?
2.1 years ago
For example, suppose the RNA-seq expression values for gene1 in 10 people are GENE1=[0, 0, 1, 2, 3, 4, 3, 4, 2, 7]. SNP1 has alleles A and G, and its genotypes in the same 10 people are SNP1=[0, 1, 2, 1, 1, 2, 2, 2, 0, 0], where 0 means GG, 1 means AG, and 2 means AA.

What I want to do is eQTL analysis. Simply put, I want to fit a linear model to find out whether the expression of GENE1 is regulated by SNP1. Should I remove the zero values from the GENE1 expression values before fitting the regression model? Note that for many genes, removing the zeros would also remove most of the samples.
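A minimal sketch of the comparison being asked about, in plain Python: the same simple linear regression of expression on genotype dosage, fitted once on all 10 samples and once after dropping the samples with zero expression. The helper name `ols_slope` is made up for illustration, and a real eQTL analysis would normally also normalize expression and include covariates, which this sketch leaves out.

```python
from statistics import mean

def ols_slope(genotype, expression):
    """Least-squares slope of expression regressed on genotype dosage (0/1/2)."""
    mx, my = mean(genotype), mean(expression)
    sxy = sum((x - mx) * (y - my) for x, y in zip(genotype, expression))
    sxx = sum((x - mx) ** 2 for x in genotype)
    return sxy / sxx

# Values taken from the example above.
GENE1 = [0, 0, 1, 2, 3, 4, 3, 4, 2, 7]
SNP1 = [0, 1, 2, 1, 1, 2, 2, 2, 0, 0]

# Fit using every sample, zeros included.
slope_all = ols_slope(SNP1, GENE1)

# Fit after dropping samples whose expression is exactly zero.
kept = [(g, e) for g, e in zip(SNP1, GENE1) if e != 0]
slope_nonzero = ols_slope([g for g, _ in kept], [e for _, e in kept])

print(f"all samples: n={len(GENE1)}, slope={slope_all:.3f}")
print(f"zeros removed: n={len(kept)}, slope={slope_nonzero:.3f}")
```

Comparing the two slopes shows how much the estimate depends on whether the zero-expression samples are kept.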

2.1 years ago
I would only remove genes that have 0 levels of expression in a very large proportion of samples.

More generally, you might want to filter out genes with low variance across samples (see e.g. this paper, in the eQTL mapping paragraph of the Materials and Methods section), since they are not informative for the analysis.
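As a rough sketch of both filters in plain Python: drop a gene if too many samples have zero expression, or if its variance across samples is too low. The function name `keep_gene`, the cutoffs `max_zero_frac=0.8` and `min_variance=0.1`, and the GENE2 and GENE3 values are all assumptions for illustration. They are not taken from the paper mentioned above, and sensible cutoffs depend on your data and normalization.

```python
from statistics import variance

def keep_gene(values, max_zero_frac=0.8, min_variance=0.1):
    """Return True if a gene passes both filters (cutoffs are illustrative)."""
    zero_frac = sum(v == 0 for v in values) / len(values)
    return zero_frac <= max_zero_frac and variance(values) >= min_variance

expression = {
    "GENE1": [0, 0, 1, 2, 3, 4, 3, 4, 2, 7],  # from the question
    "GENE2": [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],  # hypothetical: mostly zeros
    "GENE3": [5, 5, 5, 5, 5, 5, 5, 5, 5, 5],  # hypothetical: no variance
}

# Genes are filtered as a whole; no individual samples are removed.
kept_genes = [gene for gene, values in expression.items() if keep_gene(values)]
print(kept_genes)
```

With these made-up cutoffs, GENE1 is kept with all 10 samples, zeros included, while GENE2 and GENE3 are dropped entirely.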

