salmon TPM output to linear regression in R
4.0 years ago
evelyn ▴ 230

I want to find candidate genes for 20 different chemical compounds. I am using TPM data for 50 cultivars and have a matrix of TPM values for each gene across all 50 cultivars, where A1 to A6 are genes and A86 to A60 are cultivars:

gene       A86         A90       A99   A16          A09         A60
A1          0          0.4        0     0          0          0
A2          0          0          0     0          0          0
A3          0.5        0          0     0.42       0          0
A4          0          0          0     0          0          0
A5          0          0          0     0          0          0
A6          0          0          0     0          0          0

I have a concentration dataset for each chemical compound, like:

Cultivar  Compound_X
A86  20.5
A90  5.6
A99  7.1
A16  12
A09  1.5
A60  9.9

I have TPM values for all cultivars, but for each chemical compound the concentration values are missing for some cultivars. I want to run a standard linear regression in R to find candidate genes for each chemical compound based on their p-values.

for (gene in 1:ngenes) {
  model <- lm(Compound_X ~ TPM[gene, ])
}

I want to extract the p-value from the linear regression for each gene and save these to a vector, one vector per chemical compound, to find candidate genes. Thank you!

RNA-Seq salmon lm • 1.0k views

You can get the p-values with the summary function in R: s <- summary(lm(volatile ~ TPM[gene, ])). The p-values are stored in the coefficients component, s$coefficients, which is a matrix with one row per model term; the p-values are in its Pr(>|t|) column.
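To make the summary() suggestion concrete, here is a minimal sketch for a single gene. The vectors conc and tpm_gene are illustrative names, filled with the Compound_X values and the A3 row from the question; they are not objects from the original post.

```r
# Toy data taken from the example tables in the question
conc     <- c(A86 = 20.5, A90 = 5.6, A99 = 7.1, A16 = 12, A09 = 1.5, A60 = 9.9)
tpm_gene <- c(A86 = 0.5,  A90 = 0,   A99 = 0,   A16 = 0.42, A09 = 0, A60 = 0)  # gene A3

fit <- lm(conc ~ tpm_gene)
s   <- summary(fit)

# s$coefficients is a matrix with one row per term and the columns
# "Estimate", "Std. Error", "t value" and "Pr(>|t|)".
# The row for the predictor is named after the variable in the formula.
p_value <- s$coefficients["tpm_gene", "Pr(>|t|)"]
p_value
```

Indexing by row and column name rather than by position (s$coefficients[2, 4]) tends to be safer, because it fails loudly if the slope could not be estimated.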


Thank you! I am actually not able to run the lm yet. I want to run it in a for loop, as I mentioned in the question. I have the two datasets described above and want to perform the lm step first. Your suggestion will be helpful after that.
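One possible way to get the loop running is sketched below. It assumes TPM is a numeric matrix with genes as rows and cultivar names as column names (use as.matrix() on a data frame first), and that the concentrations sit in a data frame with Cultivar and Compound_X columns as in the question. The gene B1 and the NA for A99 are made up here to illustrate a missing measurement.

```r
# Toy stand-ins for the two datasets (B1 and the NA are illustrative)
cultivars <- c("A86", "A90", "A99", "A16", "A09", "A60")
TPM <- matrix(
  c(0,   0.4, 0,   0,    0,   0,
    0.5, 0,   0,   0.42, 0,   0,
    1.2, 3.4, 0.8, 2.1,  0.3, 1.9),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("A1", "A3", "B1"), cultivars)
)
conc <- data.frame(
  Cultivar   = cultivars,
  Compound_X = c(20.5, 5.6, NA, 12, 1.5, 9.9),
  stringsAsFactors = FALSE
)

# Keep only cultivars that have a concentration and a TPM column,
# and put both objects in the same cultivar order
keep <- conc$Cultivar[!is.na(conc$Compound_X) & conc$Cultivar %in% colnames(TPM)]
y <- conc$Compound_X[match(keep, conc$Cultivar)]
X <- TPM[, keep, drop = FALSE]

# One p-value per gene; genes that cannot be fitted stay NA
pvals <- setNames(rep(NA_real_, nrow(X)), rownames(X))
for (gene in rownames(X)) {
  x <- X[gene, ]
  if (var(x) == 0) next  # constant expression: slope is not estimable
  s <- summary(lm(y ~ x))
  pvals[gene] <- s$coefficients["x", "Pr(>|t|)"]
}
pvals
```

For the 20 compounds, the same block could be wrapped in an outer loop over the concentration columns, recomputing keep each time since different cultivars are missing for different compounds. With thousands of genes the p-values would normally also need a multiple-testing correction, for example p.adjust(pvals, method = "BH").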


Is there a way to drop the genes that have zero TPM for all cultivars?
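A row-wise filter should do this. The sketch below rebuilds the example matrix from the question and assumes TPM is a numeric matrix with genes as rows; TPM_filtered is just an illustrative name.

```r
# Example matrix from the question: 6 genes x 6 cultivars
TPM <- matrix(
  c(0,   0.4, 0, 0,    0, 0,
    0,   0,   0, 0,    0, 0,
    0.5, 0,   0, 0.42, 0, 0,
    0,   0,   0, 0,    0, 0,
    0,   0,   0, 0,    0, 0,
    0,   0,   0, 0,    0, 0),
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("A", 1:6),
                  c("A86", "A90", "A99", "A16", "A09", "A60"))
)

# Keep genes with a non-zero TPM in at least one cultivar
keep <- rowSums(TPM > 0) > 0
TPM_filtered <- TPM[keep, , drop = FALSE]

rownames(TPM_filtered)  # "A1" "A3"
```

Raising the threshold, e.g. rowSums(TPM > 0) >= 3, would additionally drop genes expressed in only one or two cultivars, which carry very little information for a regression.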


Please also note that raw TPM values are not normally distributed, so you should not use lm directly on them. Log2-transform them first (remember to add a pseudocount of your choice, since log2(0) is undefined).
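As a small illustration of that transform, using the A3 row from the question. The pseudocount of 1 is only an assumption for this sketch; the choice is up to you.

```r
tpm_gene <- c(A86 = 0.5, A90 = 0, A99 = 0, A16 = 0.42, A09 = 0, A60 = 0)  # gene A3

pseudo  <- 1                          # assumed pseudocount, pick your own
log_tpm <- log2(tpm_gene + pseudo)    # zeros map to log2(1) = 0

log_tpm
```

The transformed values can then be passed to lm in place of the raw TPM. For a whole matrix the same call works element-wise: log2(TPM + pseudo).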
