How to group a gene dataset and run a t-test on each row (gene) in R?
23 months ago
ishackm ▴ 100

Hi all,

I have the following gene dataset:

[Image: the gene dataset (not preserved)]

I would like to group the samples as follows:

Malignant cell samples are: AOCS1, G33, G164

Fibroblasts are: G342, G351, G369

After grouping them into the two sample categories, Malignant and Fibroblast, I would like to run a t-test on each row, i.e. for every gene.

For example,

[Image: example (not preserved)]

I am using RStudio.

I am new to this kind of analysis so any help will be greatly appreciated.

Many Thanks,

Ishack

Tags: RNA-Seq, R

If this data comes from raw RNA-Seq data, your best bet is to use a well-established method such as DESeq2, edgeR or limma.


Even if you do not have the raw data, limma with its trend option (limma-trend) is a much better solution! The values you posted are most likely not normally distributed, so you should NOT use a t-test!

23 months ago
shawn.w.foley ★ 1.2k

For a problem like this, look into the apply() function in R. It lets you apply a function row-wise or column-wise to a data frame or matrix.

From the help page (?apply): apply(X, MARGIN, FUN, ...), where X is your data frame or matrix, MARGIN is 1 for row-wise or 2 for column-wise, and FUN is the function you want to apply. Depending on what you want to do, FUN can be a built-in function such as median or sum, or you can define your own function(x), where x is each row (or column) of X.
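To make MARGIN concrete, here is a small illustration on a made-up 2 x 3 matrix (the values are purely illustrative and not from the question's data):

```r
# Made-up toy matrix: 2 rows (think genes) x 3 columns (think samples)
m <- matrix(c(1, 2, 3,
              4, 5, 6), nrow = 2, byrow = TRUE)

row_sums    <- apply(m, 1, sum)     # MARGIN = 1: one value per row    -> 6 15
col_medians <- apply(m, 2, median)  # MARGIN = 2: one value per column -> 2.5 3.5 4.5

# A custom function(x): here x is one whole row of m
row_ranges <- apply(m, 1, function(x) max(x) - min(x))  # -> 2 2
```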

So, for a data frame df where columns 2-4 are Malignant and columns 5-7 are Fibroblast, you can run:

pValues <- apply(df, 1, function(x) t.test(x[2:4],x[5:7])$p.value)

This takes df and, for each row (indicated by the 1, as opposed to 2 for columns), runs function(x): a t-test comparing elements 2-4 with elements 5-7, keeping only the p-value (hence the $p.value). The p-values for all rows are stored in the vector pValues.
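The same idea can select the groups by column name instead of by position, using the sample names from the question. This is a minimal sketch: the small df below is made up purely so the snippet runs on its own, and it assumes the real table has one text column of gene names (called gene here, a hypothetical name) plus the six sample columns. Picking out only the sample columns before calling apply() keeps it working on a numeric matrix even when such a text column is present.

```r
# Made-up stand-in for the real table (assumed layout: gene names + 6 samples)
df <- data.frame(
  gene  = c("geneA", "geneB", "geneC"),
  AOCS1 = c(10.2, 5.1, 7.9), G33  = c(11.0, 4.8, 8.3), G164 = c(10.7, 5.5, 8.1),
  G342  = c(6.1, 5.0, 8.0),  G351 = c(5.8, 5.3, 7.7),  G369 = c(6.4, 4.9, 8.4)
)

# Sample groups as listed in the question
malignant  <- c("AOCS1", "G33", "G164")
fibroblast <- c("G342", "G351", "G369")

# Keep only the numeric sample columns and label rows by gene
expr <- as.matrix(df[, c(malignant, fibroblast)])
rownames(expr) <- df$gene

# Welch t-test (the t.test() default) per row; x keeps the column names,
# so the groups can be indexed by sample name
pValues <- apply(expr, 1, function(x) {
  t.test(x[malignant], x[fibroblast])$p.value
})
```

The result is a vector of p-values named by gene, so it can be matched back to the original rows directly.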


Thanks very much for your help, Shawn. I really appreciate the clear explanation.


Hi, I applied this line: pValues <- apply(df, 1, function(x) t.test(x[2:4],x[5:7])$p.value)

But I got the following error: Error in if (stderr < 10 * .Machine$double.eps * max(abs(mx), abs(my))) stop("data are essentially constant") : missing value where TRUE/FALSE needed

Can you shed some light on this? Many thanks, Chris

Here is my data: https://drive.google.com/open?id=1LiJD7T6oR5MtABwYqkhUrJFfo7XRxJ_z


Hi Chris, can you try the following, please?

pValues <- apply(df, 1, function(x) t.test(x[1:3],x[4:6])$p.value)

I think you are referring to the wrong column numbers. Your data has 6 columns: columns 1-3 are group 1 and columns 4-6 are group 2.
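The error itself means that t.test() ended up with a missing (NA) value in its standard-error check. One common cause is non-numeric input: apply() converts a data frame to a matrix, and if any column is text (such as gene names), every value becomes character, so mean() returns NA. A row containing Inf can also trigger it, while a row whose values are all identical stops with "data are essentially constant" instead. A defensive sketch, using a hypothetical helper called row_ttest_p() (not from any package), forces the values to numeric and returns NA for any row where the test fails rather than stopping the whole run:

```r
# Hypothetical helper (not from any package): per-row Welch t-test p-values.
# 'mat' holds the sample columns; 'g1' and 'g2' are column indices or names.
row_ttest_p <- function(mat, g1, g2) {
  m <- as.matrix(mat)
  # Force numeric storage; text that is not a number becomes NA (with a warning)
  storage.mode(m) <- "numeric"
  apply(m, 1, function(x) {
    # Return NA instead of stopping when t.test() fails on this row
    tryCatch(t.test(x[g1], x[g2])$p.value,
             error = function(e) NA_real_)
  })
}

# Usage with a made-up table shaped like the 6-column data: columns 1-3 vs 4-6
toy <- data.frame(s1 = c(1.0, 2.0), s2 = c(1.2, 2.0), s3 = c(0.9, 2.0),
                  s4 = c(3.1, 2.0), s5 = c(2.8, 2.0), s6 = c(3.3, 2.0))
p <- row_ttest_p(toy, 1:3, 4:6)
# Row 2 is constant, so t.test() stops with "data are essentially constant"
# and the helper gives NA for that row, while row 1 still gets a p-value.
```

Returning NA keeps one p-value per input row, so the failing rows can be inspected afterwards, for example with which(is.na(p)).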
