Question

How to group gene dataset and run t.test for each row for every gene in R?

4.9 years ago

ishackm ▴ 110

Hi all,

I have the following gene dataset:

[image not shown: the gene dataset]

I would like to group the samples into the following please:

Malignant cell samples are: AOCS1, G33, G164

Fibroblasts are: G342, G351, G369

After grouping them into the two sample categories, Malignant and Fibroblast, I would like to run a t-test for each row (that is, for each gene).

For example,

[image not shown: example of the desired output]

I am using RStudio.

I am new to this kind of analysis so any help will be greatly appreciated.

Many Thanks,

Ishack

RNA-Seq r • 7.2k views

Updated 4.0 years ago by mhxprs • 0 • written 4.9 years ago by ishackm ▴ 110

If these values come from raw RNA-Seq data, your best bet is to use a well-established method such as DESeq2, edgeR, or limma.

4.9 years ago by benformatics 3.9k

Even if you do not have the raw data, it is a much better solution to use limma via its trend functionality! The values you posted are most likely not normally distributed, so you should NOT use a t-test!
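As a rough sketch of what the limma-trend route could look like (not necessarily the exact workflow meant above): this assumes the Bioconductor package limma is installed and that the values are on a log2 scale. The matrix `expr` is simulated purely for illustration; only the sample names come from the question.

```r
# Assumption: limma is installed; values are log2-scale expression,
# genes in rows and samples in columns. Data here are simulated.
if (requireNamespace("limma", quietly = TRUE)) {
  set.seed(1)
  samples <- c("AOCS1", "G33", "G164", "G342", "G351", "G369")
  expr <- matrix(rnorm(600, mean = 8), nrow = 100,
                 dimnames = list(paste0("gene", 1:100), samples))

  # Group labels in the same order as the columns of expr
  group <- factor(rep(c("Malignant", "Fibroblast"), each = 3),
                  levels = c("Fibroblast", "Malignant"))

  design <- model.matrix(~ group)           # column 2: Malignant vs Fibroblast
  fit <- limma::lmFit(expr, design)         # one linear model per gene
  fit <- limma::eBayes(fit, trend = TRUE)   # moderated statistics, limma-trend
  results <- limma::topTable(fit, coef = 2, number = Inf)
  head(results)
}
```

The `results` table has one row per gene, including a moderated p-value (`P.Value`) and a multiple-testing adjusted one (`adj.P.Val`).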

4.9 years ago by Kristoffer Vitting-Seerup ★ 4.0k

score 1 · Answer 1 · 2019-05-22

4.9 years ago

shawn.w.foley ★ 1.3k

For a problem like this, you want to look into the apply function in R. It lets you run a function row-wise or column-wise over a data frame or matrix.

From the help page (?apply): apply(X, MARGIN, FUN, ...), where X is your data frame or matrix, MARGIN is 1 for row-wise or 2 for column-wise, and FUN is the function you want to run. Depending on what you want to do, FUN can be a built-in function such as median or sum, or you can define your own function(x), where x is each row (or column) of your data frame.
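A tiny illustration of the MARGIN argument, using a made-up matrix `m` that is not part of the original data:

```r
# Toy matrix with 2 rows and 3 columns (values are made up)
m <- matrix(1:6, nrow = 2, byrow = TRUE)

row_sums <- apply(m, 1, sum)   # MARGIN = 1: one result per row
col_sums <- apply(m, 2, sum)   # MARGIN = 2: one result per column

# A custom function(x), where x is one row at a time
row_range <- apply(m, 1, function(x) max(x) - min(x))
```

Here `row_sums` has two elements (one per row) and `col_sums` has three (one per column).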

So for a data frame df where columns 2-4 are Malignant and columns 5-7 are Fibroblast, you can run:

pValues <- apply(df, 1, function(x) t.test(x[2:4],x[5:7])$p.value)

This will take df and for each row (indicated by the 1, as opposed to each column) it will perform function(x), whereby a t-test is performed on the elements 2-4 compared to 5-7, and the p-value is reported (hence the $p.value). This will perform that function for each row, and store the p-values in the vector pValues.
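One caveat: if the first column holds gene names, apply() converts the whole data frame to a character matrix before looping, so it can be safer to pull out the numeric sample columns first. A minimal sketch, assuming the column names match the sample IDs from the question. The gene names and values below are invented for illustration.

```r
# Hypothetical table: gene IDs in column 1, followed by six sample columns
df <- data.frame(
  gene  = c("geneA", "geneB", "geneC"),
  AOCS1 = c(5.1, 2.0, 7.3),
  G33   = c(4.8, 2.2, 7.9),
  G164  = c(5.3, 1.9, 7.1),
  G342  = c(8.9, 2.1, 3.0),
  G351  = c(9.4, 2.3, 3.4),
  G369  = c(9.1, 1.8, 2.9)
)

malignant  <- c("AOCS1", "G33", "G164")
fibroblast <- c("G342", "G351", "G369")

# Keep only the numeric sample columns so apply() sees a numeric matrix
expr <- as.matrix(df[, c(malignant, fibroblast)])
rownames(expr) <- df$gene

# Each row x is a named vector, so groups can be selected by sample name
pValues <- apply(expr, 1, function(x) t.test(x[malignant], x[fibroblast])$p.value)

# With many genes, adjust for multiple testing (Benjamini-Hochberg)
padj <- p.adjust(pValues, method = "BH")
```

Selecting by sample name rather than by position avoids off-by-one mistakes when the column layout changes.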

Thanks very much for your help, Shawn. The clear explanation is much appreciated.

4.9 years ago by ishackm ▴ 110

Hi, I applied this line:

pValues <- apply(df, 1, function(x) t.test(x[2:4],x[5:7])$p.value)

But I got the following error:

Error in if (stderr < 10 * .Machine$double.eps * max(abs(mx), abs(my))) stop("data are essentially constant") : missing value where TRUE/FALSE needed

Can you shed some light on this? Many thanks, Chris

Here is my data: https://drive.google.com/open?id=1LiJD7T6oR5MtABwYqkhUrJFfo7XRxJ_z

4.0 years ago by mhxprs • 0

Hi Chris, can you try the following please?

pValues <- apply(df, 1, function(x) t.test(x[1:3],x[4:6])$p.value)

I think you are referring to the wrong column numbers. Your data has 6 columns: columns 1-3 are group 1 and columns 4-6 are group 2.
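If it helps, here is a rough way to check the layout and to keep a single problematic row from stopping the whole run. The data frame below is a made-up six-column stand-in for your file, and safe_p is a hypothetical helper name.

```r
# Hypothetical six-column data frame: samples only, no gene ID column
df <- data.frame(
  s1 = c(5.1, 2.0, 3.0), s2 = c(4.8, 2.0, 3.4), s3 = c(5.3, 2.0, 2.9),
  s4 = c(8.9, 2.0, 7.7), s5 = c(9.4, 2.0, 7.1), s6 = c(9.1, 2.0, 7.5)
)

ncol(df)            # confirm how many columns there really are
sapply(df, class)   # every column passed to t.test() should be numeric

# Return NA instead of stopping when t.test() fails for a row
# (for example, the second row here is constant in both groups)
safe_p <- function(x) {
  tryCatch(t.test(x[1:3], x[4:6])$p.value, error = function(e) NA_real_)
}

pValues <- apply(as.matrix(df), 1, safe_p)
```

Rows that come back as NA can then be inspected by hand, for instance for constant, missing, or non-numeric values.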

4.0 years ago by ishackm ▴ 110