Question: How to group gene dataset and run t.test for each row for every gene in R?
ishackm wrote (18 months ago):

Hi all,

I have the following gene dataset:

[image not available]

I would like to group the samples into the following please:

Malignant cell samples are: AOCS1, G33, G164

Fibroblasts are: G342, G351, G369

After grouping them into the two sample categories, Malignant and Fibroblast, I would like to run a t-test for each row, i.e. for each gene.

For example,

[image not available]

I am using RStudio.

I am new to this kind of analysis so any help will be greatly appreciated.

Many Thanks,

Ishack

rna-seq R • 1.9k views

If this data comes from raw RNA-Seq data, your best bet is to use a well-established method such as DESeq2, edgeR, or limma.

(comment by benformatics, written 18 months ago)

Even if you do not have the raw data, it is a much better solution to use limma via its trend functionality! The values you posted are most likely not normally distributed, so you should NOT use a t-test!

(comment by kristoffer.vittingseerup, written 18 months ago)
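For anyone wanting to try the limma-trend route suggested above, here is a minimal sketch rather than a definitive pipeline. It assumes the values are already on a log scale (for example log-CPM), which cannot be confirmed from the post. The expression matrix is made-up random data and only the sample names come from the question. The limma calls run only if the package is installed.

```r
# Hypothetical log-scale expression matrix: 100 genes x 6 samples (made-up values)
set.seed(42)
expr <- matrix(
  rnorm(600, mean = 8), nrow = 100,
  dimnames = list(paste0("gene", 1:100),
                  c("AOCS1", "G33", "G164", "G342", "G351", "G369"))
)

# Group labels in the same order as the columns of expr
group <- factor(rep(c("Malignant", "Fibroblast"), each = 3),
                levels = c("Malignant", "Fibroblast"))

# Column 2 of the design is the Fibroblast vs Malignant difference
design <- model.matrix(~ group)

if (requireNamespace("limma", quietly = TRUE)) {
  fit <- limma::lmFit(expr, design)                     # one linear model per gene
  fit <- limma::eBayes(fit, trend = TRUE)               # limma-trend moderation
  res <- limma::topTable(fit, coef = 2, number = Inf)   # all genes, ranked
  print(head(res))
}
```

The adj.P.Val column of the result holds the multiple-testing-adjusted p-values.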

shawn.w.foley (USA) wrote (18 months ago):

For a problem like this you want to look into the apply function in R. It lets you apply a function row-wise or column-wise to a data frame or matrix.

From the help page: apply(X, MARGIN, FUN, ...), where X is your data frame or matrix, MARGIN is 1 for row-wise or 2 for column-wise, and FUN is the function that you want to apply. Depending on what you want to do, FUN can be a built-in function such as median or sum, or you can define your own function(x), where x is each row (or column) of your data.
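As a quick illustration of the MARGIN and FUN arguments, here is a small sketch on a made-up matrix (not the poster's data):

```r
# Toy matrix with 2 rows and 3 columns; the rows are (1, 3, 5) and (2, 4, 6)
m <- matrix(1:6, nrow = 2)

row_sums    <- apply(m, 1, sum)      # MARGIN = 1: one value per row
col_medians <- apply(m, 2, median)   # MARGIN = 2: one value per column

# A user-defined function applied to each row: the range of the row
row_ranges <- apply(m, 1, function(x) max(x) - min(x))
```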

So, for a data frame df where columns 2-4 are Malignant and columns 5-7 are Fibroblast, you can run:

pValues <- apply(df, 1, function(x) t.test(x[2:4],x[5:7])$p.value)

This takes df and, for each row (indicated by the 1, as opposed to each column), calls function(x), which runs a t-test comparing elements 2-4 with elements 5-7 and returns the p-value (hence the $p.value). The p-values for all rows are stored in the vector pValues.
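If the gene names sit in the first column, apply() converts the whole data frame to a character matrix, so it can be safer to select the sample columns by name first. The following is a minimal sketch under an assumed layout: a "gene" column followed by one column per sample. The sample names are taken from the question, while the values and the "gene" column name are made up, because the real table was posted as an image.

```r
# Hypothetical toy data; layout and values are assumptions, not the real dataset
set.seed(1)
df <- data.frame(
  gene  = paste0("gene", 1:5),
  AOCS1 = rnorm(5, 10), G33  = rnorm(5, 10), G164 = rnorm(5, 10),
  G342  = rnorm(5, 12), G351 = rnorm(5, 12), G369 = rnorm(5, 12)
)

# Sample groups as given in the question
malignant  <- c("AOCS1", "G33", "G164")
fibroblast <- c("G342", "G351", "G369")

# Keep only the numeric sample columns so that apply() sees numbers, not text
expr <- as.matrix(df[, c(malignant, fibroblast)])
rownames(expr) <- as.character(df$gene)

# Welch t-test per gene; each row x is a numeric vector named by sample
pValues <- apply(expr, 1, function(x) {
  t.test(x[malignant], x[fibroblast])$p.value
})

# Optional: Benjamini-Hochberg adjustment, since many genes are tested at once
padj <- p.adjust(pValues, method = "BH")
```

Indexing by sample name instead of by position also means the code keeps working if the columns are reordered.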


Thanks very much for your help, Shawn. The clear explanation is much appreciated.

(reply by ishackm, written 18 months ago)

Hi, I applied this line:

pValues <- apply(df, 1, function(x) t.test(x[2:4],x[5:7])$p.value)

But I got the following error:

Error in if (stderr < 10 * .Machine$double.eps * max(abs(mx), abs(my))) stop("data are essentially constant") : missing value where TRUE/FALSE needed

Can you shed some light on this? Many thanks, Chris

Here is my data: https://drive.google.com/open?id=1LiJD7T6oR5MtABwYqkhUrJFfo7XRxJ_z

(reply by mhxprs, written 7 months ago)

Hi Chris, can you try the following please?

pValues <- apply(df, 1, function(x) t.test(x[1:3],x[4:6])$p.value)

I think you are referring to the wrong column numbers. Your data has 6 columns: columns 1-3 are group 1 and columns 4-6 are group 2.

(reply by ishackm, written 7 months ago, modified 7 months ago)
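Before running the test it may help to check both the number of columns and their types. The sketch below uses a made-up table (the shared file is not reproduced here, so its layout is an assumption). It shows one way a row-wise t.test can fail: a text column makes apply() hand character values to t.test(). Whether this is what happened with the data above is not known.

```r
# Hypothetical table: a gene-name column plus 6 numeric columns (layout assumed)
df <- data.frame(
  gene = c("g1", "g2"),
  s1 = c(5.1, 2.0), s2 = c(4.8, 2.3), s3 = c(5.3, 1.9),
  s4 = c(7.9, 2.1), s5 = c(8.2, 2.2), s6 = c(8.0, 1.8)
)

ncol(df)                                       # how many columns are there?
is_num <- vapply(df, is.numeric, logical(1))   # which columns are numeric?
is_num

# On the full data frame, apply() works on a character matrix, so t.test() fails
bad <- tryCatch(
  apply(df, 1, function(x) t.test(x[2:4], x[5:7])$p.value),
  error = function(e) e
)
inherits(bad, "error")

# Restricting to the numeric columns first avoids the problem
num  <- as.matrix(df[, is_num])
good <- apply(num, 1, function(x) t.test(x[1:3], x[4:6])$p.value)
```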