Question: How to group gene dataset and run t.test for each row for every gene in R?
gravatar for ishackm
27 days ago by
ishackm60 wrote:

Hi all,

I have the following gene dataset:

enter image description here

I would like to group the samples into the following please:

Malignant cell samples are: AOCS1, G33, G164

Fibroblasts are: G342, G351, G369

After I grouped them into the two sample categories, Malignant and Fibroblast, I would like to do the test test for each row of the genes,

For example,

enter image description here

I am using R studio

I am new to this kind of analysis so any help will be greatly appreciated.

Many Thanks,


rna-seq R • 115 views
ADD COMMENTlink modified 27 days ago by shawn.w.foley720 • written 27 days ago by ishackm60

If you are getting this data from raw RNA-Seq data your best bet is to use a well established method like DESeq2 or edgeR or limma.

ADD REPLYlink written 27 days ago by benformatics870

Even if you do not have the raw data it is a much better solution to use limma via its trend functionality! Those values you post are mostly likely not normal distributed so you should NOT use a t-test!

ADD REPLYlink written 21 days ago by kristoffer.vittingseerup1.8k
gravatar for shawn.w.foley
27 days ago by
shawn.w.foley720 wrote:

For a problem like this you want to look into the apply function in R. This function will let you perform a function row-wise or column-wise on a dataframe or matrix.

From the help menu: apply(X, MARGIN, FUN, ...) where X is your dataframe/matrix, MARGIN is either 1 for row-wise or 2 for column-wise, and FUN is the function that you want to perform. Depending on what you want to do, you can have a base FUN such as median or sum, or you can define your own function(x), where x is each row (or column) in your dataframe.

So for the example of dataframe df where columns 2-4 are Malignant and 5-7 are fibroblast you can run:

pValues <- apply(df, 1, function(x) t.test(x[2:4],x[5:7])$p.value)

This will take df and for each row (indicated by the 1, as opposed to each column) it will perform function(x), whereby a t-test is performed on the elements 2-4 compared to 5-7, and the p-value is reported (hence the $p.value). This will perform that function for each row, and store the p-values in the vector pValues.

ADD COMMENTlink written 27 days ago by shawn.w.foley720

Thanks very much for your help Shawn, very much appreciated for the clear explanation

ADD REPLYlink written 27 days ago by ishackm60
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1046 users visited in the last hour