Question

t test PROBLEM

8.8 years ago

yasjas ▴ 70

Hi guys,

I have a problem when applying a t-test on my dataframe. I am doing this:

class1 <- data[,c(3,5)]

# taking the column of healthy1 vs cancer1
class2 <- data[,c(3,7)]
pvalues<-stack(mapply(function(x, y) t.test(x,y)$p.value, class1[-1], class2[-1]))

on this dataframe (data)

rep_name     rep_family  gene    distance  Healthy1  Healthy2  cancer1  cancer2
HERVK22-int  LTR         THSD7A  0         5.3682    4.7400    4.5634   4.0869
HERVL40-int  LTR         ANKIB1  3238      7.2268    7.3056    7.2132   7.5750
HERVL40-int  LTR         ANKIB1  2879      7.2268    7.3056    7.2132   7.5750

However it gives me only

values            ind
1 0.6021281 Healthy1

but what I want is a p-value for each gene (each row) in my data frame (data).

Any idea on how I can change the code to give the desired result?

Thanks

R • 1.7k views

ADD COMMENT • link updated 16 months ago by Ram 43k • written 8.8 years ago by yasjas ▴ 70

The class1 and class2 variables are unnecessary in your code (use data$Healthy1 and data$cancer1 directly).

If I understand correctly, you are comparing Healthy1 vs cancer1 and computing a t-test for each row, with no replicates in either group. The idea of the t-test is to estimate whether the group means differ, taking into account the mean and variance of each group. You NEED replicates to estimate that variance; without them you can only compute a log fold change.
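To make the point concrete, here is a minimal sketch of a per-row t-test once replicates are available. It assumes (not confirmed at this point in the thread) that Healthy1/Healthy2 and cancer1/cancer2 are replicates of the same condition. The toy data frame copies the three rows shown in the question; `p_value` and `diff` are illustrative column names.

```r
# Toy copy of the rows shown in the question (rep_name/rep_family omitted).
data <- data.frame(
  gene     = c("THSD7A", "ANKIB1", "ANKIB1"),
  distance = c(0, 3238, 2879),
  Healthy1 = c(5.3682, 7.2268, 7.2268),
  Healthy2 = c(4.7400, 7.3056, 7.3056),
  cancer1  = c(4.5634, 7.2132, 7.2132),
  cancer2  = c(4.0869, 7.5750, 7.5750)
)

# Assumption: these pairs of columns are replicates of one condition each.
healthy_cols <- c("Healthy1", "Healthy2")
cancer_cols  <- c("cancer1", "cancer2")

# One Welch t-test per row: healthy replicates vs cancer replicates.
# t.test() stops with an error if both groups are constant within a row.
data$p_value <- apply(data[, c(healthy_cols, cancer_cols)], 1, function(row) {
  t.test(row[healthy_cols], row[cancer_cols])$p.value
})

# Difference of group means (cancer minus healthy). This is a log2 fold
# change only if the values are already on a log2 scale (assumption).
data$diff <- rowMeans(data[, cancer_cols]) - rowMeans(data[, healthy_cols])

print(data[, c("gene", "distance", "p_value", "diff")])
```

With only two values per group the test has very little power, so these p-values should be read with caution.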

There are some very good R packages for differential analysis. Just tell us whether cancer1/cancer2 are biological replicates, and what the units of your Healthy / cancer columns are:

Are they from microarray or RNASeq data?
How are they computed from your raw data? Like raw read counts or FPKM for RNASeq for example

ADD REPLY • link updated 16 months ago by Ram 43k • written 8.8 years ago by cyril-cros ▴ 950

      rep_name     rep_family  gene     distance  Hepatocytes_B1  Hepatocytes_B3  Huh.7_B1  Huh.7_B2
14732 HERVK22-int  LTR         THSD7A   0         5.3682          4.7400          4.5634    4.0869
2565  HERVL40-int  LTR         ANKIB1   3238      7.2268          7.3056          7.2132    7.5750
2646  HERVL40-int  LTR         ANKIB1   2879      7.2268          7.3056          7.2132    7.5750
2673  HERVL40-int  LTR         ANKIB1   2355      7.2268          7.3056          7.2132    7.5750
2693  HERVL40-int  LTR         ANKIB1   2051      7.2268          7.3056          7.2132    7.5750
16782 HERVL40-int  LTR         PRKAR2B  0         6.4382          2.2347          7.6774    6.6859

That's my data frame. Hepatocytes_B1 and Hepatocytes_B2 are replicates, and the same goes for Huh7_B1 and Huh7_B2.

What I want to see is whether gene expression differs between healthy (hepatocytes) and cancer (Huh7), so I wanted to compare Hepatocytes_B1 vs Huh7_B1 and Hepatocytes_B2 vs Huh7_B2. I don't know if it's correct to do it like that...

To answer your question, those values come from a microarray.

ADD REPLY • link updated 16 months ago by Ram 43k • written 8.8 years ago by yasjas ▴ 70

Ram · Answer 1 · 2015-06-23

In this case, I suggest you use marray for post-processing your data (if that is not done yet and you are using a two-colour microarray) and limma from Bioconductor for the statistical analysis. I followed a really good tutorial; you can find its R commands at http://pastebin.com/wVLD3Amy. The first part lets you do some quality analysis, the second is about using limma to find differentially expressed genes. Some data files are missing, sorry, but the visualizations are nice.

The limma user's guide, http://www.bioconductor.org/packages/release/bioc/vignettes/limma/inst/doc/usersguide.pdf, is your friend. The idea is that you have far too few replicates to just do t-tests; limma is better suited to this case. You need to import your data into limma (how depends on which microarray you used) and specify your experimental design.
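As a rough sketch of what "import your data and specify your experimental design" can look like, the snippet below runs limma's standard `lmFit` / `eBayes` / `topTable` sequence on the six rows posted above. It assumes the values are already normalized, log-scale intensities (not confirmed in this thread) and skips the platform-specific import step entirely. The row labels and the `healthy`/`cancer` grouping are made up for illustration. limma moderates each gene's variance using information from all genes, which is why it copes better with two replicates per group than a plain t-test.

```r
# Expression values copied from the table in this thread
# (rows = probes, columns = samples).
expr <- rbind(
  c(5.3682, 4.7400, 4.5634, 4.0869),
  c(7.2268, 7.3056, 7.2132, 7.5750),
  c(7.2268, 7.3056, 7.2132, 7.5750),
  c(7.2268, 7.3056, 7.2132, 7.5750),
  c(7.2268, 7.3056, 7.2132, 7.5750),
  c(6.4382, 2.2347, 7.6774, 6.6859)
)
colnames(expr) <- c("Hepatocytes_B1", "Hepatocytes_B3", "Huh.7_B1", "Huh.7_B2")
# Hypothetical row labels: gene name plus the distance column.
rownames(expr) <- c("THSD7A_0", "ANKIB1_3238", "ANKIB1_2879",
                    "ANKIB1_2355", "ANKIB1_2051", "PRKAR2B_0")

# Experimental design: two healthy samples, then two cancer samples.
group  <- factor(c("healthy", "healthy", "cancer", "cancer"),
                 levels = c("healthy", "cancer"))
design <- model.matrix(~ group)  # columns: (Intercept), groupcancer

# Only run the limma part if the package is installed.
if (requireNamespace("limma", quietly = TRUE)) {
  fit     <- limma::lmFit(expr, design)   # one linear model per row
  fit     <- limma::eBayes(fit)           # moderated t-statistics
  results <- limma::topTable(fit, coef = "groupcancer", number = Inf)
  print(results)
}
```

On a real data set you would pass the full expression matrix; six rows are far too few for the variance moderation to be meaningful, so treat the output here as a smoke test only.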

When you are done, you can do some clustering using http://www.tm4.org/mev.html and classify your hits using Gene Ontology (GO) terms.

If you want more detailed help, tell us how you got your numerical values and which microarray you used. Once again, we can help you run t-tests, but they are not optimal here...