Question

t-test to check if the cancer samples are significantly different than the normal samples using R

0

11 months ago

Amr ▴ 160

Mean_conditions <- rowMeans(count_data_normalized2[,c('Condition1', 'Condition2','Condition3')], na.rm=TRUE)

Mean_Treatment <- rowMeans(count_data_normalized2[,c('Treatment1', 'Treatment2','Treatment3')], na.rm=TRUE)

df2<-as.data.frame(Mean_conditions)

df1<- as.data.frame(Mean_Treatment)

t.test(df1, df2, var.equal = TRUE)

Welch Two Sample t-test

data:  Mean_Treatment and Mean_conditions
t = 0.35161, df = 24478, p-value = 0.7251
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -162.0191  232.8535
sample estimates:
mean of x mean of y
 1244.836  1209.419

Is the t-test implemented correctly? Does that mean that there are no significant differences between the normal and cancer samples?

t-test independent • 1.0k views

ADD COMMENT • link 11 months ago by Amr ▴ 160

score 1 · Answer 1 · 2023-05-06

1

11 months ago

LChart 3.9k

The degrees of freedom here are 24,478. Do you really have >20,000 samples? I suspect that each "row" corresponds to a distinct biomarker/assay, so the test is treating genes as if they were samples. The fact that your data is called count_data_normalized2 makes me suspect that this is RNA expression data, in which case do not use a t-test. Just put it into DESeq2.
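A minimal sketch, on simulated data rather than the poster's matrix, of why the degrees of freedom come out so large: comparing two vectors of per-gene means makes every gene count as an observation, so the df scale with the number of rows and not with the number of samples. The matrix `expr`, its column names and the gene count are assumptions made up for illustration.

```r
set.seed(1)
n_genes <- 1000  # hypothetical; a real expression matrix has far more rows

# Simulated normalized expression: 3 normal and 3 cancer columns
expr <- matrix(rlnorm(n_genes * 6, meanlog = 5, sdlog = 1), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)),
                               c(paste0("normal", 1:3), paste0("cancer", 1:3))))

mean_normal <- rowMeans(expr[, 1:3])
mean_cancer <- rowMeans(expr[, 4:6])

# Pooling genes: each of the 2 * n_genes row means is treated as an observation
pooled <- t.test(mean_cancer, mean_normal, var.equal = TRUE)
pooled$parameter    # df = 2 * n_genes - 2

# Testing a single gene: only 3 + 3 values are available, so df = 4
one_gene <- t.test(expr[1, 4:6], expr[1, 1:3], var.equal = TRUE)
one_gene$parameter
```

The pooled test asks whether the average expression over all genes differs between groups, which is not the same question as whether individual genes are differentially expressed.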

ADD COMMENT • link 11 months ago by LChart 3.9k

0

I have >20,000 genes (rows) and 6 columns (3 cancer replicates and 3 normal replicates). Is it a good idea to perform a t-test to find out whether there are significant differences between the normal and cancer samples?

ADD REPLY • link 11 months ago by Amr ▴ 160

1

No. It's not a good idea to use a t-test for RNA-seq. Use DESeq2 or edgeR.
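A rough sketch of what the DESeq2 route could look like, assuming the Bioconductor package DESeq2 is installed. DESeq2 expects raw integer counts rather than normalized values, so the simulated `counts` matrix, the `coldata` table and the group labels below are stand-ins, not the poster's data.

```r
# Runs only if the Bioconductor package DESeq2 is available
if (requireNamespace("DESeq2", quietly = TRUE)) {
  library(DESeq2)
  set.seed(1)
  n_genes <- 1000  # hypothetical number of genes

  # Simulated raw counts: per-gene means, first 50 genes raised 4-fold in cancer
  mu <- matrix(rexp(n_genes, rate = 1 / 200) + 10, nrow = n_genes, ncol = 6)
  mu[1:50, 4:6] <- mu[1:50, 4:6] * 4
  counts <- matrix(rnbinom(n_genes * 6, mu = mu, size = 5), nrow = n_genes,
                   dimnames = list(paste0("gene", seq_len(n_genes)),
                                   c(paste0("normal", 1:3), paste0("cancer", 1:3))))
  storage.mode(counts) <- "integer"

  # Sample table: row names must match the column names of the count matrix
  coldata <- data.frame(
    condition = factor(rep(c("normal", "cancer"), each = 3),
                       levels = c("normal", "cancer")),
    row.names = colnames(counts)
  )

  dds <- DESeqDataSetFromMatrix(countData = counts, colData = coldata,
                                design = ~ condition)
  dds <- DESeq(dds)

  # Per-gene test of cancer vs normal, with adjusted p-values in res$padj
  res <- results(dds, contrast = c("condition", "cancer", "normal"))
  n_deg <- sum(res$padj < 0.05, na.rm = TRUE)
  summary(res)
}
```

Each gene gets its own test and an adjusted p-value, so the result is a list of differentially expressed genes rather than a single yes/no answer for the whole sample set.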

ADD REPLY • link 11 months ago by swbarnes2 14k

0

And how can I know whether the samples are significantly different? By which test or plot?

ADD REPLY • link 11 months ago by Amr ▴ 160

2

There is no bullet-proof metric to say with certainty that something is "different". You can, however, combine several diagnostics to build up your narrative. Typically you would start with some sort of dimensionality reduction such as PCA to see whether your samples separate in reduced dimensional space, or whether there are confounders that need to be adjusted for. Confounders can be batch effects, or anything associated with individuals, such as dietary status, age, disease prevalence, environmental exposure, drug consumption and combinations of those, provided that you have additional metadata for your cohort. You can then test for differential expression (adjusting for unwanted variation if necessary, see also SVAseq and RUVSeq) and check how many differentially expressed genes (DEGs) you get. Do heatmaps and clustering to infer patterns between DEGs and sample groups. Then see whether the DEGs, or the signatures you get from clustering, are enriched for terms (REACTOME, KEGG, KO...). Many other sorts of analysis are possible, but start with these basics. You build your narrative on this:

How is the PCA separation (if any)?
How many DEGs are there?
Are there patterns in the DEG profile?
Are the DEGs enriched for processes?
Which individual DEGs are present, even in the absence of term enrichment? If, for example, many rate-limiting enzymes are differential, then it is likely that the pathway itself is also differential. Or, if many transcription factors are DEGs, you can check whether these TFs have previously been linked to any phenotype in the cell type you're working with.
Are there published disease signatures to compare with?
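The first and third questions in the list above can be sketched in base R. This is only an illustration on simulated log-scale data; the matrix `log_expr`, the group labels and the planted 50-gene shift are assumptions, and on real data you would use a proper transformation of your counts first.

```r
set.seed(1)
n_genes <- 500  # hypothetical number of genes
groups <- rep(c("normal", "cancer"), each = 3)

# Simulated log-scale expression: gene-specific baseline plus noise
baseline <- rnorm(n_genes, mean = 8, sd = 2)
log_expr <- matrix(rnorm(n_genes * 6, mean = 0, sd = 0.5), nrow = n_genes,
                   dimnames = list(paste0("gene", seq_len(n_genes)),
                                   paste0(groups, rep(1:3, times = 2)))) + baseline

# Plant a signal: the first 50 genes are shifted upwards in the cancer samples
log_expr[1:50, groups == "cancer"] <- log_expr[1:50, groups == "cancer"] + 3

# PCA on samples: prcomp() expects samples in rows, hence t()
pca <- prcomp(t(log_expr), center = TRUE, scale. = FALSE)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
plot(pca$x[, 1], pca$x[, 2], pch = 19,
     col = ifelse(groups == "cancer", "red", "blue"),
     xlab = sprintf("PC1 (%.0f%%)", 100 * var_explained[1]),
     ylab = sprintf("PC2 (%.0f%%)", 100 * var_explained[2]))

# Hierarchical clustering of samples on Euclidean distance
hc <- hclust(dist(t(log_expr)))
clusters <- cutree(hc, k = 2)
table(groups, clusters)
```

If the groups separate along the first components and fall into distinct clusters, that supports a group difference; if samples instead group by batch or another covariate, that points to a confounder to adjust for.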

Put all of that together and interpret it biologically. See what the literature already says about your disease entity, then integrate that with what you learned from the analysis above.

There is not going to be one single test that, if p<0.05, says "yes, it's different, here is my Nature paper". That's just not how biology and science work.

For starters on the technical parts, see "Basic normalization, batch correction and visualization of RNA-seq data" and the many other tutorials here and online. Please go through them; those are the basics of analysis with high-throughput data. Don't skip the literature research: it is essential for separating what in your results is already known, what is novel, and what is likely just noise.