Is there any statistical way to analyze two groups without replicates?
2.3 years ago
star ▴ 270

I have a data frame with genes as rows and 3 columns holding the gene expression value for each individual. Unfortunately, I do not have any replicates for the individuals (I only have one value for each one).

I would like to know whether there is a statistical method for testing, for each gene (row), whether the expression value of the child differs from that of the parents.

I have tried the Wilcoxon test, but I am not sure it is appropriate here!

  ensembl_gene_id           mother       father        child
1 ENSG00000000003        1.8066034    0.4437994    0.4883299
2 ENSG00000000419       16.5474402   17.4191259    20.8935447
3 ENSG00000000457       27.8583486   17.9184003    35.2643968


Do you want to do this on a gene-by-gene basis, i.e. you'd like to know which genes are different in the child than in the parents? Or just whether the child is, in general, different from the parents, "averaging" across all genes? Also, do you want to do child vs. "parents", or child vs. mother and child vs. father?


Thanks @sudbery. I would like to do it gene by gene and find which genes are different in the child than in the parents. I would also like to do both kinds of comparison: child vs. parents, as well as child vs. mother and child vs. father.


What sort of data is this? Microarray? TPMs/FPKMs from an RNA-seq experiment?


It is CPM from an RNA-seq experiment.

2.3 years ago
Benn 8.1k

You can use edgeR for this, but you need raw read counts rather than CPMs. Also be aware that what edgeR offers for this situation (see the edgeR manual about n=1 experiments) is a workaround, not sound statistics. You need more replicates for sound statistics.

2.3 years ago

Without replicates, differential analysis is pretty much meaningless, because it is impossible to estimate the variability of each gene. A child-vs-parents comparison does give you some ability to measure variance, because you have both mother and father: conceptually, you can measure the mother-father variance and then see where the child lies in relation to it. Both edgeR and voom (from raw read counts) will allow you to analyze this sort of experiment. Limma-trend will, I think, allow you to input CPMs, but you are then another step removed from the ideal.
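To make the "mother-father variance, then see where the child lies" idea concrete, here is a minimal per-gene sketch in plain Python. It is not what edgeR or voom do internally (they share information across genes); it is an ordinary two-sample t-test with 2 parents and 1 child, which leaves a single degree of freedom. The function name `child_vs_parents` and the `log2(CPM + 1)` transform are my own choices for illustration.

```python
import math

def child_vs_parents(mother, father, child):
    """Two-sample t-test of one child value against two parent values.

    Inputs are assumed to be on a log scale already (e.g. log2(CPM + 1)).
    Returns (t, p), where p is the two-sided p-value on 1 degree of freedom.
    """
    parent_mean = (mother + father) / 2
    # Sample variance of the two parents (divisor n - 1 = 1).
    parent_var = (mother - parent_mean) ** 2 + (father - parent_mean) ** 2
    if parent_var == 0:
        return math.nan, math.nan  # identical parents: no variance estimate
    # Standard error of (child - parent_mean): variance * (1/1 + 1/2).
    se = math.sqrt(parent_var * 1.5)
    t = (child - parent_mean) / se
    # A t distribution with 1 df is the standard Cauchy distribution,
    # so the two-sided p-value has a closed form.
    p = 1 - 2 * math.atan(abs(t)) / math.pi
    return t, p

# CPM values copied from the question: (mother, father, child).
genes = {
    "ENSG00000000003": (1.8066034, 0.4437994, 0.4883299),
    "ENSG00000000419": (16.5474402, 17.4191259, 20.8935447),
    "ENSG00000000457": (27.8583486, 17.9184003, 35.2643968),
}

for gene, cpms in genes.items():
    m, f, c = (math.log2(x + 1) for x in cpms)
    t, p = child_vs_parents(m, f, c)
    print(f"{gene}  t = {t:6.2f}  p = {p:.3f}")
```

With only 1 degree of freedom, |t| has to exceed roughly 12.7 before the two-sided p-value drops below 0.05, which is why a purely per-gene test has almost no power here.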

Be aware however of the following:

• This assumes that expression variation is the same in the child as it is in the parents. This may not be true.
• Dispersion estimates made from only 2 samples are going to be poor. This will lead to both poor power and poor accuracy. In theory this should be accounted for in the FDR value, but in practice this is likely not the case.
• Low power means that a larger share of the genes you call significant will be false positives.
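The link between low power and false positives can be shown with a bit of arithmetic. The sketch below assumes a fixed per-gene significance threshold (no multiple-testing adjustment) and a made-up fraction of truly different genes. The function name `expected_fdr` and all the numbers are hypothetical and only there to show the trend.

```python
def expected_fdr(power, alpha=0.05, frac_true=0.1):
    """Expected share of significant calls that are false positives.

    power     -- probability of detecting a gene that truly differs
    alpha     -- per-gene significance threshold (assumed, unadjusted)
    frac_true -- fraction of genes that truly differ (a made-up value)
    """
    false_pos = alpha * (1 - frac_true)  # null genes that pass the threshold
    true_pos = power * frac_true         # real differences that are detected
    return false_pos / (false_pos + true_pos)

for power in (0.8, 0.5, 0.2, 0.05):
    print(f"power = {power:4.2f}  expected FDR = {expected_fdr(power):.2f}")
```

The number of false positives stays the same as power drops, but the number of true positives shrinks, so the false positives make up a growing share of the hit list.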

Thus this analysis may give you some signal amongst the noise, and may even be enough to give you enrichments for downstream analyses, but I would not trust the result for any particular gene without further evidence.

An alternative approach might be to calculate the variance across the three individuals for each gene, and then de-trend it against the mean (effectively calculating the dispersion). If you then assume that the dispersions should be normally distributed, you could calculate a Z score for each gene relative to the mean and standard deviation of the dispersions across all genes, thus identifying genes that are outliers.

This would have the advantage that you would find genes where any of the individuals could be the outlier (thus cases where child is more like mother or father as well as cases where the child is the outlier). It has the disadvantage that it would also identify cases where expression of a given gene is just naturally more variable than others.
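A rough sketch of this outlier approach, using only the Python standard library. Two things here are assumptions on my part: I use the squared coefficient of variation (variance / mean²) of the CPMs as a crude stand-in for the de-trended dispersion, and I take its log2 before computing Z scores so that the values are closer to symmetric. The function name `dispersion_z_scores` and the toy rows are made up.

```python
import math
import statistics

def dispersion_z_scores(cpm_rows):
    """Z score of each gene's log2 squared CV relative to all genes.

    cpm_rows -- list of (mother, father, child) CPM tuples, one per gene.
    Returns one Z score per gene, or None where the gene cannot be scored
    (zero mean or zero variance).
    """
    log_cv2 = []
    for row in cpm_rows:
        mean = statistics.mean(row)
        var = statistics.variance(row)  # sample variance across individuals
        if mean > 0 and var > 0:
            # Dividing by mean^2 removes the dependence of variance on scale.
            log_cv2.append(math.log2(var / mean ** 2))
        else:
            log_cv2.append(None)
    usable = [v for v in log_cv2 if v is not None]
    centre = statistics.mean(usable)
    spread = statistics.stdev(usable)  # needs at least 2 usable genes
    return [None if v is None else (v - centre) / spread for v in log_cv2]

# Toy CPM rows (made up): four genes with the same relative spread and one
# gene where a single individual is far from the other two.
toy_rows = [(1, 2, 3), (2, 4, 6), (10, 20, 30), (100, 200, 300), (1, 1, 10)]
for row, z in zip(toy_rows, dispersion_z_scores(toy_rows)):
    print(row, None if z is None else round(z, 2))
```

A high Z score only says that the three values for that gene are unusually spread out. You would still have to look at the row to see which individual is responsible, and at low expression levels the squared CV is inflated by counting noise, which this sketch does not correct for.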