Background Information:
I have 53 genes of interest. I have two conditions, Tumour and Normal, sampled at 24, 48 and 72 hours from three mice (i.e. 3x Tumour and 3x Normal for each time point). The Tumour and Normal skin are sampled repeatedly (i.e. each mouse has Tumour and Normal samples taken at each time point). I have measured the expression of all 53 genes of interest in each sample using qRT-PCR.
If these data were RNA-seq or microarray, I would turn to something like DESeq2 to perform a differential expression analysis. If I had only up to 10 genes of interest, I would have no concern about using a repeated measures ANOVA or linear models to investigate differential expression between my two conditions.
However, I feel like I am somewhere in between the two approaches, as correcting for multiple testing across 53 ANOVAs is eroding the significance of my p-values.
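To make the multiple-testing concern concrete, here is a minimal sketch in plain Python comparing a Bonferroni correction with the Benjamini-Hochberg (FDR) adjustment. The five p-values are hypothetical and the function names are made up for illustration. It is not taken from any of the packages mentioned in this thread.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five genes (not real data).
example_p = [0.001, 0.008, 0.02, 0.04, 0.30]
bonf = bonferroni(example_p)
bh = benjamini_hochberg(example_p)

for raw, b, f in zip(example_p, bonf, bh):
    print(f"raw={raw:.3f}  bonferroni={b:.3f}  BH={f:.3f}")
```

The BH-adjusted values are never larger than the Bonferroni ones, which is why FDR control is the usual choice when tens of genes are tested at once.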
I am aware that N=3 is a very small sample, but this is the nature of biological sciences (especially during a PhD). With this in mind, please humour me in your answers to my questions.
Questions:
At what number of measurements does one decide that enough tests are being performed to warrant a software package such as DESeq2 (i.e. a microarray/RNA-seq approach) to investigate differential expression, and how many tests are too few?
Are there tests or packages that better deal with low to moderate numbers of statistical tests?
In the case that I am not raising a valid concern (quite possible), what would be the most powerful approach to discerning any difference in gene expression that exists between tumour and normal tissue, at any point in time, in my study?
Thanks in advance
-Lucas
As a remark: the assumption in tools such as DESeq2 and edgeR is that most genes in the dataset are not differentially expressed. I don't think that is the case for your selection? Did you quantify housekeeping genes using qRT-PCR?
I did quantify housekeeping genes. They are not shown in my selection, but I use them to create these graphs. I have plotted the ΔCt of my genes of interest, calculated by subtracting the average Ct of three housekeeping genes at each time point for each condition, converting to fold change and then normalising to the 0H point. Regarding your remark about many of my genes of interest changing, this difference may not be as prominent as it seems at first glance. I have allowed a free y scale to bring out the patterns in the individual genes (patterns I would like to see). However, these patterns diminish when the y scale is kept constant. I will try to test for differential expression with DESeq2. As it will consider all the genes at once, perhaps the changes that seem apparent here will not violate the assumption that the majority of genes are not differentially expressed. I have attached the same genes graphed with a consistent y scale.
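The normalisation described above can be sketched as follows. This is a minimal illustration in plain Python with hypothetical Ct values and made-up function names. It assumes roughly 100% amplification efficiency (a doubling per cycle), which is why the fold change uses base 2; the thread does not state which efficiency was assumed.

```python
from statistics import mean


def delta_ct(target_ct, housekeeping_cts):
    """Ct of the gene of interest minus the mean Ct of the housekeeping genes."""
    return target_ct - mean(housekeeping_cts)


def relative_expression(d_ct):
    """Convert a delta Ct to relative expression, assuming a doubling per cycle."""
    return 2.0 ** (-d_ct)


def normalise_to_baseline(series, baseline_key="0H"):
    """Divide every time point by the baseline time point."""
    base = series[baseline_key]
    return {time: value / base for time, value in series.items()}


# Hypothetical Ct values for one gene in one condition:
# time point -> (Ct of gene of interest, Cts of three housekeeping genes)
ct = {
    "0H": (25.0, [18.0, 19.0, 20.0]),
    "24H": (24.0, [18.0, 19.0, 20.0]),
    "48H": (23.0, [18.0, 19.0, 20.0]),
}

expression = {
    time: relative_expression(delta_ct(gene_ct, hk_cts))
    for time, (gene_ct, hk_cts) in ct.items()
}
fold_change = normalise_to_baseline(expression)
print(fold_change)
```

For statistical testing it is usually safer to work with the ΔCt values themselves (a log2 scale) rather than the fold changes, since fold changes are skewed.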
Did you try something like Bioconductor's HTqPCR? I have never used it myself, but if I had data like yours I would probably try to find something in Bioconductor first.
No I haven't yet. Now it's installed and I'll give it a go. Thanks for the suggestion.
So I tried HTqPCR, but found that I was unsure of what was happening behind the scenes, so I resorted to limma, which I understand is used by HTqPCR anyway.
I have come to a point of confusion. There is an example of applying limma to distinguish the expression of two groups of 3 replicates each (dummy data), given by Gordon Smyth in the limma package vignette (bottom of page). In this example there is a comment, "Would like to consider original two estimates plus difference between first 3 and last 3 arrays", that I do not understand. This comment and the method of constructing a contrast matrix that follows it seem at odds with the approach of another Biostars user for investigating DE between groups, which does not "consider original two estimates", and with examples I have followed in the limma user guide (section 9.2). I'm sure this comes down to my lack of understanding of contrast matrices. However, I have been unable to resolve this myself.
Any help with understanding the difference between the two methods would be greatly appreciated.
Smyth method
vs.
Biostar user method
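As a rough illustration of what the two setups compute, here is a sketch in plain Python, not limma itself. It assumes that the first contrast matrix has three columns (the two group estimates plus "last 3 minus first 3", as the quoted comment describes) and that the second has only the difference column. The expression values and function names are hypothetical.

```python
def group_means(values, design):
    """Least-squares coefficients for a one-way 0/1 design: one mean per group."""
    n_groups = len(design[0])
    means = []
    for g in range(n_groups):
        members = [v for v, row in zip(values, design) if row[g] == 1]
        means.append(sum(members) / len(members))
    return means


def apply_contrasts(coefs, contrast_matrix):
    """Combine coefficients column by column, i.e. coefs multiplied by the matrix."""
    n_contrasts = len(contrast_matrix[0])
    return [
        sum(coefs[i] * contrast_matrix[i][j] for i in range(len(coefs)))
        for j in range(n_contrasts)
    ]


# Design: first three arrays in group 1, last three in group 2.
design = [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]
# Hypothetical log-expression values for a single gene.
expression = [2.0, 2.5, 1.5, 0.0, 0.5, -0.5]
coefs = group_means(expression, design)

# Assumed three-column setup: First3, Last3, Last3 - First3.
three_columns = [[1, 0, -1],
                 [0, 1, 1]]
# Assumed difference-only setup: Last3 - First3.
difference_only = [[-1],
                   [1]]

full = apply_contrasts(coefs, three_columns)
diff = apply_contrasts(coefs, difference_only)
print("three-column result:", full)
print("difference-only result:", diff)
```

Under these assumptions the estimated difference between groups is identical in both setups; the three-column version simply reports the two group means alongside it.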
I don't exactly understand your question, but I think you're better off asking Gordon Smyth himself for help. He's active on https://support.bioconductor.org/ and usually answers questions about limma within a day. Be sure to add "limma" as a tag.