I grouped approximately 1000 individuals into p53 wildtype and p53 mutated samples. I have gene expression data (logFC) for every individual in both the mutated and non-mutated groups. My aim is to identify the genes that are strongly upregulated when p53 is mutated, i.e. to analyze the link between the mutation and the expression of each gene:
What are the appropriate statistical tests for analyzing such a relationship?
Should I perform a grouped analysis (all p53 wildtype vs all p53 mutated), or a pairwise analysis (a single mutated case vs a single non-mutated case) and then average the significance values across pairs to find the associated genes?
Please note that my mutation data is in binary format (-1: mutation, 0: wildtype) and my gene expression data is logFC. Rows represent genes and columns represent samples.
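To make the grouped option and the data layout concrete, here is a minimal sketch of what a per-gene grouped comparison could look like. It is only an illustration under assumptions: the Welch t-test and the Benjamini-Hochberg correction are one possible choice rather than an established answer, the function names `grouped_test` and `bh_adjust` are hypothetical, and the toy matrix is randomly generated stand-in data, not my real data.

```python
import numpy as np
from scipy import stats


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (false discovery rate control)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)
    return adjusted


def grouped_test(expr, status):
    """Compare mutated vs wildtype samples for every gene.

    expr:   genes x samples array of expression values (logFC)
    status: per-sample array, -1 = p53 mutated, 0 = wildtype
    """
    status = np.asarray(status)
    mut = expr[:, status == -1]
    wt = expr[:, status == 0]
    # Welch t-test per row (gene); an assumed choice, a rank-based test
    # such as Mann-Whitney U would be an alternative.
    t, p = stats.ttest_ind(mut, wt, axis=1, equal_var=False)
    diff = mut.mean(axis=1) - wt.mean(axis=1)  # > 0 means higher in mutated
    return diff, t, p, bh_adjust(p)


# Toy stand-in data: 50 genes x 40 samples, first 20 samples mutated.
rng = np.random.default_rng(0)
n_genes, n_samples = 50, 40
status = np.where(np.arange(n_samples) < 20, -1, 0)
expr = rng.normal(size=(n_genes, n_samples))
expr[0, status == -1] += 3.0  # artificially upregulate gene 0 in mutated samples

diff, t, p, q = grouped_test(expr, status)
# Row indices of genes called upregulated under mutation at FDR 5%.
up = np.flatnonzero((q < 0.05) & (diff > 0))
print(up)
```

In this sketch all mutated samples are compared against all wildtype samples at once, so each gene gets a single test statistic and a single adjusted p-value instead of an average over many sample pairs.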
Any advice or pointers would be greatly appreciated.
Thanks in advance.