For context, I have a set of TPM values for multiple genes across different samples, which I converted to log2(TPM+1), and I need to calculate differential expression from these RNA-seq values. I've been using this workshop document as a guide: https://raw.githubusercontent.com/ucdavis-bioinformatics-training/2019_March_UCSF_mRNAseq_Workshop/master/differential_expression/DE_Analysis.Rmd. The overall aim is to use PANDA, which maps transcription factor (TF) motifs and combines them with gene expression data to create networks representing interactions between transcription factors and genes, and GSEA, to analyze genes ranked by fold change or differential expression p-value.
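To make the transformation concrete, here is a minimal sketch of the log2(TPM+1) conversion described above. The gene names, sample names, and TPM values are hypothetical placeholders, not my actual data, and the helper `log2_tpm_plus_one` is just an illustrative name.

```python
import math

# Hypothetical TPM table: gene -> {sample: TPM}. Values are made up for illustration.
tpm = {
    "GENE_A": {"S1": 0.0, "S2": 3.0},
    "GENE_B": {"S1": 7.0, "S2": 15.0},
}

def log2_tpm_plus_one(table):
    """Apply log2(TPM + 1) to every value; the +1 pseudocount keeps zeros defined."""
    return {
        gene: {sample: math.log2(value + 1.0) for sample, value in per_sample.items()}
        for gene, per_sample in table.items()
    }

log_tpm = log2_tpm_plus_one(tpm)
```

The pseudocount means a TPM of 0 maps to 0 rather than negative infinity, so genes with no detected expression in a sample stay in the table.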
What I need to do is match each male to one female of the same age group. I have already done this: the males outnumbered the females, so I filtered the males down to an equal number with the same age distribution. But it seems that I need to define two factors and create a new variable "group" that combines factor1 and factor2. In this case, would gender be the factor I use for grouping?
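As a rough sketch of what I understand by "combining factor1 and factor2", assuming the two factors are sex and age group: each sample gets one label built from both levels, similar in spirit to what R's `interaction()` produces. The sample IDs, labels, and the helpers `make_group` and `is_balanced` are hypothetical, and whether this is the right design for matched samples is exactly what I am unsure about.

```python
from collections import Counter

# Hypothetical sample sheet: IDs, sex labels, and age bins are made up for illustration.
samples = [
    {"id": "S1", "sex": "M", "age_group": "20-29"},
    {"id": "S2", "sex": "F", "age_group": "20-29"},
    {"id": "S3", "sex": "M", "age_group": "30-39"},
    {"id": "S4", "sex": "F", "age_group": "30-39"},
]

def make_group(sample, sep="."):
    """Join the two factor levels into one combined group label."""
    return f"{sample['sex']}{sep}{sample['age_group']}"

def is_balanced(sample_sheet):
    """Check that every age group has as many males as females."""
    counts = Counter((s["age_group"], s["sex"]) for s in sample_sheet)
    age_groups = {s["age_group"] for s in sample_sheet}
    return all(counts[(a, "M")] == counts[(a, "F")] for a in age_groups)

groups = [make_group(s) for s in samples]
```

Here `groups` holds one combined label per sample (for example `M.20-29`), and `is_balanced` is a quick check that the filtering left equal numbers of males and females in each age group.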