I have a question regarding my experimental design and a possible confounding factor. I am working on an RNA-seq project (honeybee brain mRNA) in which I will have 4 conditions with 10 biological replicates each. Based on other analyses of the same samples, I have just realized that at the time of sampling the animals were transitioning between two behavioural states that are known to be associated with substantial changes in brain gene expression (ca. 40% of genes are known to be differentially expressed between the two).
So about 60% of my samples are in one state and the remaining 40% are in the other. Both behavioural states are represented in each treatment, but not in a completely balanced way, as I only discovered this additional factor later. I am worried that a factor with such a large effect on gene expression may hamper the recovery of DEGs for the treatments I am interested in (which will likely have milder effects than behavioural state). I am also wondering whether it would be better to have the sampling fully balanced: 20 samples in one state and 20 in the other, with 5 samples per treatment in each state. Or could I account for behaviour as a covariate in my differential expression analyses equally well even without perfect balancing of this factor?
Since I have not yet prepared the libraries, would you suggest that I go back and swap some of the samples so that the two behavioural states are split 50/50 and equally represented across treatments?
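To make the balance question concrete, here is a minimal sketch that compares how precisely treatment effects can be estimated under the balanced layout and under an unbalanced one. It assumes an additive model with behavioural state as a covariate (roughly what a design formula like `~ state + treatment` expresses) and equal residual variance in both layouts. The unbalanced per-treatment counts are purely hypothetical (only the overall 60/40 split is known from the post), and the function names are made up for illustration. For each treatment-vs-reference coefficient it reports the corresponding diagonal entry of (X'X)^-1, i.e. the factor that multiplies the residual variance.

```python
from fractions import Fraction

def design_matrix(cell_counts):
    """Build rows for an additive model: intercept + treatment dummies + state.

    cell_counts[t] = (n_in_state_0, n_in_state_1) for treatment t;
    treatment 0 is the reference level.
    """
    n_trt = len(cell_counts)
    rows = []
    for t, counts in enumerate(cell_counts):
        for state, n in enumerate(counts):
            row = [1] + [1 if t == k else 0 for k in range(1, n_trt)] + [state]
            rows.extend([row] * n)
    return rows

def invert(matrix):
    """Invert a full-rank square matrix by Gauss-Jordan elimination (exact fractions)."""
    n = len(matrix)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def treatment_variance_factors(cell_counts):
    """Diagonal entries of (X'X)^-1 for each treatment-vs-reference coefficient."""
    X = design_matrix(cell_counts)
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    inv = invert(xtx)
    return [inv[k][k] for k in range(1, len(cell_counts))]

# Fully balanced layout from the post: 5 samples per treatment in each state.
balanced = treatment_variance_factors([(5, 5)] * 4)

# HYPOTHETICAL unbalanced layout: 24 vs 16 overall (60/40), 10 per treatment.
unbalanced = treatment_variance_factors([(8, 2), (6, 4), (5, 5), (5, 5)])

for name, factors in (("balanced", balanced), ("unbalanced", unbalanced)):
    print(name, [round(float(v), 4) for v in factors])
```

The ratio of the unbalanced to the balanced factor is the variance inflation caused by the imbalance; plugging in the real per-treatment counts would show how much precision is actually at stake. This only addresses the precision of the treatment estimates under the stated assumptions, not other aspects of the analysis.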