I plan to run differential expression analyses (using rDiff and DESeq2) on a data set containing three treatments, each with three replicates.
Two of the treatments aligned very well (>87% of reads aligned unambiguously in STAR). One treatment, however, has a contaminant (whose identity I know) that makes up a majority of the reads for that treatment. Only between 11% and 58% of the reads from the contaminated treatment align to my species of interest.
My gut says to just drop the contaminated treatment (which I can do and still have something to work with), because I have fewer reads from that treatment going in. Reading about how programs like DESeq2 work, however, makes me think that I might still be able to include the contaminated treatment in the differential expression analysis.
Can any of you kind people provide any advice on best practices/what is "okay" here?