Hello everyone, I have a huge dataset of human samples to analyse. Of course, I've run into trouble: the samples come from different donors, and when I run PCA on them the result looks a bit dodgy. They do cluster according to condition, but I'm not sure how I'm supposed to deal with the batch effect. A while ago I used the SVA package, but I wasn't happy with the results.
A related problem is probably that my samples are not trimmed consistently. I have a lot of problems with the facility that generated these fastq files: sometimes they provide trimmed samples, sometimes they don't (this dataset was sequenced in different batches over several years). Hence my questions:
- To generate comparable data, shouldn't all of my samples be processed in exactly the same way (e.g. the same SLIDINGWINDOW, LEADING, TRAILING and MINLEN settings)? I am quite confused about this.
- What happens if I accidentally trim an already-trimmed file?
- When I trim the samples myself, I can't get rid of the adapter contamination: according to my beloved MultiQC report there is heavy Nextera transposase sequence contamination that Trimmomatic doesn't remove, even when I select the Nextera adapters explicitly...
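For context, this is the kind of thing I mean by "processed in the same identical way" — one fixed set of Trimmomatic options applied to every sample. The settings, sample names and jar/adapter paths below are just placeholders, not a recommendation:

```shell
# Sketch only: illustrative settings, adjust to your own data.
# One fixed option string so every batch goes through the identical pipeline.
TRIM_OPTS="ILLUMINACLIP:NexteraPE-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:20 MINLEN:36"

# donorA/donorB and the jar path are placeholders; 'echo' makes this a dry
# run that only prints each command -- remove it to actually execute them.
for sample in donorA donorB; do
  echo java -jar trimmomatic-0.39.jar PE \
    "${sample}_R1.fastq.gz" "${sample}_R2.fastq.gz" \
    "${sample}_R1.paired.fastq.gz" "${sample}_R1.unpaired.fastq.gz" \
    "${sample}_R2.paired.fastq.gz" "${sample}_R2.unpaired.fastq.gz" \
    $TRIM_OPTS
done
```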
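For what it's worth, here is the quick check I've been using to see whether the Nextera transposase mosaic-end sequence (CTGTCTCTTATACACATCT, the one FastQC/MultiQC flags) is actually present in the reads before and after trimming. The tiny fastq below is made up just so the snippet runs on its own; point the last line at your real files instead:

```shell
# Nextera transposase mosaic-end sequence (as flagged by FastQC/MultiQC)
ADAPTER='CTGTCTCTTATACACATCT'

# Synthetic two-read fastq: read1 carries adapter read-through, read2 is clean
cat > example.fastq <<'EOF'
@read1
ACGTACGTACGTCTGTCTCTTATACACATCTGACGT
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
@read2
ACGTACGTACGTACGTACGTACGTACGTACGTACGT
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
EOF

# Keep only the sequence lines (every 2nd of each 4-line record),
# then count how many contain the adapter
awk 'NR % 4 == 2' example.fastq | grep -c "$ADAPTER"   # → 1
```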