Hi, I am looking to align some human short-read WGS data with BWA-MEM prior to variant calling. I have 62 samples, each with 8 pairs of FASTQs (992 FASTQs total). I ran FastQC/MultiQC on all of them and got this result from MultiQC:
For comparison, here is an individual FastQC report:
Spot-checking a handful of individual FastQC reports, it seems that the flat Universal Adapter line hovering around 2% and the poly-A line creeping up along the length of the read are pretty typical. Does this require trimming prior to alignment?
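To make concrete what "trimming" removes here, below is a minimal sketch of 3' adapter and poly-A trimming on a single read sequence. It assumes the 13-bp sequence `AGATCGGAAGAGC` that FastQC reports as the Illumina Universal Adapter. The function names and the `min_overlap` / `min_run` cutoffs are hypothetical, chosen for illustration. It allows exact matches only, so it is not a substitute for a real trimmer, which also tolerates sequencing errors and handles quality scores and paired reads.

```python
# Hypothetical illustration: exact-match 3' adapter and poly-A trimming.
# Real trimmers also allow mismatches and use base qualities.

ILLUMINA_UNIVERSAL = "AGATCGGAAGAGC"  # sequence FastQC labels "Illumina Universal Adapter"


def trim_adapter(seq: str, adapter: str = ILLUMINA_UNIVERSAL, min_overlap: int = 3) -> str:
    """Cut the read where the adapter starts (exact match only)."""
    idx = seq.find(adapter)
    if idx != -1:
        # Full adapter inside the read: keep only the insert before it.
        return seq[:idx]
    # Partial adapter hanging off the 3' end: a read suffix equal to an
    # adapter prefix. Try the longest overlap first.
    for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:-k]
    return seq


def trim_poly_a(seq: str, min_run: int = 10) -> str:
    """Strip a trailing run of A's if it is at least min_run bases long."""
    stripped = seq.rstrip("A")
    if len(seq) - len(stripped) >= min_run:
        return stripped
    return seq


def trim_read(seq: str) -> str:
    # Remove adapter first, then any poly-A tail exposed at the new 3' end.
    return trim_poly_a(trim_adapter(seq))


if __name__ == "__main__":
    print(trim_read("ACGTACGTAGATCGGAAGAGCACAC"))  # full adapter
    print(trim_read("ACGTACGTAGATC"))              # partial adapter at 3' end
    print(trim_read("ACGT" + "A" * 12))            # poly-A tail
```

The `min_overlap` cutoff is a trade-off. Very short partial matches (for example, a 3-bp `AGA` at the read end) will occur by chance, so trimming on them costs a few genuine bases in exchange for catching adapter read-through in short inserts.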
According to this, the threshold for a warning is 5% and for a failure is 10%, but I'm wondering what the general practice is.
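As a quick check of where the observed ~2% falls, here is a small sketch of the pass/warn/fail rule described above. It assumes the thresholds are applied to the highest adapter percentage seen at any read position, with a warning above 5% and a failure above 10%. The function name is hypothetical.

```python
def fastqc_adapter_status(max_adapter_percent: float,
                          warn: float = 5.0,
                          fail: float = 10.0) -> str:
    """Classify a peak adapter-content percentage (hypothetical helper).

    Assumes 'warn' above 5% and 'fail' above 10%, as in the thresholds
    quoted in the question.
    """
    if max_adapter_percent > fail:
        return "fail"
    if max_adapter_percent > warn:
        return "warn"
    return "pass"


if __name__ == "__main__":
    for pct in (2.0, 7.0, 12.0):
        print(pct, fastqc_adapter_status(pct))
```

By this rule, a flat ~2% line passes. A "pass" only means the contamination is below FastQC's alarm level. It does not mean zero reads carry adapter.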
Thanks!
I would trim it. Two percent is not much, but with WGS, say you have 400 million reads per sample or so, that is still millions of reads that could theoretically contaminate variant calls. Do it once, properly, and never worry about it again. That said, we are lucky to have a free-to-use, powerful HPC at our university, so all we do is wait for the job to complete. If you're on a budget, say you need to pay for HPC or a cloud service, you might want to skip it, but I would still feel safer trimming.
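The "still millions of reads" point is simple arithmetic. The sketch below uses the answer's illustrative figure of 400 million reads per sample and the ~2% adapter content from the question. It treats that 2% as the fraction of reads carrying adapter, which is an approximation. The 62-sample total assumes every sample looks alike.

```python
def reads_with_adapter(total_reads: int, adapter_fraction: float) -> int:
    """Approximate number of reads carrying adapter sequence."""
    return round(total_reads * adapter_fraction)


per_sample = reads_with_adapter(400_000_000, 0.02)  # assumed 400M reads, 2% adapter
cohort = 62 * per_sample                            # assumes all 62 samples are similar

print(f"per sample: {per_sample:,}")
print(f"all 62 samples: {cohort:,}")
```

That works out to roughly 8 million adapter-containing reads per sample, or about half a billion across the cohort.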