I've worked with the human genome before, but this is the first time I'm working with Mus musculus (mm10) data sequenced on a NovaSeq.
- FastQC's overrepresented-sequences module reports that 0.1% of my reads are a single sequence that looks like a GSAT_MM (satellite) repeat. Is this a known feature of mm10, or is there anything (wet lab or bioinformatics) that could explain this number? I also see many poly-G reads in one sample.
- After mapping with bwa and removing duplicates with sambamba, I get a mean depth of ~20x, but the median depth is only ~10x. I suspect this is because the GSAT_MM regions are attracting many reads (?)
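On the poly-G point: the NovaSeq uses two-colour chemistry in which G is called when there is no signal, so reads whose signal drops out tend to end in runs of G. A quick way to compare samples is to count reads ending in a long G run. The sketch below is a minimal illustration, not a FastQC feature; the 10-base threshold (`min_run`) is an arbitrary assumption.

```python
import gzip
import sys
from typing import Iterable, Iterator


def read_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line (2nd of every 4) of each FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.rstrip("\n")


def trailing_g_run(seq: str) -> int:
    """Length of the run of 'G' at the 3' end of the read."""
    return len(seq) - len(seq.rstrip("G"))


def poly_g_fraction(seqs: Iterable[str], min_run: int = 10) -> float:
    """Fraction of reads ending in at least `min_run` consecutive Gs."""
    total = 0
    hits = 0
    for seq in seqs:
        total += 1
        if trailing_g_run(seq) >= min_run:
            hits += 1
    return hits / total if total else 0.0


if __name__ == "__main__" and len(sys.argv) > 1:
    path = sys.argv[1]  # plain or gzipped FASTQ
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        frac = poly_g_fraction(read_sequences(fh))
    print(f"{frac:.4%} of reads end in a poly-G run")
```

Running it on each sample's FASTQ gives a per-sample number that makes the outlier sample easy to spot.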
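To test whether a few very deep regions are pulling the mean above the median, one can compare the mean with and without high-depth positions. This is a minimal sketch that assumes per-position depths in the three-column `samtools depth -a` format (chrom, pos, depth); the cap value is a hypothetical choice, and loading every mm10 position into a list is memory-heavy, so it is better run on one chromosome or a subsample.

```python
import statistics
from typing import Iterable, List


def parse_depths(lines: Iterable[str]) -> List[int]:
    """Depth column (3rd field) from samtools-depth-style lines."""
    depths = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:
            depths.append(int(fields[2]))
    return depths


def depth_summary(depths: List[int], cap: int) -> dict:
    """Mean and median depth, plus the mean after dropping positions above `cap`."""
    kept = [d for d in depths if d <= cap]
    return {
        "mean": statistics.mean(depths),
        "median": statistics.median(depths),
        "mean_without_high": statistics.mean(kept) if kept else 0.0,
        "fraction_above_cap": 1 - len(kept) / len(depths),
    }


# Hypothetical toy data: 98 positions at 10x and 2 positions at 500x.
toy = [10] * 98 + [500, 500]
print(depth_summary(toy, cap=100))
```

In the toy data, 2% of positions are enough to make the mean about twice the median. Keep in mind that mean depth is total aligned bases divided by genome length, so reads piling up in one region raise the mean only in proportion to their share of all aligned bases.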
So, again: is there a known issue like this when doing high-throughput sequencing (HTS) of Mus musculus?