I'm looking for methodological advice on a small project. I've identified structural variations across an entire genome and found a few regions that are especially rich in variations. I then checked the read coverage over these regions and found that a few of them have abnormally high coverage (~3X the average). I'm not sure how to interpret this correlation. Does it mean the high number of variations in these regions is an artifact? How can I test this further? Would it be reasonable to exclude variations in these high-coverage regions from my later analyses?
[Edit] Additional Information:
- Working with S. cerevisiae
- Data is WGS, paired-end
- Used PEM + SP to detect variations
- Verified variations with de novo assembly
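
To make the coverage check concrete, here is a rough sketch of the kind of test I have in mind. It assumes the genome is already split into fixed windows with a mean depth per window, and that each window is labeled as variation-rich or not. The function names, the median baseline, and the 3-fold cutoff are placeholders of mine, not output from any particular tool. The first function flags windows whose depth is at least `fold` times the genome-wide median; the second gives a one-sided hypergeometric (Fisher-style) p-value for how surprising the overlap between variation-rich and high-coverage windows is.

```python
from math import comb
from statistics import median


def flag_high_coverage(depths, fold=3.0):
    """Return indices of windows whose depth is >= fold * genome-wide median.

    The median is used as the baseline (an assumption) because a few very
    deep windows would inflate the mean and hide themselves.
    """
    baseline = median(depths)
    return [i for i, d in enumerate(depths) if d >= fold * baseline]


def overlap_pvalue(overlap, n_rich, n_high, n_total):
    """One-sided hypergeometric tail: P(X >= overlap).

    X is the number of variation-rich windows that are also high-coverage
    if the n_rich rich windows were drawn at random from n_total windows,
    n_high of which are high-coverage.
    """
    upper = min(n_rich, n_high)
    favorable = sum(
        comb(n_high, k) * comb(n_total - n_high, n_rich - k)
        for k in range(overlap, upper + 1)
    )
    return favorable / comb(n_total, n_rich)


if __name__ == "__main__":
    # Toy depths for illustration only, not real data.
    toy_depths = [30, 28, 95, 31, 29, 102, 30, 27, 33, 30]
    high = set(flag_high_coverage(toy_depths))
    rich = {2, 5}  # hypothetical variation-rich windows
    p = overlap_pvalue(len(high & rich), len(rich), len(high), len(toy_depths))
    print(sorted(high), p)
```

A small p-value here would only say the two sets of windows coincide more than expected by chance; it would not by itself tell me whether the variations are real (e.g. a true duplication) or mapping artifacts.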