Question: Dealing with abnormally high coverage regions
2.5 years ago by
novice910
United States
novice910 wrote:

Hello,

I'm seeking methodological advice on a small project I'm working on. I've identified structural variations across an entire genome and found a few spots that are especially rich in variations. I then checked the read coverage over these spots and found that a few of them have abnormally high coverage (~3X the average). I'm not sure how to interpret this correlation. Does this mean the high number of variations in these regions is an artifact? How can I test this further? Could I disregard variations in these high-coverage regions in my later analyses?

[Edit] Additional Information:

  • Working with S. cerevisiae
  • Data is WGS, paired-end
  • Used paired-end mapping (PEM) and split-read mapping to detect variations
  • Verified variations with de-novo assembly
coverage alignment • 844 views
modified 2.5 years ago • written 2.5 years ago by novice910
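
One way to make the "~3X average" observation systematic is to compare each window's depth against a genome-wide baseline and flag the outliers. The sketch below is only an illustration under stated assumptions: the input format (a list of `(chrom, start, end, mean_depth)` tuples), the function name `flag_high_coverage`, the 3-fold cutoff, and the toy numbers are all made up for the example and are not taken from the original analysis. It uses the median rather than the mean as the baseline so that a handful of extreme windows does not inflate it.

```python
from statistics import median


def flag_high_coverage(window_depths, fold=3.0):
    """Return windows whose mean depth is at least `fold` times the median depth.

    window_depths: list of (chrom, start, end, mean_depth) tuples
    (hypothetical input format, e.g. parsed from a per-window depth table).
    Each returned tuple is (chrom, start, end, depth / median_depth).
    """
    depths = [depth for _, _, _, depth in window_depths]
    baseline = median(depths)  # robust to a few extreme windows
    if baseline == 0:
        return []
    return [
        (chrom, start, end, depth / baseline)
        for chrom, start, end, depth in window_depths
        if depth >= fold * baseline
    ]


# Toy data, invented for illustration only.
windows = [
    ("chrI", 0, 1000, 48.0),
    ("chrI", 1000, 2000, 52.0),
    ("chrI", 2000, 3000, 50.0),
    ("chrI", 3000, 4000, 155.0),
    ("chrI", 4000, 5000, 49.0),
]
flagged = flag_high_coverage(windows)
for chrom, start, end, ratio in flagged:
    print(f"{chrom}:{start}-{end} is at {ratio:.1f}x the median depth")
```

The flagged windows can then be compared with the variation-rich spots to see how much of the signal sits in regions with elevated depth.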

What kind of data do you have? WGS? exome? RNAseq?

written 2.5 years ago by Floris Brenk890

Whole Genome Sequencing

written 2.5 years ago by novice910

Try to be complete in your initial posts since something like this is a very important piece of information.

written 2.5 years ago by WouterDeCoster38k

How did you identify those spots?

written 2.5 years ago by WouterDeCoster38k

Paired-end mapping, split-read mapping, and refinement with de-novo assembly

written 2.5 years ago by novice910

Did you check for low mappability or presence of segmental duplications? Was mapping quality taken into account?

modified 2.5 years ago • written 2.5 years ago by WouterDeCoster38k

Did you check for low mappability or presence of segmental duplications?

No.

Was mapping quality taken into account?

Yes, I used a minimum mapping quality of 20.

modified 2.5 years ago • written 2.5 years ago by novice910
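
A rough way to follow up on the mappability question is to look at how many reads in a suspicious region fall below the mapping-quality cutoff. With aligners such as BWA, reads that map equally well to several places receive a MAPQ of 0, so a high share of low-MAPQ reads hints at repetitive or duplicated sequence. The sketch below is a minimal illustration, assuming uncompressed SAM text as input; the function name `mapq_summary`, the region coordinates, and the toy records are hypothetical. It counts reads by their leftmost mapped position only and ignores read length, which is a simplification.

```python
def mapq_summary(sam_lines, chrom, start, end, min_mapq=20):
    """Summarise mapping quality for reads starting in chrom:start-end.

    Coordinates are 1-based and inclusive, as in the SAM POS field.
    Returns (total_reads, reads_below_min_mapq, fraction_below).
    """
    total = low = 0
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        rname = fields[2]
        pos = int(fields[3])
        mapq = int(fields[4])
        if flag & 0x4:  # skip unmapped reads
            continue
        if rname != chrom or not (start <= pos <= end):
            continue
        total += 1
        if mapq < min_mapq:
            low += 1
    fraction = low / total if total else 0.0
    return total, low, fraction


# Toy SAM records, invented for illustration only.
toy_sam = [
    "@HD\tVN:1.6",
    "r1\t99\tchrI\t3100\t60\t50M\t=\t3300\t250\t*\t*",
    "r2\t99\tchrI\t3150\t0\t50M\t=\t3350\t250\t*\t*",
    "r3\t99\tchrI\t3200\t0\t50M\t=\t3400\t250\t*\t*",
    "r4\t99\tchrI\t3900\t12\t50M\t=\t4100\t250\t*\t*",
    "r5\t99\tchrI\t9000\t60\t50M\t=\t9200\t250\t*\t*",
    "r6\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
total, low, fraction = mapq_summary(toy_sam, "chrI", 3001, 4000)
print(f"{low} of {total} reads are below MAPQ 20 ({fraction:.0%})")
```

If the high-coverage spots show a much larger low-MAPQ share than typical regions, that would be consistent with a mapping artifact rather than a true excess of variation, though it does not prove it.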

If you happen to be working with human/mouse: do these sites happen to overlap blacklisted regions?

written 2.5 years ago by Devon Ryan89k

I'm working with yeast (S. cerevisiae). I don't think there's a blacklist available for this species. Sorry for the lack of clarity in the original post!

written 2.5 years ago by novice910

Blacklisted regions?

written 2.5 years ago by pld4.8k

I had never heard of this concept before, but searching for it I found a definition:

https://sites.google.com/site/anshulkundaje/projects/blacklists

It says:

artifact regions that tend to show artificially high signal

What seems to be lacking is an explanation of why that would be so. I find it a bit excessive to flat out remove whole regions based on "blacklists". Do people actually do this? I'm surprised, that's all.

written 2.5 years ago by Istvan Albert ♦♦ 80k

It's standard in ChIP-seq, but pretty much nowhere else (it's normally not useful elsewhere). Some of these seem to be rRNAs or other similar "improperly assembled" regions, which I imagine could cause issues in OP's use case.

written 2.5 years ago by Devon Ryan89k
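
Since no published blacklist appears to exist for yeast, one cautious option is to build a project-specific exclusion list (for example, from the high-coverage spots) and check which calls overlap it. The sketch below is a minimal illustration of that overlap test, assuming BED-style half-open intervals given as `(chrom, start, end)` tuples; the names `overlaps_any` and `split_calls` and all coordinates are hypothetical. It separates the calls instead of deleting them, so later analyses can be run both with and without the suspect regions.

```python
def overlaps_any(call, regions):
    """True if `call` overlaps at least one region on the same chromosome.

    Intervals are half-open (start inclusive, end exclusive), BED-style,
    so intervals that merely touch do not count as overlapping.
    """
    chrom, start, end = call
    return any(
        chrom == r_chrom and start < r_end and r_start < end
        for r_chrom, r_start, r_end in regions
    )


def split_calls(calls, regions):
    """Split calls into (kept, excluded) by overlap with `regions`."""
    kept, excluded = [], []
    for call in calls:
        if overlaps_any(call, regions):
            excluded.append(call)
        else:
            kept.append(call)
    return kept, excluded


# Toy intervals, invented for illustration only.
exclusion = [("chrI", 3000, 4000)]
calls = [
    ("chrI", 500, 900),
    ("chrI", 3500, 3600),
    ("chrI", 3990, 4200),
    ("chrII", 3500, 3600),
    ("chrI", 4000, 4100),
]
kept, excluded = split_calls(calls, exclusion)
print(f"kept {len(kept)} calls, set aside {len(excluded)}")
```

If the conclusions hold with and without the excluded calls, the high-coverage regions are not driving the result.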