Question: Regions to exclude in CNV analysis
wanabi (Germany), 3.7 years ago:

Hello,

I have called CNVs from my WGS data and want to do some QC. For this, I want to exclude segments that overlap the regions listed below by more than 50% of their length, and I have some doubts. Can you please help me?

- Telomeres/centromeres
- Immunoglobulin regions
- Extreme GC content (>90% or <10%): do these thresholds make any sense? My CNVs were called with the GC option in Control-FREEC, so I don't know whether this is really necessary. Any thoughts?
- Mappability: should I use the uniqueness or the alignability definitions for this? If I choose uniqueness, should I simply filter out all regions with uniqueness < 1? If I use alignability, which threshold would you use?
- RepeatMasker
- Common CNVs: I am currently using DGV. Is this advisable?
- Any other list you would recommend to clean my data?
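A minimal Python sketch of the ">50% of the segment length" rule, assuming CNV segments and exclusion regions are already loaded as BED-style (chrom, start, end) tuples with 0-based, half-open coordinates. Overlapping exclusion regions are merged first so that no base is counted twice when several lists overlap. The helper names (`merge_intervals`, `covered_fraction`, `filter_segments`) are hypothetical, and the linear scan is for illustration only, not for genome-scale input.

```python
from typing import List, Tuple

# BED-style interval: (chrom, start, end), 0-based, half-open
Interval = Tuple[str, int, int]


def merge_intervals(regions: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching regions so no base is counted twice."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(regions):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            _, last_start, last_end = merged[-1]
            merged[-1] = (chrom, last_start, max(last_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def covered_fraction(segment: Interval, merged: List[Interval]) -> float:
    """Fraction of the segment's length covered by the merged exclusion regions."""
    chrom, start, end = segment
    covered = 0
    for r_chrom, r_start, r_end in merged:
        if r_chrom == chrom:
            covered += max(0, min(end, r_end) - max(start, r_start))
    return covered / (end - start)


def filter_segments(segments: List[Interval], exclude: List[Interval],
                    max_fraction: float = 0.5) -> List[Interval]:
    """Keep segments covered by at most max_fraction; drop those covered by more."""
    merged = merge_intervals(exclude)
    return [s for s in segments if covered_fraction(s, merged) <= max_fraction]


segments = [("chr1", 0, 1000), ("chr1", 5000, 6000)]
exclude = [("chr1", 100, 400), ("chr1", 300, 700), ("chr1", 5900, 6000)]
print(filter_segments(segments, exclude))  # [('chr1', 5000, 6000)]
```

The first segment is 60% covered once the two overlapping exclusion regions are merged, so it is dropped; the second is only 10% covered and is kept.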


Thanks a lot

Tags: cnv, exclude, filter
ebrown1955 (United States), 3.7 years ago:

The first step is to get a list of the regions you want to eliminate (in BED format) and use bedtools to compute the overlap between your CNV list and those lists. You can then filter the overlapping CNVs out in Excel or in R (using which(), etc.).

There may be a more elegant way to do this, but this is what I do.

Good luck!
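If the overlap is computed with `bedtools coverage -a cnvs.bed -b exclude.bed`, the filtering step can also be done in Python instead of Excel or R. This sketch assumes bedtools 2.24 or later, where the last four columns appended to each CNV row are the number of overlapping exclusion features, the number of CNV bases covered, the CNV length, and the fraction of the CNV covered. The file names and the `keep_low_overlap` helper are hypothetical.

```python
import csv
from typing import Iterable, List


def keep_low_overlap(coverage_lines: Iterable[str],
                     max_fraction: float = 0.5) -> List[List[str]]:
    """Return the original CNV columns for rows whose covered fraction <= max_fraction.

    Assumes `bedtools coverage` output (bedtools >= 2.24), whose last four columns are:
    n_overlaps, bases_covered, cnv_length, fraction_covered.
    """
    kept = []
    for row in csv.reader(coverage_lines, delimiter="\t"):
        if not row or row[0].startswith(("#", "track", "browser")):
            continue  # skip blank and header lines
        if float(row[-1]) <= max_fraction:
            kept.append(row[:-4])  # drop the four appended coverage columns
    return kept


example = [
    "chr1\t0\t1000\tdel\t2\t600\t1000\t0.6000000",
    "chr1\t5000\t6000\tdup\t1\t100\t1000\t0.1000000",
]
print(keep_low_overlap(example))  # [['chr1', '5000', '6000', 'dup']]
```

With a real file this would be called as `keep_low_overlap(open("cnvs.coverage.txt"))`. Because `bedtools coverage` counts bases with non-zero coverage, overlapping exclusion regions are not double-counted.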

wanabi (Germany), 3.7 years ago:

Thanks for the info.

Yes, I am planning to exclude those regions with either PLINK or bedtools.

My question is more about which lists of regions to exclude and, especially, about the thresholds people use for regions with low mappability.

Thanks


The UCSC Genome Bioinformatics site hosts two BED files (produced by other groups) that are useful for excluding low-mappability regions in the human genome:

ftp://hgdownload.soe.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeMapability/

(Reply by Eric T., 3.7 years ago)
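The two files linked above are already in BED format. If you instead start from a per-base mappability score track (for example a bigWig converted to bedGraph with UCSC's `bigWigToBedGraph` utility), a sketch like the one below turns it into low-mappability regions at a threshold of your choosing. It assumes a sorted, four-column bedGraph (chrom, start, end, score); `low_mappability_regions` is a hypothetical helper, and the threshold is not a recommendation. With uniqueness-style scores, `threshold=1.0` flags every base scored below 1.

```python
from typing import Iterable, List, Tuple


def low_mappability_regions(bedgraph_lines: Iterable[str],
                            threshold: float) -> List[Tuple[str, int, int]]:
    """Merge consecutive bedGraph intervals whose score is below threshold.

    Assumes the bedGraph is sorted by chrom and start.
    """
    regions: List[Tuple[str, int, int]] = []
    for line in bedgraph_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank and header lines
        chrom, start_s, end_s, score_s = line.split()[:4]
        start, end = int(start_s), int(end_s)
        if float(score_s) >= threshold:
            continue  # mappable enough: not part of an excluded region
        if regions and regions[-1][0] == chrom and regions[-1][2] >= start:
            # extend the previous low-mappability run
            regions[-1] = (chrom, regions[-1][1], max(regions[-1][2], end))
        else:
            regions.append((chrom, start, end))
    return regions


track = [
    "chr1\t0\t100\t1.0",
    "chr1\t100\t150\t0.5",
    "chr1\t150\t200\t0.25",
    "chr1\t200\t300\t1.0",
    "chr1\t300\t310\t0.333",
]
print(low_mappability_regions(track, threshold=1.0))  # [('chr1', 100, 200), ('chr1', 300, 310)]
```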
wanabi (Germany), 3.7 years ago:

Thanks a lot for the info. Do you know where I can get a similar file for regions with extreme GC content (>90% or <10%)? UCSC only provides GC content in 5 bp windows, which is not very useful for filtering CNV data.

Thanks!
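One workaround is to compute GC content yourself in windows closer to the size of your CNV segments. The sketch below assumes the chromosome sequence is already available as a string (read with any FASTA parser); the 1 kb default window and the `extreme_gc_windows` name are hypothetical. N bases are excluded from the denominator so that assembly gaps do not dilute the GC fraction, and all-N windows are skipped.

```python
from typing import Iterator, Tuple


def extreme_gc_windows(chrom: str, sequence: str, window: int = 1000,
                       low: float = 0.10,
                       high: float = 0.90) -> Iterator[Tuple[str, int, int, float]]:
    """Yield (chrom, start, end, gc) for fixed windows with GC < low or GC > high."""
    seq = sequence.upper()
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        acgt = sum(chunk.count(base) for base in "ACGT")
        if acgt == 0:
            continue  # all-N window (e.g. a gap): no GC estimate
        gc = (chunk.count("G") + chunk.count("C")) / acgt
        if gc < low or gc > high:
            yield chrom, start, start + len(chunk), gc


seq = "GC" * 50 + "AT" * 50 + "N" * 100 + "GCGA" * 25
print(list(extreme_gc_windows("chr1", seq, window=100)))
# [('chr1', 0, 100, 1.0), ('chr1', 100, 200, 0.0)]
```

The yielded windows can be written out as BED and used with the same overlap filter as the other exclusion lists.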

ebrown1955 (United States), 3.7 years ago:

As for DGV, those regions are fine, but it really depends on what you are trying to do! There are control populations available from dbGaP that you can download and use to determine whether a CNV is rare, since DGV can and does include regions that are not considered rare. It is also important to note that rare != pathogenic.
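One widely used way to call a CNV rare against a control set is to count the fraction of control samples that carry a CNV reciprocally overlapping yours by at least 50%. This sketch assumes the control calls are (sample_id, (chrom, start, end)) pairs; the helper names, the 50% reciprocal-overlap rule, and any rarity cutoff (for example a frequency below 1%) are illustrative assumptions. In practice you would probably also require the same CNV type (deletion vs duplication).

```python
from typing import List, Tuple

CNV = Tuple[str, int, int]  # (chrom, start, end), half-open


def reciprocal_overlap(a: CNV, b: CNV, min_fraction: float = 0.5) -> bool:
    """True if the overlap covers at least min_fraction of BOTH a and b."""
    if a[0] != b[0]:
        return False
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return False
    return (overlap / (a[2] - a[1]) >= min_fraction
            and overlap / (b[2] - b[1]) >= min_fraction)


def control_frequency(cnv: CNV, controls: List[Tuple[str, CNV]], n_controls: int) -> float:
    """Fraction of control samples carrying a reciprocally overlapping CNV."""
    carriers = {sample for sample, call in controls if reciprocal_overlap(cnv, call)}
    return len(carriers) / n_controls


controls = [
    ("s1", ("chr2", 1100, 2100)),  # 90% reciprocal overlap: counts
    ("s1", ("chr2", 1500, 1900)),  # covers only 40% of the query
    ("s2", ("chr2", 0, 5000)),     # query is only 20% of this call
    ("s3", ("chr2", 1200, 2200)),  # 80% reciprocal overlap: counts
    ("s4", ("chr3", 1000, 2000)),  # different chromosome
]
print(control_frequency(("chr2", 1000, 2000), controls, n_controls=100))  # 0.02
```

Each carrier sample is counted once, even if it has several matching calls.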
