Using repeat region filter post QC
5.4 years ago
zx8754 11k

We have NGS whole-exome data for a study of 2000 cases and 2000 controls. Sequencing and QC were done by a third party, so in theory we already have "clean" data.

We are planning to run gene-based tests (SKAT, burden, etc., or any other analysis). Would you advise additionally removing any (or all?) of the repeat regions below (the files are available at UCSC goldenPath):

  • simpleRepeat.txt.gz
  • rmsk.txt.gz
  • genomicSuperDups.txt.gz

We have settled on using simpleRepeats, but with no real rationale. Is it common practice to remove repeat regions, and if so, which ones?

It was also suggested that we use "blacklists". Which ones should we use?

ngs exome filter repeats • 1.2k views
5.4 years ago

In the VCF I don't remove anything; instead, I flag the variants in the FILTER column.

I use the following resources to flag the variants:

  • the ENCODE mappability tracks (mappability, ConsensusExcludable...): http://hgdownload.soe.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeMapability/
  • the BED file provided in Heng Li's paper: https://doi.org/10.1093/bioinformatics/btu356
  • the GATK annotation HomopolymerRun
  • the blacklisted genomic regions for functional genomics analysis: https://sites.google.com/site/anshulkundaje/projects/blacklists
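As a rough illustration of the "flag, don't remove" approach, here is a minimal pure-Python sketch that adds a label to the FILTER column of every record overlapping a BED file. It is not the pipeline I actually run; the function names (`load_bed`, `overlaps`, `flag_vcf`) and the label are made up for the example, and it assumes an uncompressed, tab-delimited VCF whose chromosome names match those in the BED (e.g. both use `chr1`).

```python
import bisect
from collections import defaultdict


def load_bed(lines):
    """Parse BED lines into merged 0-based half-open intervals per chromosome."""
    raw = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        raw[chrom].append((int(start), int(end)))
    index = {}
    for chrom, ivs in raw.items():
        ivs.sort()
        merged = [list(ivs[0])]
        for s, e in ivs[1:]:
            if s <= merged[-1][1]:          # overlapping or adjacent: extend
                merged[-1][1] = max(merged[-1][1], e)
            else:
                merged.append([s, e])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def overlaps(index, chrom, pos, ref):
    """True if a VCF record (1-based POS, REF allele) overlaps any interval."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    v_start = pos - 1                       # convert to 0-based half-open
    v_end = v_start + len(ref)
    i = bisect.bisect_left(ends, v_start + 1)   # first interval ending after v_start
    return i < len(starts) and starts[i] < v_end


def flag_vcf(vcf_lines, index, label):
    """Yield VCF lines, appending `label` to FILTER for overlapping records."""
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            if line.startswith("#CHROM"):
                # Declare the new filter just before the column header line.
                yield (f'##FILTER=<ID={label},'
                       f'Description="Overlaps a region in the {label} BED">')
            yield line
            continue
        fields = line.split("\t")
        if overlaps(index, fields[0], int(fields[1]), fields[3]):
            current = fields[6]
            fields[6] = label if current in (".", "PASS") else f"{current};{label}"
        yield "\t".join(fields)
```

To use it, open the BED and the VCF as text files, build the index once with `load_bed`, and write out the lines produced by `flag_vcf`, running it once per resource with a different label each time. Downstream tests can then include or exclude flagged variants without the records ever being lost. For real data, established tools that annotate or filter a VCF against BED files are likely a better choice than this sketch.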
