Question: Does the 1000 Genomes Project Cover GC-Rich Regions?
michealsmith wrote (8.0 years ago):

I'm looking for rare variants in whole-genome sequencing data. I found a "rare" SNP in my patient sample that has never been reported in any database, including the latest 1000 Genomes data and the exome sequencing databases. However, when I checked this position in four other randomly chosen control whole genomes from 1000G, it turned out to lie within a GC-rich region that is barely covered by any reads (whereas in my data, the sequencer read through this GC-rich region and gave good coverage).
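As a hedged illustration of what "GC-rich" means here, the sketch below computes the GC fraction of a window around a variant position. The function names, the 0-based position convention, the default flank of 50 bp and the toy reference string are all hypothetical; in practice the sequence would come from your reference FASTA, and the cutoff you treat as "GC-rich" is your own choice.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G/C among unambiguous A/C/G/T bases (N and other codes ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0  # no informative bases in the window
    return (seq.count("G") + seq.count("C")) / acgt


def window_gc(reference: str, pos: int, flank: int = 50) -> float:
    """GC fraction of the window pos-flank..pos+flank (0-based, clipped to the sequence)."""
    start = max(0, pos - flank)
    end = min(len(reference), pos + flank + 1)
    return gc_fraction(reference[start:end])


# Hypothetical toy reference: AT-rich flanks around a GC-rich core.
toy_reference = "ATATATATAT" + "GCGCGGCCGC" + "ATATATATAT"
print(window_gc(toy_reference, 15, flank=4))  # all-GC window -> 1.0
```

A real check would use a window size comparable to your read or fragment length, since that is the span over which GC bias affects coverage.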

So I'm not sure whether the SNP I found is truly rare, or a common one that was missed by NGS in 1000G because PCR simply cannot amplify through the GC-rich region.

On the other hand, 1000G has a huge number of samples and calls SNPs/indels jointly across all of them, so it should be almost impossible for any given region to be covered by no reads at all, right?
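The intuition that pooling many samples should leave no region uncovered can be made roughly quantitative. This is a sketch assuming each sample's reads at a site follow an independent Poisson distribution with the same mean depth, so the pooled count is Poisson with mean depth times sample count; the depths and sample count used below are hypothetical, not 1000G figures.

```python
import math


def prob_no_reads(per_sample_depth: float, n_samples: int) -> float:
    """P(zero reads pooled over n_samples), assuming independent Poisson coverage per sample."""
    # Sum of independent Poisson(depth) variables is Poisson(depth * n), and P(0) = exp(-mean).
    return math.exp(-per_sample_depth * n_samples)


# Hypothetical numbers: normal coverage vs. a site where GC bias nearly abolishes reads.
print(prob_no_reads(4.0, 1000))    # effectively zero (underflows to 0.0)
print(prob_no_reads(0.001, 1000))  # about 0.37
```

Under independent random dropout the pooled probability vanishes, but GC bias is systematic: it lowers the expected depth at the same site in every sample at once, so pooling helps far less than the independent model suggests.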

So should I trust the 1000 Genomes SNP/indel calls in such GC-rich regions?


Because of various filtering steps, 1000G will miss a small fraction of common SNPs; this can hardly be avoided. Checking the unfiltered SNP calls is a better way to confirm whether your variant is really rare, though I do not know whether the unfiltered calls are still available.
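Assuming you can obtain both the released (filtered) call set and an unfiltered one as plain-text VCF, a minimal sketch of the unfiltered-call check suggested above might look like this. The helper names and example records are hypothetical; it matches on CHROM, POS and ALT only, and ignores REF normalization and multi-allelic splitting.

```python
def find_site(vcf_lines, chrom, pos, alt):
    """Return the FILTER value of the first matching VCF record, or None if absent."""
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        # VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO ...
        if fields[0] == chrom and int(fields[1]) == pos and alt in fields[4].split(","):
            return fields[6]
    return None


def classify(release_vcf, unfiltered_vcf, chrom, pos, alt):
    """Say whether a variant is in the release, only in the unfiltered calls, or in neither."""
    released = find_site(release_vcf, chrom, pos, alt)
    if released is not None:
        return f"in release (FILTER={released})"
    raw = find_site(unfiltered_vcf, chrom, pos, alt)
    if raw is not None:
        return f"only in unfiltered calls (FILTER={raw})"
    return "absent from both"
```

A variant that shows up only in the unfiltered calls was seen by the project but removed by filtering, which is exactly the case where "absent from 1000G" does not mean "rare".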

Don't trust indels. 1000G still has a lot of trouble with them, and the project is working hard to improve indel calling.

(comment by lh3, 8.0 years ago)

What sequencer did you use for your data?

(comment by Raony Guimarães, 8.0 years ago)

The sequencer is a HiSeq 2000.

(comment by michealsmith, 8.0 years ago)
JC (Mexico) wrote (8.0 years ago):

All sequencing technologies have problems in regions of high GC content, so calling variants there is hard. I don't know how you are verifying your variants, but the 1000G VCF reports the total number of reads used to call each variant (DP=N in the INFO column), so you can filter by a depth threshold. Also, if your variants are exonic, you can check them in ESP6500: http://evs.gs.washington.edu/EVS/
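Following the DP suggestion, here is a minimal sketch of a depth filter on VCF records. It assumes DP is present in the INFO column as DP=N; in a multi-sample release this is typically depth summed over samples, so a sensible threshold depends on the call set. The function names and the thresholds used in the checks are hypothetical.

```python
def info_dp(info: str):
    """Extract the integer DP from a VCF INFO string such as 'AC=3;AF=0.01;DP=850'."""
    for entry in info.split(";"):
        key, _, value = entry.partition("=")
        if key == "DP" and value:
            return int(value)
    return None  # DP not reported for this record


def passes_depth(vcf_line: str, min_dp: int) -> bool:
    """True if the record's INFO/DP is present and at least min_dp."""
    fields = vcf_line.rstrip("\n").split("\t")
    dp = info_dp(fields[7])  # INFO is the 8th VCF column
    return dp is not None and dp >= min_dp
```

Comparing the DP at your site with the DP at nearby sites in the same release gives a rough sense of whether the region is under-covered.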

A simple way to integrate these various sources is with Annovar: http://www.openbioinformatics.org/annovar/


Thanks. I'm using Annovar, and I'm currently filtering against both 1000G and ESP6500 with an MAF cutoff of 0.01. I agree with lh3 that we should use the unfiltered version, because I have come across many well-studied common SNPs that are absent from the 1000G database, probably due to various types of filtering.
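The MAF-cutoff filtering described here can be sketched with a three-way outcome, so that a variant absent from both databases is flagged rather than silently counted as rare. This assumes biallelic sites and that you already have alternate-allele frequencies from 1000G and ESP6500 (None when the variant is absent); the function names are hypothetical and this is not Annovar's own logic.

```python
from typing import Optional


def maf(af: float) -> float:
    """Minor allele frequency of a biallelic site from its alternate-allele frequency."""
    return min(af, 1.0 - af)


def classify_frequency(af_1000g: Optional[float], af_esp: Optional[float],
                       cutoff: float = 0.01) -> str:
    """Return 'common', 'rare' or 'unobserved' using the largest MAF any database reports."""
    observed = [maf(af) for af in (af_1000g, af_esp) if af is not None]
    if not observed:
        # Absent everywhere: may be novel, or a filtered / poorly covered site.
        return "unobserved"
    if max(observed) >= cutoff:
        return "common"
    return "rare"
```

Keeping "unobserved" separate from "rare" makes it easy to send those variants to the extra checks discussed in this thread (unfiltered calls, local depth, GC content).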

(comment by michealsmith, 8.0 years ago)