Question

Missing SNPs for HBA1/HBA2 in 1000 Genomes Data?

13.8 years ago

Tarbem ▴ 10

Hey guys,

I wanted to extract the SNPs called for the HBA1 and HBA2 genes in the 1000 Genomes Project. However, these two genes appear to have no SNPs at all: no missense and no synonymous (samesense) SNPs.
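For anyone repeating the extraction, here is a minimal sketch of pulling records for two gene intervals out of VCF text. The function name `variants_in_regions`, the `REGIONS` table and the toy records are all hypothetical; the coordinates are assumed GRCh37 positions and should be checked against your own annotation, and the chromosome label ("16" versus "chr16") has to match whatever your file uses.

```python
def variants_in_regions(vcf_lines, regions):
    """Yield (label, chrom, pos, ref, alt) for VCF records inside any region.

    `regions` maps a label to (chrom, start, end), 1-based and inclusive.
    """
    for line in vcf_lines:
        # Skip meta-information lines, the column header, and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        for label, (r_chrom, start, end) in regions.items():
            if chrom == r_chrom and start <= pos <= end:
                yield label, chrom, pos, ref, alt


# ASSUMPTION: approximate GRCh37 gene intervals; verify before real use.
REGIONS = {
    "HBA2": ("16", 222846, 223709),
    "HBA1": ("16", 226679, 227520),
}

# Made-up records for illustration only, not real 1000 Genomes calls.
toy_vcf = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "16\t223000\t.\tG\tA\t100\tPASS\t.",
    "16\t225000\t.\tC\tT\t100\tPASS\t.",
    "16\t227000\t.\tT\tC\t100\tPASS\t.",
]

hits = list(variants_in_regions(toy_vcf, REGIONS))
for hit in hits:
    print(hit)
```

An empty result from a filter like this only says that no variant was called in the interval, not that the samples carry no variation there.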

I cross-checked on Ensembl (the 1000 Genomes browser), and dbSNP reports about 500 non-synonymous SNPs in the exonic region of HBA1.

What are the odds that the 629 people in the 1000 Genomes Project happen to have exactly the same coding sequence for HBA1? Very low, I would guess.
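As a rough back-of-the-envelope check on that intuition: under the simplifying assumption that chromosomes are sampled independently, a variant with allele frequency p goes completely unseen in N diploid people with probability (1 - p)^(2N). The function name and the example frequencies below are illustrative choices, not values taken from the project.

```python
def prob_variant_unseen(allele_freq, n_people):
    """Probability that none of the 2*n_people sampled chromosomes carries
    a variant of the given allele frequency (independent sampling assumed)."""
    return (1.0 - allele_freq) ** (2 * n_people)


N_PEOPLE = 629  # sample size quoted in the question

# Illustrative allele frequencies only.
for freq in (0.0001, 0.001, 0.01):
    print(freq, prob_variant_unseen(freq, N_PEOPLE))
```

Even a 1% variant is very unlikely to be absent from every sample by chance, which suggests the missing SNPs reflect the calling process rather than the biology.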

Am I missing something obvious?

P.S. Laura, from the 1000 Genomes Project, provided some explanation of a similar issue regarding missing genotypes here: http://biostar.stackexchange.com/questions/9550/why-so-many-missing-genotypes-in-1000-genomes-data

But I think the situation in this post is quite different, and I would appreciate any ideas about why the SNPs are missing.

Thanks

genome snp dbsnp • 3.5k views

Updated 13.3 years ago by lh3 33k • written 13.8 years ago by Tarbem ▴ 10

score 1 · Answer 1 · 2011-09-30

13.8 years ago

lh3 33k

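Because HBA1 and HBA2 are nearly identical in their coding regions. A short read from one gene aligns equally well to the other, so it cannot be placed uniquely and variants in those reads tend to go uncalled. You cannot do much about that with short reads.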
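A toy illustration of the ambiguity, using exact string matching in place of a real aligner. All sequences here are made up and far shorter than real genes; `find_all` is a hypothetical helper. The point is only that a read drawn from a duplicated segment has more than one equally good placement, which is the situation in which aligners typically assign a low mapping quality.

```python
def find_all(reference, read):
    """Return every 0-based start position where `read` occurs in `reference`."""
    hits = []
    start = reference.find(read)
    while start != -1:
        hits.append(start)
        start = reference.find(read, start + 1)
    return hits


# Made-up sequences: one segment duplicated, flanked by unique sequence.
unique_left = "ACGTTGCA"
shared = "GGCCTTAAGGCC"   # stands in for the near-identical coding region
spacer = "TTTTACGCGA"
unique_right = "CATGCATG"
reference = unique_left + shared + spacer + shared + unique_right

read_from_duplicate = shared[2:10]   # falls entirely inside the duplicated segment
read_from_unique = unique_left       # falls in unique flanking sequence

print(find_all(reference, read_from_duplicate))  # two candidate positions
print(find_all(reference, read_from_unique))     # one candidate position
```

Reads that are long enough to reach into unique flanking sequence can be anchored; reads that sit wholly inside the duplicated part cannot.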


score 0 · Answer 2 · 2011-09-30

I must say that I first tried to reproduce your problem in the 1000 Genomes browser by searching for HBA1, and indeed I didn't see the expected variation. Then I realized that the track I was looking at corresponded to the 20100804 data, which is what Laura described, and not the latest release. In fact, this is the note on the browser's welcome page:

The 1000 Genomes Browser

Ensembl-based browser provides early access to 1000genomes data

In order to facilitate immediate analysis of the 1000genomes data by the whole scientific community, this browser (based on Ensembl) integrates the SNP calls from the August 2010 release. This data will be submitted to dbSNP, and once rsid's have been allocated, will be absorbed into the UCSC and Ensembl browsers according to their respective release cycles. Until that point any non rs SNP id's on this site are temporary and will NOT be maintained.

I haven't found a way to look this information up on the 1000 Genomes website, so I can only suggest processing their raw data as we did. To save time, you may want to have a look at the raw genotypes we processed from this latest release. (An interesting note for any BioStar reader: there are only bi-allelic markers because their genotype caller is limited to them; we have asked the project to add a note to the README file to clarify this.)

If you go to our ENGINES tool, search for HBA1 and HBA2, and select all 14 available populations, you will end up with 26 variants: 20 of them are also in dbSNP132 and 6 are new, and most have very low MAF values (19 of them are below 0.1). Although this is far fewer than the 500 sites you were expecting, I really hope this result helps in some way.
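For readers who process the raw genotypes themselves, a minimal sketch of computing the minor allele frequency at a bi-allelic site and counting how many sites fall below a threshold. It assumes VCF-style genotype strings such as `0|1` or `1/1`, with `.` for a missing allele; the function names and the toy genotypes are hypothetical and are not output from ENGINES.

```python
def minor_allele_frequency(genotypes):
    """MAF at a bi-allelic site from genotype strings like '0|1' or './.'.

    Returns None when every allele is missing.
    """
    alt_count = 0
    called = 0
    for gt in genotypes:
        for allele in gt.replace("|", "/").split("/"):
            if allele == ".":
                continue  # missing allele: not counted in the denominator
            called += 1
            if allele == "1":
                alt_count += 1
    if called == 0:
        return None
    alt_freq = alt_count / called
    return min(alt_freq, 1.0 - alt_freq)


def count_below(mafs, threshold=0.1):
    """Count sites whose MAF is defined and strictly below `threshold`."""
    return sum(1 for maf in mafs if maf is not None and maf < threshold)


# Made-up genotypes for three sites, for illustration only.
toy_sites = [
    ["0|0"] * 9 + ["0|1"],            # rare alternate allele
    ["0|0", "0|1", "1|1", "./."],     # common variant, one sample missing
    ["./.", "./."],                   # nothing called
]
toy_mafs = [minor_allele_frequency(site) for site in toy_sites]
print(toy_mafs, count_below(toy_mafs))
```

Taking the smaller of the alternate and reference frequencies keeps the result meaningful when the reference allele happens to be the rarer one in a population.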