Missing SNPs for HBA1/HBA2 in 1000 Genomes Data?
12.6 years ago
Tarbem ▴ 10

Hey guys,

I wanted to extract the SNPs called for the HBA1 and HBA2 genes in the 1000 Genomes Project. However, these two genes appear to have no SNPs at all: no missense and no synonymous SNPs.

I cross-checked on Ensembl (the 1000 Genomes browser), and dbSNP reports about 500 non-synonymous SNPs in the exonic region of HBA1.

What are the odds that 629 people in the 1000 Genomes Project happen to have exactly the same coding sequence for HBA1? Very low, I would guess.

Am I missing something obvious?

P.S. Laura, from the 1000 Genomes Project, provided some explanation on a similar issue regarding missing genotypes here: http://biostar.stackexchange.com/questions/9550/why-so-many-missing-genotypes-in-1000-genomes-data

But I think the situation in this post is quite different, and I would appreciate any ideas about why the SNPs are missing.

Thanks

genome snp dbsnp • 3.1k views
12.6 years ago
lh3 33k

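Because HBA1 and HBA2 are nearly identical in their coding regions. A short read from one gene aligns equally well to the other, so it cannot be placed uniquely and variants there are not called reliably. You cannot do much about that with short reads.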
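
To illustrate the point, here is a minimal Python sketch that is not part of the original answer. It counts what fraction of read-length windows in one sequence also occur verbatim in another; a read drawn from such a window cannot be assigned to a single gene by sequence alone. The two sequences are made-up toy strings, not the real HBA1/HBA2 sequences, and `ambiguous_fraction` is a hypothetical helper name.

```python
def ambiguous_fraction(seq_a: str, seq_b: str, read_len: int) -> float:
    """Fraction of read_len-sized windows of seq_a that also occur in seq_b."""
    if read_len <= 0 or read_len > len(seq_a):
        raise ValueError("read_len must be between 1 and len(seq_a)")
    # Every substring of seq_b with the given length, stored for fast lookup.
    windows_b = {seq_b[i:i + read_len] for i in range(len(seq_b) - read_len + 1)}
    windows_a = [seq_a[i:i + read_len] for i in range(len(seq_a) - read_len + 1)]
    shared = sum(1 for window in windows_a if window in windows_b)
    return shared / len(windows_a)


# Toy "paralogs" (made up): identical except for one base near the end.
gene_a = "ATGGTGCTGTCTCCTGCCGACAAGACCAAC"
gene_b = "ATGGTGCTGTCTCCTGCCGACAAGACCGAC"

print(ambiguous_fraction(gene_a, gene_b, read_len=10))
```

In this toy case most windows are shared, so most simulated reads would be ambiguous. Real aligners handle this with mapping-quality scores rather than exact matching, so treat this only as an intuition aid.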

12.6 years ago

I first tried to reproduce your problem in the 1000 Genomes browser by searching for HBA1, and indeed I did not see the expected variation. Then I realized that the track I was looking at corresponds to the 20100804 data, which is what Laura described, and not the latest release. In fact, this is the note on the browser's welcome page:

The 1000 Genomes Browser

Ensembl-based browser provides early access to 1000genomes data

In order to facilitate immediate analysis of the 1000genomes data by the whole scientific community, this browser (based on Ensembl) integrates the SNP calls from the August 2010 release. This data will be submitted to dbSNP, and once rsid's have been allocated, will be absorbed into the UCSC and Ensembl browsers according to their respective release cycles. Until that point any non rs SNP id's on this site are temporary and will NOT be maintained.

I can't give much advice beyond looking on the 1000 Genomes website. Since I haven't found a way to browse this information there, I can only suggest digesting their raw data, as we did. To save time, you may want to look at the raw genotypes we processed from the latest release. (A note for any BioStar reader: these contain only bi-allelic markers, because the project's genotype caller is limited to them; we have asked the project to add a note to the README file to clarify this.)

If you go to our ENGINES tool, search for HBA1 and HBA2, and select all 14 available populations, you will end up with 26 variants: 20 are also in dbSNP132 and 6 are new. Most have very low MAF values (19 are below 0.1). Although this is far fewer than the 500 sites you were expecting, I hope this result helps in some way.
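
For readers who want to digest the raw genotypes themselves, the following is a rough Python sketch of the idea, not the pipeline used for ENGINES. It scans VCF-formatted lines, keeps bi-allelic records inside a region, and computes the minor allele frequency (MAF) from the `GT` fields. The function name `biallelic_maf`, the toy records, and the coordinates are all made up for illustration; substitute the real gene coordinates for the assembly you are working with.

```python
def biallelic_maf(vcf_lines, chrom, start, end):
    """Yield (pos, id, maf) for bi-allelic records in chrom:start-end (1-based, inclusive)."""
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        rec_chrom, pos, rec_id, alt = fields[0], int(fields[1]), fields[2], fields[4]
        if rec_chrom != chrom or not (start <= pos <= end):
            continue
        if "," in alt:
            continue  # more than one ALT allele: not bi-allelic
        gt_index = fields[8].split(":").index("GT")
        alt_count = 0
        total = 0
        for sample in fields[9:]:
            genotype = sample.split(":")[gt_index]
            for allele in genotype.replace("|", "/").split("/"):
                if allele == ".":
                    continue  # missing call
                total += 1
                if allele == "1":
                    alt_count += 1
        if total == 0:
            continue
        freq = alt_count / total
        yield pos, rec_id, min(freq, 1 - freq)


# Toy VCF content (hypothetical IDs and positions, three samples).
toy_vcf = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3",
    "16\t1000\trsA\tG\tA\t.\tPASS\t.\tGT\t0|0\t0|1\t0|0",
    "16\t1500\trsB\tC\tT,G\t.\tPASS\t.\tGT\t0|1\t1|2\t0|0",
    "16\t9000\trsC\tT\tC\t.\tPASS\t.\tGT\t1|1\t0|1\t0|0",
]

for pos, rec_id, maf in biallelic_maf(toy_vcf, "16", 900, 2000):
    print(pos, rec_id, round(maf, 3))
```

For the real release files, which are large and compressed, an indexed reader would be more practical than line-by-line parsing; the sketch only shows the logic.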


Hey Jorge,

Thanks for your reply, it was very helpful. (I did not know about the bi-allelic markers.)

I parsed the following file: ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/ASN1_flat/ds_flat_ch16.flat.gz (HBA1 resides on chr16)

... and did not hit any variation at genomic regions corresponding to exons for HBA1.

How exactly did you recover those 26 variants? Did you parse a different file?
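
The check described above, looking for variants at the genomic positions of the HBA1 exons, can be sketched roughly as follows. This is an illustrative Python snippet, not the script actually used; `positions_in_exons` is a hypothetical helper, and the exon intervals and variant positions are invented. It assumes both sets of coordinates are 1-based and refer to the same genome assembly.

```python
def positions_in_exons(positions, exons):
    """Return the positions that fall inside any (start, end) exon interval (inclusive)."""
    return [
        pos for pos in positions
        if any(start <= pos <= end for start, end in exons)
    ]


# Invented example coordinates, NOT the real HBA1 exons.
toy_exons = [(100, 200), (350, 500), (620, 700)]
toy_variant_positions = [50, 150, 349, 350, 501, 650]

print(positions_in_exons(toy_variant_positions, toy_exons))
```

An empty result from a check like this means either that no variants were called in those intervals or that the two coordinate sets do not line up.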


Indeed we did, Tarbem. I thought you were referring to the 1000 Genomes data, so the files I understood you were interested in were those on the project's FTP site: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20101123/interim_phase1_release/ which, by the way, are from the May 2011 release even though they sit in the 20101123 folder. A little bit confusing, I guess.


The release directories are named for the sequence release the data is based on, rather than the date they are released on.

You can see SNP tracks coloured by consequence from VCF files using the "Attach remote file" option under "Manage your data", so you can attach the VCF files from the 20101123 release.
