I'm looking for datasets to analyze, including mitochondrial DNA sequence and Y chromosome SNP data from the HapMap and HGDP panels. Does anyone know where to obtain them?
You won't be able to retrieve mitochondrial DNA sequence from either the HapMap or the HGDP panels, since they only genotyped chrMT; they didn't sequence it. All you can obtain are frequencies and genotypes for the particular loci typed on their samples. You will find the latest HapMap release as flat files, organized by population and chromosome, in the appropriate folder of their FTP site. I guess the best place to find HGDP data is the official CEPH database site, where you can either browse the database or bulk download flat files by chromosome.
We used to retrieve such data for our population genetics web tool SPSmart, so let me add a few lines describing what we came across. The main problem for us, and I guess for other researchers too, is that both projects report the data for these chromosomes as biallelic (diploid-style) genotypes, probably because the file format was normalized so that the same one could be used for every chromosome, and it is not well described how these calls should be treated. For that reason we decided not to include chrY or chrMT in our tool, and not chrX either, since we also found heterozygous calls for male samples, which break all the frequencies and other population statistics we calculate.
PS: if anyone has information about how to deal with the data for these 3 special chromosomes, I'd be glad to start a discussion on this from scratch, since population geneticists would definitely benefit from it.
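As one possible starting point for that discussion, here is a minimal Python sketch of how the diploid-style calls described above could be collapsed before counting alleles. It is not what SPSmart does. It assumes two-character genotype strings such as "AA", "AG" or "NN" (missing), sex codes "M"/"F", and the chromosome labels "chrX", "chrY" and "chrMT"; the function name `allele_counts` is made up for illustration, and pseudoautosomal regions are ignored.

```python
from collections import Counter

# Chromosomes treated as haploid in every sample (assumed labels).
HAPLOID_FOR_ALL = {"chrY", "chrMT"}


def allele_counts(genotypes, sexes, chrom, missing="N"):
    """Count alleles for one locus, one allele per haploid sample.

    genotypes: dict sample -> two-character genotype, e.g. "AG"
    sexes:     dict sample -> "M" or "F"
    Returns (Counter of alleles, list of samples with an impossible
    heterozygous call on a haploid chromosome).
    """
    counts = Counter()
    flagged = []
    for sample, gt in genotypes.items():
        if missing in gt:
            continue  # skip missing calls
        sex = sexes[sample]
        if chrom == "chrY" and sex == "F":
            continue  # females carry no chrY; ignore any call
        haploid = chrom in HAPLOID_FOR_ALL or (chrom == "chrX" and sex == "M")
        a, b = gt[0], gt[1]
        if haploid:
            if a != b:
                flagged.append(sample)  # heterozygous haploid: treat as error
            else:
                counts[a] += 1  # "AA" on a haploid chromosome is one A
        else:
            counts[a] += 1
            counts[b] += 1
    return counts, flagged


# Toy data, not taken from HapMap or HGDP.
genotypes = {"s1": "AA", "s2": "GG", "s3": "AG", "s4": "NN", "s5": "AG"}
sexes = {"s1": "M", "s2": "M", "s3": "M", "s4": "M", "s5": "F"}
counts, flagged = allele_counts(genotypes, sexes, "chrX")
print(dict(counts), flagged)
```

Flagging the heterozygous haploid calls instead of silently dropping them keeps the genotyping error rate visible, which seems useful when deciding whether a marker is trustworthy at all.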
One way to get SNP annotations is to go via a BioMart installation, either Ensembl BioMart or HapMap's HapMart. Both provide chromosome Y SNPs, but only Ensembl BioMart also provides MT SNPs.
Both marts allow filtering by genotyping platform. To find a GWAS that genotyped those SNPs, I would search for studies using a genotyping platform that contains such SNPs and then ask for access to the genotyping data. Access to GWAS data is generally governed by a strict privacy policy. The Wellcome Trust Case Control Consortium provides access to GWAS data via an application process.
I put some biomart queries here as hyperlinks to serve as examples in case that is what this question is about. Maybe Jorge can comment better on the relevance of these annotations.
I guess kcheng should be the one telling us whether these datasets suit his needs or not. To be honest, I've never been completely confident in chrX, chrY and chrMT data, so all the work we've done with them had to be carefully thought through. We used the validated pipelines that we knew worked for the other chromosomes only when we were sure they were suitable, and almost always they had to be modified to deal with the nature of these chromosomes' data, sometimes even depending on the chosen mart.
Very interesting. I only found 11 variants on chrM in the UCSC snp150 (All SNPs 150) table:
chrom chromStart chromEnd name refNCBI observed
chrM 515 518 rs879104404 CA -/CA
chrM 517 520 rs878880226 CA -/CA
chrM 524 527 rs78907894 AC -/AC
chrM 5132 5135 rs199476116 AA -/AA
chrM 8042 8045 rs199474828 AT -/AT
chrM 8271 8281 rs371604158 ACCCCCTCT -/ACCCCCTCT
chrM 8281 8291 rs369704279 CCCCCTCTA -/CCCCCTCTA
chrM 9205 9208 rs199476137 TA -/TA
chrM 9487 9503 rs267606612 TCGCAGGATTTTTCT -/TCGCAGGATTTTTCT
chrM 14787 14792 rs207460005 TTAA -/TTAA
chrM 16180 16183 rs371240719 AA -/AA
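A quick way to check what kind of variants these are is to parse the rows and look at the observed column. The Python sketch below pastes the rows above into a string and applies a simplified classification of my own (the `classify` helper is hypothetical and much cruder than the class annotation in the real table): an observed value containing "-" is counted as an in-del. Under that reading, all 11 entries are in-dels rather than single-base substitutions.

```python
from collections import Counter

# Rows copied from the table above: chrom, chromStart, chromEnd, name,
# refNCBI, observed (whitespace separated).
ROWS = """\
chrM 515 518 rs879104404 CA -/CA
chrM 517 520 rs878880226 CA -/CA
chrM 524 527 rs78907894 AC -/AC
chrM 5132 5135 rs199476116 AA -/AA
chrM 8042 8045 rs199474828 AT -/AT
chrM 8271 8281 rs371604158 ACCCCCTCT -/ACCCCCTCT
chrM 8281 8291 rs369704279 CCCCCTCTA -/CCCCCTCTA
chrM 9205 9208 rs199476137 TA -/TA
chrM 9487 9503 rs267606612 TCGCAGGATTTTTCT -/TCGCAGGATTTTTCT
chrM 14787 14792 rs207460005 TTAA -/TTAA
chrM 16180 16183 rs371240719 AA -/AA
"""

FIELDS = ("chrom", "chromStart", "chromEnd", "name", "refNCBI", "observed")


def parse_rows(text):
    """Turn whitespace-separated rows into a list of dicts."""
    return [dict(zip(FIELDS, line.split())) for line in text.splitlines() if line]


def classify(observed):
    """Simplified variant class based only on the observed alleles."""
    alleles = observed.split("/")
    if "-" in alleles:
        return "in-del"   # one allele is the absence of sequence
    if all(len(a) == 1 for a in alleles):
        return "single"   # single-base substitution
    return "other"


records = parse_rows(ROWS)
classes = Counter(classify(r["observed"]) for r in records)
print(len(records), dict(classes))
```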
Request for clarification, since it is not at all clear to me: what do you want? a) the sequence of the chromosomes, b) the SNP annotations from dbSNP, c) GWA study data from platforms that include these SNPs, or d) GWA studies where these variations 'come up'?