Question

Errors With Loading Hapmap Genotype Dump File Into Haploview

10.8 years ago

pilotlog ▴ 40

Hi all, I am fairly new to this forum and don't have a programming background (mine is more in biology, but I am trying to learn some bioinformatics). This may sound dumb, but I am trying to do something relatively simple and small in scope compared to what most people on this forum are doing: I just want to find haplotype blocks and tagging SNPs in the gene XRCC2, so I can do an association study focusing on this gene while minimizing the number of SNPs I have to type (to keep costs down).

I already downloaded the HapMap genotype data for the CEU population (HapMap rel27 B36) from the HapMap Genome Browser for XRCC2 (coordinates chr7:151969353..152009352), but I am having trouble loading this file into Haploview. I have read a few webpages/documents saying to just click the "HapMap Format" button when Haploview opens, browse to the file with the dumped region (genotype data) from HapMap, select it, and hit OK. When I do that, I get an error message: "HapMap data format error: totalcount". Total Count is the last column in the file, which is supposed to be the total number of genotypes observed. As I have not changed anything in the HapMap dump file, I am not sure why it has a formatting error. Haploview's own documentation says this:

"HapMap Project Data Dumps: Data from the HapMap Project can be dumped by region using the GBrowse interface. The saved data file is in a marker-per-line format which can be loaded in Haploview. GBrowse dumps only one file, which has one marker per line and which includes familial relationships among the HapMap samples as well as marker position information. The file format has several header lines (beginning with "#") which Haploview parses. Open the file by selecting "Browse HapMap Data" option and selecting the downloaded file."

I thought that GBrowse referred to the International HapMap Project Genome Browser webpage, but the only genotype data file I can find for rel27 B36 is under the Reports&Analysis dropdown, where it says "Download SNP Genotype Data". When I opened the genotype data file in Excel, I could see one SNP per row, with columns for chr, pos, strand, build, etc., followed by genotype, genotype frequency, and genotype count. I didn't see any header that looked like it was describing familial relationships, so do I have the wrong file completely? Or am I supposed to modify this file so that Haploview can parse it?

Again, I am sure this is a bit of a dumb question but I am new to this type of thing and I would appreciate any help anyone can give me. Also sorry if this post is a bit long but I wanted to make it clear what I've already done/tried to do.

haploview hapmap • 5.1k views

ADD COMMENT • link updated 10.7 years ago by Biostar 20 • written 10.8 years ago by pilotlog ▴ 40

Ok, so I feel really stupid now... I figured out that Haploview didn't like totalcount because there wasn't supposed to be a count column in the file at all. I had downloaded the genotype frequency dump file instead of the genotype data dump file. The two files look similar in the first few columns, but after that they are totally different. Haploview now opens the file with no issues, except that it gives me an error if I check the "display HapMap track" option when loading it. But as long as the genotype file loads okay, I don't mind too much. Even though it is embarrassing, I will leave this here instead of deleting it, in case someone as clueless as me reads this with the same issue.
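In case it helps anyone check which file they have before opening Haploview, here is a rough Python sketch of the check I ended up doing by eye. It is only a heuristic and rests on two assumptions: that a frequency dump has a column literally named "totalcount" (the name in the error message), and that a genotype data dump has one column per sample with IDs that look like NA06985. The function name and the two sample headers are made up for illustration and are not real HapMap output.

```python
import re
from itertools import islice

# Assumption: HapMap sample columns are named like "NA06985".
SAMPLE_ID = re.compile(r"^NA\d+$")


def classify_hapmap_dump(lines, max_lines=50):
    """Guess the dump type from the first few lines of the file.

    Returns "frequency", "genotype", or "unknown". Heuristic only.
    """
    for line in islice(lines, max_lines):
        # Header lines may start with "#", so drop that before splitting.
        tokens = line.lstrip("#").split()
        if any(t.lower() == "totalcount" for t in tokens):
            return "frequency"  # summary counts, not per-sample genotypes
        if any(SAMPLE_ID.match(t) for t in tokens):
            return "genotype"   # one column per sample
    return "unknown"


# Made-up headers, only to show the two shapes described above.
FREQ_HEADER = "rs# chrom pos strand build genotype genotype_freq genotype_count totalcount"
GENO_HEADER = "rs# alleles chrom pos strand NA06985 NA06991 NA06993"

if __name__ == "__main__":
    print(classify_hapmap_dump([FREQ_HEADER]))
    print(classify_hapmap_dump([GENO_HEADER]))
```

For a real file you would pass the open file handle, e.g. `with open(path) as fh: classify_hapmap_dump(fh)`, where `path` is wherever you saved the dump. If it reports "frequency", that is probably the same mistake I made.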

ADD REPLY • link 10.8 years ago by pilotlog ▴ 40