Question: Impute2 Procedure
6.7 years ago by
romsen60 wrote:


I am trying to impute genome-wide SNP data with IMPUTE2. The problem is that I'm a biologist and far from an expert in informatics, so perhaps you can help me. I installed the software, and the examples from the website work very well. Now I want to run the analysis on my own data. My plan is to pre-phase first and then impute into the pre-phased haplotypes using two phased reference panels (HapMap and 1000G, CEU samples only).

I have no problems generating the -g and the corresponding strand_g files, but the -m, -h and -l files are confusing. The tutorial has a download link for these reference files. I downloaded the 1000 Genomes Pilot + HapMap 3 CEU build 36 file (500 MB), and it seems that the required information (-m, -h, -l) is all included in this one file. But I'm not sure, and I don't know how to pass it on the command line to run the analysis. In the examples there is a separate file for each flag. How can I handle this?

My second problem is telling the program to use only the data for one chromosome, for example chromosome 1. Do I have to generate files for every chromosome? And if so, how? I didn't find any option in IMPUTE2 to restrict the data to a specific chromosome.

I know that these are rather basic questions, but maybe someone has a quick and easy solution.

The platform is Windows.

6.7 years ago by
Istvan Albert ♦♦ 80k (University Park, USA) wrote:

The file that you downloaded is an archive, and inside it you will find a separate file for each chromosome and for each type (recombination map, haplotypes and legend).

On Windows you can use a tool like 7-Zip to extract the file.
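If a graphical unarchiver is awkward to use, the same extraction can be scripted. Below is a minimal sketch using Python's standard `tarfile` module, assuming the download is a gzip-compressed tar archive. The function name `extract_reference_archive` and the paths in the usage note are made up for illustration.

```python
import tarfile
from pathlib import Path


def extract_reference_archive(archive_path, dest_dir):
    """Unpack a (possibly compressed) tar archive and return the extracted files."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Mode "r:*" lets tarfile detect the compression itself, so the gzip
    # layer and the tar layer are both handled in this single call.
    with tarfile.open(archive_path, "r:*") as archive:
        archive.extractall(dest)
    # Return every regular file that now exists below dest, sorted by path.
    return sorted(p for p in dest.rglob("*") if p.is_file())
```

Something like `extract_reference_archive("reference_panel.tgz", "reference_panel")` should then leave the individual per-chromosome files in the `reference_panel` folder; replace both names with your own.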

To restrict the operation to just one chromosome, I think you only need to pass the files for that chromosome. If that does not work, just ask that question separately.
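To make "use the corresponding chromosome file" concrete, here is a rough sketch that assembles an IMPUTE2 command line for one chromosome. It only builds the argument list and does not run anything. The file name patterns follow the names in the b36 CEU archive (for example `genetic_map_chr1_combined_b36.txt`). The flags are the ones I recall from the IMPUTE2 documentation, so check them against your version. The panel order, the `-Ne 20000` value, the function name `impute2_command` and the study and output file names are assumptions of mine.

```python
from pathlib import Path

# File name patterns as they appear in the extracted b36 CEU archive.
MAP_PATTERN = "genetic_map_chr{chrom}_combined_b36.txt"
PANEL_PREFIXES = [
    "pilot1.jun2010.b36.CEU.chr{chrom}.snpfilt",            # 1000 Genomes pilot
    "hapmap3.r2.b36.allMinusPilot1CEU.chr{chrom}.snpfilt",  # HapMap 3
]


def impute2_command(ref_dir, chrom, study_haps, start_bp, end_bp, out_file):
    """Build the argument list for imputing one interval of one chromosome."""
    ref = Path(ref_dir)
    panels = [ref / prefix.format(chrom=chrom) for prefix in PANEL_PREFIXES]
    return [
        "impute2",
        "-use_prephased_g",                 # impute into pre-phased study haplotypes
        "-known_haps_g", str(study_haps),   # hypothetical output of the pre-phasing run
        "-m", str(ref / MAP_PATTERN.format(chrom=chrom)),
        "-h", *[f"{panel}.haps" for panel in panels],
        "-l", *[f"{panel}.legend" for panel in panels],
        "-int", str(start_bp), str(end_bp),  # interval in base pairs
        "-Ne", "20000",                      # assumed value, see the IMPUTE2 docs
        "-o", str(out_file),
    ]
```

Passing the returned list to `subprocess.run` would start the program. Since each call names the files of exactly one chromosome, looping `chrom` over 1 to 22 covers the autosomes without any extra "chromosome" option.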

6.7 years ago by
romsen60 wrote:

Thanks for your help. Unfortunately I can find only one file in the archive. But if I open it in Excel I see three different file types (of course Excel can't open the complete .txt file). So there must be a download or extraction problem?!

I downloaded this CEU file and tried to extract it with WinRAR and unzip. I have attached a picture of WinRAR and 7-Zip.

There are two levels of extraction (the way that tool works is a bit counterintuitive): the first pass only removes the compression and leaves a single file, which is itself an archive. Extract that second file as well. I checked that archive and it contains 104 files!

ialbert@porthos ~/Downloads/hapmap3_r2_plus_1000g_jun2010_b36_ceu
$ ls 
README.txt                                            hapmap3.r2.b36.allMinusPilot1CEU.chr5.snpfilt.haps
genetic_map_chr10_combined_b36.txt                    hapmap3.r2.b36.allMinusPilot1CEU.chr5.snpfilt.legend
genetic_map_chr11_combined_b36.txt                    hapmap3.r2.b36.allMinusPilot1CEU.chr6.snpfilt.haps
genetic_map_chr12_combined_b36.txt                    hapmap3.r2.b36.allMinusPilot1CEU.chr6.snpfilt.legend
genetic_map_chr13_combined_b36.txt                    hapmap3.r2.b36.allMinusPilot1CEU.chr7.snpfilt.haps
genetic_map_chr14_combined_b36.txt                    hapmap3.r2.b36.allMinusPilot1CEU.chr7.snpfilt.legend
genetic_map_chr15_combined_b36.txt                    hapmap3.r2.b36.allMinusPilot1CEU.chr8.snpfilt.haps
genetic_map_chr16_combined_b36.txt                    hapmap3.r2.b36.allMinusPilot1CEU.chr8.snpfilt.legend
genetic_map_chr17_combined_b36.txt                    hapmap3.r2.b36.allMinusPilot1CEU.chr9.snpfilt.haps
genetic_map_chr18_combined_b36.txt                    hapmap3.r2.b36.allMinusPilot1CEU.chr9.snpfilt.legend
genetic_map_chr19_combined_b36.txt                    hapmap3.r2.b36.allMinusPilot1CEU.samples
genetic_map_chr1_combined_b36.txt                     pilot1.jun2010.b36.CEU.chr1.snpfilt.haps
genetic_map_chr20_combined_b36.txt                    pilot1.jun2010.b36.CEU.chr1.snpfilt.legend
genetic_map_chr21_combined_b36.txt                    pilot1.jun2010.b36.CEU.chr10.snpfilt.haps
genetic_map_chr22_combined_b36.txt                    pilot1.jun2010.b36.CEU.chr10.snpfilt.legend
genetic_map_chr2_combined_b36.txt                     pilot1.jun2010.b36.CEU.chr11.snpfilt.haps
genetic_map_chr3_combined_b36.txt                     pilot1.jun2010.b36.CEU.chr11.snpfilt.legend
genetic_map_chr4_combined_b36.txt                     pilot1.jun2010.b36.CEU.chr12.snpfilt.haps
genetic_map_chr5_combined_b36.txt                     pilot1.jun2010.b36.CEU.chr12.snpfilt.legend
genetic_map_chr6_combined_b36.txt                     pilot1.jun2010.b36.CEU.chr13.snpfilt.haps
genetic_map_chr7_combined_b36.txt                     pilot1.jun2010.b36.CEU.chr13.snpfilt.legend
... and so on ...
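To check that your own extraction produced the same kind of listing, a small script can group the file names by chromosome and report what is missing. This is only a sketch: the set of five expected files per chromosome (one map plus `.haps` and `.legend` for each panel) is my assumption based on the names above, and the function names are made up.

```python
import re
from collections import defaultdict

# Matches "chr10_" in the map names and "chr5." in the panel names.
CHROM_RE = re.compile(r"chr(\d+)[._]")

# Assumed complete set of reference files for one chromosome.
EXPECTED = {"map", "pilot1.haps", "pilot1.legend", "hapmap3.haps", "hapmap3.legend"}


def classify(name):
    """Return the kind of reference file, or None for anything else."""
    if name.startswith("genetic_map_"):
        return "map"
    for panel in ("pilot1", "hapmap3"):
        if name.startswith(panel + "."):
            for ext in ("haps", "legend"):
                if name.endswith("." + ext):
                    return f"{panel}.{ext}"
    return None


def missing_kinds(names):
    """Map chromosome number -> sorted list of expected kinds not found."""
    found = defaultdict(set)
    for name in names:
        match = CHROM_RE.search(name)
        kind = classify(name)
        if match and kind:
            found[int(match.group(1))].add(kind)
    return {c: sorted(EXPECTED - kinds)
            for c, kinds in found.items() if EXPECTED - kinds}
```

Calling `missing_kinds(p.name for p in Path(folder).iterdir())` on the extracted folder (with `Path` imported from `pathlib`) returns an empty dictionary if every chromosome it sees has all five files. Files such as `README.txt` and the `.samples` file are ignored.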
Perfect. I switched to a Mac, used the terminal to extract the gzip file and then a standard unarchiver for the rest, and yes, it now looks like yours. Phew, thank you very much!

