Question: Impute: File Formats?
4.6 years ago by
France
Pierre Lindenbaum68k wrote:

Hi all,

I'm about to start working with the data available on Impute ( https://mathgen.stats.ox.ac.uk/impute/impute.html ). My collaborator is currently away from his email.

From https://mathgen.stats.ox.ac.uk/impute/impute_v2.html , I downloaded https://mathgen.stats.ox.ac.uk/impute/1kG_b36_aug09_ceu.tgz (132 MB).

This archive contains a 'legend' file for each chromosome, which lists each SNP with its position, allele 0 and allele 1:

head 1kG_b36_aug09_ceu_chr10.legend
rsID position a0 a1
rs61838558 54767 C T
rs28887774 55878 C G
rs12262442 56397 C T
rs4121579 56695 T A
10-57163 57163 G A
rs9943471 57774 G C
rs35819232 58533 T G
rs11253482 58575 C T
rs34829118 59071 G A

The second type of file is the 'hap' file:

head -n2 1kG_b36_aug09_ceu_chr10.hap
1 0 0 1 0 0 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 0 1 1 1 1 1 0 1 1 1 0 1 1 1 1 1 1 1 1 1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 0 0 0 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1
1 0 0 1 0 0 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1 0 1 1 1 1 1 1 1 1 1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 0 0 0 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1
0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 1 1 0 1 0 0 0 0 1 1 1 0 1 0 0 1 1 0 0 0 0 0 1 1 1 0 1 1 0 1 0 1 0 0 1 1 0 0 0 1 0 1 0 0 1 1 0 0 1 0 0 0 1 0 0 1 1 1 1 0 0 0 0 1 1 0 1 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0

Here, the number of rows equals the number of SNPs in the legend file. OK.

But I wonder how those 112 columns should be read. Presumably they describe the haplotypes of each sample (where can I get the pedigree?), but what do they mean? Should I read each pair of numbers as the state of the current SNP on both chromosomes, or is it something else?

Thank you for your help

Pierre

ADD COMMENTlink modified 4.1 years ago by Lars Juhl Jensen8.7k • written 4.6 years ago by Pierre Lindenbaum68k
4.6 years ago by
Copenhagen, Denmark
Lars Juhl Jensen8.7k wrote:

I am far from an expert on this, but the HAPMAP CEU population consists of 56 individuals, which gives 2*56=112 haplotypes. So that would fit with one haplotype per column in the file. Coincidentally, that is also consistent with what is written in the impute documentation of the -h option ;-)
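
For what it's worth, here is a minimal Python sketch of that reading: one row per SNP, one column per haplotype, with 0 meaning the legend's a0 allele and 1 meaning a1. The helper names (read_legend, read_hap, decode_haplotype) are made up for illustration and are not part of IMPUTE, and the toy input is just the first three legend rows and the first four hap columns from the question.

```python
import io


def read_legend(handle):
    """Return a list of (snp_id, position, a0, a1) tuples, skipping the header."""
    next(handle)  # header line: rsID position a0 a1
    records = []
    for line in handle:
        fields = line.split()
        if fields:
            snp_id, pos, a0, a1 = fields
            records.append((snp_id, int(pos), a0, a1))
    return records


def read_hap(handle):
    """Return one list of 0/1 integers per SNP row."""
    return [[int(x) for x in line.split()] for line in handle if line.strip()]


def decode_haplotype(legend, hap_rows, column):
    """Translate the 0/1 codes of one haplotype column into alleles.

    Assumption: 0 stands for the legend's a0 allele and 1 for a1.
    """
    return [a1 if row[column] else a0
            for (_, _, a0, a1), row in zip(legend, hap_rows)]


# Toy input: first three SNPs and first four haplotype columns from the question.
legend_text = (
    "rsID position a0 a1\n"
    "rs61838558 54767 C T\n"
    "rs28887774 55878 C G\n"
    "rs12262442 56397 C T\n"
)
hap_text = "1 0 0 1\n1 0 0 1\n0 0 0 0\n"

legend = read_legend(io.StringIO(legend_text))
hap_rows = read_hap(io.StringIO(hap_text))

# The hap file should have exactly one row per SNP in the legend.
assert len(legend) == len(hap_rows)

n_haplotypes = len(hap_rows[0])  # 112 in the real CEU file, 4 in this toy input
first_haplotype = decode_haplotype(legend, hap_rows, 0)
print(n_haplotypes, first_haplotype)
```

With the real files you would pass open file handles instead of the io.StringIO objects. Which columns belong to the same individual is not something this sketch tries to infer; the impute documentation is the place to check that.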

ADD COMMENTlink written 4.6 years ago by Lars Juhl Jensen8.7k

many thanks Lars :-) As usual, I should have RTFM :-)

ADD REPLYlink written 4.6 years ago by Pierre Lindenbaum68k