Question: Impute: File Formats?
3.7 years ago, Pierre Lindenbaum (France) wrote:

Hi all,

I'm about to start working with the data available on Impute ( https://mathgen.stats.ox.ac.uk/impute/impute.html ). My collaborator is currently away from his e-mail.

From https://mathgen.stats.ox.ac.uk/impute/impute_v2.html , I downloaded https://mathgen.stats.ox.ac.uk/impute/1kG_b36_aug09_ceu.tgz (132 MB).

This archive contains 'legend' files, each listing SNP ID, position, allele 0 and allele 1:

head 1kG_b36_aug09_ceu_chr10.legend
rsID position a0 a1
rs61838558 54767 C T
rs28887774 55878 C G
rs12262442 56397 C T
rs4121579 56695 T A
10-57163 57163 G A
rs9943471 57774 G C
rs35819232 58533 T G
rs11253482 58575 C T
rs34829118 59071 G A

The second type of file is the 'hap' file:

head -n2 1kG_b36_aug09_ceu_chr10.hap
1 0 0 1 0 0 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 0 1 1 1 1 1 0 1 1 1 0 1 1 1 1 1 1 1 1 1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 0 0 0 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1
1 0 0 1 0 0 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1 0 1 1 1 1 1 1 1 1 1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 0 0 0 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1
0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 1 1 0 1 0 0 0 0 1 1 1 0 1 0 0 1 1 0 0 0 0 0 1 1 1 0 1 1 0 1 0 1 0 0 1 1 0 0 0 1 0 1 0 0 1 1 0 0 1 0 0 0 1 0 0 1 1 1 1 0 0 0 0 1 1 0 1 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0

Here, the number of rows equals the number of SNPs in the legend file. OK.
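For anyone wanting to verify that row-count relationship programmatically, here is a minimal Python sketch. It assumes the legend file has exactly the four-column header shown above and that the hap file is whitespace-separated 0/1 values. The function names (parse_legend, parse_hap, check_consistency) are made up for illustration, and the inline data is just a few rows and the first four columns copied from the listings above, not the full files.

```python
import io

def parse_legend(handle):
    """Return a list of (snp_id, position, a0, a1) tuples from a legend file."""
    header = handle.readline().split()
    if header != ["rsID", "position", "a0", "a1"]:
        raise ValueError(f"unexpected legend header: {header}")
    snps = []
    for line in handle:
        if not line.strip():
            continue  # skip blank lines
        snp_id, position, a0, a1 = line.split()
        snps.append((snp_id, int(position), a0, a1))
    return snps

def parse_hap(handle):
    """Return one list of 0/1 integers per non-empty row of a hap file."""
    return [[int(x) for x in line.split()] for line in handle if line.strip()]

def check_consistency(snps, haps):
    """Check one hap row per legend SNP and a constant column count.

    Returns the number of columns in the hap file.
    """
    if len(snps) != len(haps):
        raise ValueError(f"{len(snps)} SNPs in legend but {len(haps)} hap rows")
    widths = {len(row) for row in haps}
    if len(widths) != 1:
        raise ValueError(f"hap rows have inconsistent widths: {sorted(widths)}")
    return widths.pop()

# Tiny excerpt of the files shown above (first 3 SNPs, first 4 hap columns).
legend_text = """rsID position a0 a1
rs61838558 54767 C T
rs28887774 55878 C G
rs12262442 56397 C T
"""
hap_text = """1 0 0 1
1 0 0 1
0 0 0 0
"""

snps = parse_legend(io.StringIO(legend_text))
haps = parse_hap(io.StringIO(hap_text))
n_columns = check_consistency(snps, haps)
print(len(snps), "SNPs,", n_columns, "hap columns")
```

On the real files you would pass open("1kG_b36_aug09_ceu_chr10.legend") and open("1kG_b36_aug09_ceu_chr10.hap") instead of the io.StringIO objects.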

But I wonder how those 112 columns should be read. They presumably describe the haplotypes of each sample (where can I get the pedigree?), but what do they mean? Should I read each pair of numbers as the state of the current SNP on both chromosomes, or is it something else?

Thank you for your help

Pierre

3.7 years ago, Lars Juhl Jensen (Copenhagen, Denmark) wrote:

I am far from an expert on this, but the HapMap CEU population consists of 56 individuals, which gives 2*56=112 haplotypes. So that would fit with one haplotype per column in the file. Coincidentally, that is also consistent with what is written in the IMPUTE documentation of the -h option ;-)
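To make the "one haplotype per column" reading concrete, here is a rough Python sketch. It rests on two assumptions that are not established in this thread: that 0 in the hap file means the legend's a0 allele and 1 means a1, and that columns 2*i and 2*i+1 are the two haplotypes of the same individual. The function names are hypothetical, and the data is only the first three SNPs and first four columns from the question.

```python
# Legend rows as (rsID, position, a0, a1): first three SNPs from the question.
legend = [
    ("rs61838558", 54767, "C", "T"),
    ("rs28887774", 55878, "C", "G"),
    ("rs12262442", 56397, "C", "T"),
]

# First four columns of the first three hap rows from the question.
hap_rows = [
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
]

def haplotype_alleles(legend, hap_rows, column):
    """Alleles carried by one haplotype (one column), one entry per SNP.

    ASSUMPTION: 0 codes for a0 and 1 codes for a1.
    """
    return [
        a1 if row[column] == 1 else a0
        for (_, _, a0, a1), row in zip(legend, hap_rows)
    ]

def individual_genotypes(legend, hap_rows, individual):
    """Phased genotypes of one individual, written as 'x|y' per SNP.

    ASSUMPTION: columns 2*i and 2*i+1 belong to individual i.
    """
    first = haplotype_alleles(legend, hap_rows, 2 * individual)
    second = haplotype_alleles(legend, hap_rows, 2 * individual + 1)
    return [f"{x}|{y}" for x, y in zip(first, second)]

n_haplotypes = len(hap_rows[0])   # 112 in the real file, 4 in this excerpt
n_individuals = n_haplotypes // 2  # 56 in the real file, 2 in this excerpt

for i in range(n_individuals):
    print(i, individual_genotypes(legend, hap_rows, i))
```

If the pairing assumption does not hold for this panel, haplotype_alleles alone still gives a per-column reading that matches the one-haplotype-per-column interpretation.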


many thanks Lars :-) As usual, I should have RTFM :-)

(reply written 3.7 years ago by Pierre Lindenbaum)