Hi,
I have been given whole genome alignments from the cactus alignment program in HAL format.
I would like to analyse ancestry and infer recombination rates from these data, so I plan to run ADMIXTURE and then RASPberry on them.
ADMIXTURE requires PLINK .bed or .ped files (plus their companion files, such as .bim/.fam or .map).
RASPberry requires a bim file, files in "phased format", an additional "phased format" file or a ped/map file.
I'd like some advice or guidance on how to get from the HAL file I've been given to any of these required formats.
First, I am unsure what this "phased format" is, but the top of an example file provided with the software looks like this:
rsID position_b36 NB12718_A NB12718_B NB12718_0_A NB12045_A NB12045_B NB12045_0_
rs6565705 13905 A A A A A A
rs7502403 15463 A A A A A A
rs8064924 18901 C A A A A A
rs8075072 19389 A A A A A A
rs28702002 22211 A C A A A A
rs8070440 27151 A A A A A A
rs3794811 34276 A A C A A A
Each row is a SNP, and each pair of allele columns is one individual (one column per haplotype). Which programs and utilities read or write this format? If I could go from HAL to this, that would be great.
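To make my reading of the layout concrete, here is a minimal Python sketch of how I think such a file would map onto PLINK ped/map lines. It assumes that the columns really do come in strict pairs (two haplotype columns per individual, named with a trailing _A/_B suffix), that all SNPs are on a single chromosome, and that sex and phenotype are unknown. The function name, the sample names and the chromosome label are all made up for illustration; none of this comes from the RASPberry documentation.

```python
def phased_to_ped_map(lines, chrom="1"):
    """Turn phased-format lines into PED and MAP lines (illustrative sketch).

    Assumes: header is 'rsID position <hap1> <hap2> ...', haplotype columns
    come in consecutive pairs per individual, one chromosome for all SNPs.
    """
    rows = [ln.split() for ln in lines if ln.strip()]
    header, snps = rows[0], rows[1:]
    hap_names = header[2:]
    if len(hap_names) % 2 != 0:
        raise ValueError("expected an even number of haplotype columns")

    # Individual ID = first haplotype name of each pair without its suffix.
    ids = [hap_names[i].rsplit("_", 1)[0] for i in range(0, len(hap_names), 2)]

    # MAP columns: chromosome, SNP id, genetic distance (0 = unknown), bp position.
    map_lines = [f"{chrom} {snp[0]} 0 {snp[1]}" for snp in snps]

    # PED columns: FID IID PAT MAT SEX PHENO, then two alleles per SNP.
    ped_lines = []
    for k, ind in enumerate(ids):
        alleles = []
        for snp in snps:
            alleles += [snp[2 + 2 * k], snp[3 + 2 * k]]
        ped_lines.append(" ".join([ind, ind, "0", "0", "0", "-9"] + alleles))
    return ped_lines, map_lines


# Toy input with hypothetical sample names S1 and S2.
example = [
    "rsID position S1_A S1_B S2_A S2_B",
    "rs1 100 A A C A",
    "rs2 200 G T G G",
]
ped, map_ = phased_to_ped_map(example)
print("\n".join(ped))
print("\n".join(map_))
```

If that reading is right, the resulting ped/map pair could then be converted to the binary .bed/.bim/.fam set with PLINK's --make-bed option.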
I've seen from the README here that this tool can produce BED files of SNPs, but only between two sequences of the alignment. My gut feeling is that for ADMIXTURE I'll have to make several of these BED files, always using the same reference sequence, and then merge them somehow. Is this possible?
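Roughly, the merge I have in mind would look like the Python sketch below. It is only an illustration under strong assumptions: each pairwise comparison is represented as a list of hypothetical (position, reference base, target base) records rather than the tool's real output format, all positions are on the same reference sequence, and a site missing from one comparison is treated as matching the reference (which would be wrong wherever that genome is simply unaligned).

```python
def merge_pairwise_snps(per_target):
    """Merge pairwise SNP lists against one shared reference (illustrative sketch).

    per_target: {target_name: [(pos, ref_base, target_base), ...]}
    Returns (targets, table), where table rows are (pos, ref_base, [base per target]).
    """
    # Collect the union of variable positions and the reference base at each.
    ref_at = {}
    for records in per_target.values():
        for pos, ref_base, _ in records:
            ref_at.setdefault(pos, ref_base)

    targets = sorted(per_target)
    lookup = {t: {pos: base for pos, _, base in per_target[t]} for t in targets}

    table = []
    for pos in sorted(ref_at):
        # ASSUMPTION: a site absent from a comparison means "same as reference".
        bases = [lookup[t].get(pos, ref_at[pos]) for t in targets]
        table.append((pos, ref_at[pos], bases))
    return targets, table


# Toy input: two hypothetical genomes, B and C, compared against one reference.
pairs = {
    "B": [(10, "A", "C")],
    "C": [(10, "A", "G"), (20, "T", "A")],
}
targets, table = merge_pairwise_snps(pairs)
for pos, ref_base, bases in table:
    print(pos, ref_base, *bases)
```
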
Thanks,
Ben W.