Question: Formatting for fineSTRUCTURE input
3.7 years ago
I would like to use fineSTRUCTURE to assess the population structure of a bacterial species, so I will be inputting SNP data.

However, I don't understand how to create the 'phased' data format that fineSTRUCTURE requires. The fineSTRUCTURE manual lists multiple programs to help with this phasing process, such as PHASE, BEAGLE, SHAPEIT and IMPUTE2. However, I don't know where to even start with these.

For example, PHASE requires me to input my data in the following format:

P Position(1) Position(2) Position(NumberOfLoci)
LocusType(1) LocusType(2) ... LocusType(NumberOfLoci)
ID(1)

But how do I get this?

As it stands, I have the core genome alignment, the SNP alignment and a VCF of my data. How do I use these files to phase my data? Can anyone help point me in the right direction?

Many many thanks!!!

written 3.7 years ago by natasha100
Hello. You need to use a program such as SHAPEIT or IMPUTE2 to phase your VCF file. SHAPEIT takes VCF files as input and outputs phased haplotypes, which you will then need to convert to ChromoPainter format (the scripts and tools for this are provided on the fineSTRUCTURE website). Everything I am mentioning is also written explicitly, and in great detail, in the fineSTRUCTURE manual. You can use any other phasing software you like, but the ones I mentioned are recommended by the fineSTRUCTURE authors.
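To give a rough idea of what the final conversion step produces, here is a minimal Python sketch that writes already-phased haplotypes into a ChromoPainter-style phase file. The layout is an assumption to check against the fineSTRUCTURE manual: one line with the number of haplotypes, one with the number of SNPs, a line starting with "P" followed by the SNP positions, then one line of 0/1 alleles per haplotype. The function name `format_phase` and the example data are hypothetical and are not part of fineSTRUCTURE. For real data, the conversion scripts on the fineSTRUCTURE website are the safer route.

```python
def format_phase(positions, haplotypes):
    """Return text in the ASSUMED ChromoPainter-style phase layout.

    positions  -- SNP positions in ascending order (ints)
    haplotypes -- one string of '0'/'1' alleles per haplotype
    """
    n_snps = len(positions)
    if list(positions) != sorted(positions):
        raise ValueError("SNP positions must be in ascending order")
    for hap in haplotypes:
        if len(hap) != n_snps:
            raise ValueError("every haplotype needs one allele per SNP")
        if set(hap) - {"0", "1"}:
            raise ValueError("alleles must be coded as 0 or 1")

    # Header (assumed layout): haplotype count, SNP count, then positions.
    lines = [
        str(len(haplotypes)),
        str(n_snps),
        "P " + " ".join(str(p) for p in positions),
    ]
    # Body: one haplotype per line, alleles written without separators.
    lines.extend(haplotypes)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up toy data: 4 haplotypes typed at 3 SNPs.
    print(format_phase([101, 250, 733], ["010", "110", "001", "011"]), end="")
```

The sketch validates its input before writing anything, because a haplotype of the wrong length is easier to catch here than from a downstream error.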

