I was wondering about options for creating the "chain" file for converting genome coordinates from one genome assembly to another. Malachi Griffith wrote an excellent summary, Converting Genome Coordinates From One Genome Version To Another, but most of the tools it covers need a "chain" file (the file that describes the pairwise alignments between two genomes). So I would like to know how to create this file, or whether there is any tool that transfers coordinates starting from just the two genomes and a file of annotated features (e.g. BED, GFF). Thanks!
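For reference, a chain file is plain text. Each chain starts with a header line "chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id", followed by "size dt dq" lines (an ungapped block of size bases, then the gap in the target and the gap in the query before the next block) and a final line holding only the size of the last block. The bash sketch below writes a made-up one-chain file (the names chrA/chrB, all numbers, and the function names are hypothetical) and uses awk to check that each chain's blocks and gaps add up to its target and query spans.

```shell
#!/usr/bin/env bash
# Write a made-up chain file: one chain aligning hypothetical chrA (target)
# to chrB (query). Header fields: chain score tName tSize tStrand tStart tEnd
# qName qSize qStrand qStart qEnd id. Then "size dt dq" lines, then a final
# line with only the size of the last ungapped block.
write_toy_chain() {
  cat > "$1" <<'EOF'
chain 1000 chrA 500 + 0 300 chrB 520 + 10 330 1
100 20 40
180

EOF
}

# For each chain, check that block sizes plus gaps equal tEnd - tStart and
# qEnd - qStart. Prints "OK <id>" or "BAD <id>" once per chain.
check_chain() {
  awk '
    /^#/ { next }
    $1 == "chain" { id = $13; tspan = $7 - $6; qspan = $12 - $11; t = 0; q = 0; next }
    NF == 3 { t += $1 + $2; q += $1 + $3; next }
    NF == 1 {
      t += $1; q += $1
      status = (t == tspan && q == qspan) ? "OK" : "BAD"
      print status, id
    }
  ' "$1"
}
```

Running check_chain on a real chain file should print one OK/BAD line per chain, which is a cheap way to catch a truncated or hand-edited file.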
Does anyone know of a tool that makes the chain files required by all of these programs without relying on UCSC tools? Every program I have found uses UCSC dependencies, which you have to pay for if you are not an academic user and want to make a chain file for your own genome.
I found the post and top answer to be helpful. So, thank you very much!
For example, I downloaded the UCSC executables from here.
I then followed the minimal instructions, which I found I could further modify (for single-chromosome sequences that were each less than 500,000 bp):
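# prepare files: convert each FASTA to .2bit and record sequence sizes
cd "$ID1"
faToTwoBit "$ID1.fa" "$ID1.2bit"
twoBitInfo "$ID1.2bit" chrom.sizes
cd ..
cd "$ID2"
faToTwoBit "$ID2.fa" "$ID2.2bit"
twoBitInfo "$ID2.2bit" chrom.sizes
cd ..
# create .chain file: align $ID2 (query) against $ID1 (target) with blat,
# then chain the resulting PSL alignments with axtChain
blat "$ID1/$ID1.2bit" "$ID2/$ID2.fa" "${ID1}to${ID2}.psl" -tileSize=12 -minScore=100 -minIdentity=98
axtChain -linearGap=medium -psl "${ID1}to${ID2}.psl" "$ID1/$ID1.2bit" "$ID2/$ID2.2bit" "${ID1}to${ID2}.chain"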
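Two sanity checks may help around this pipeline: one confirms that an input FASTA matches the assumption above (a single sequence under 500,000 bp), the other summarises the blat PSL output before it goes to axtChain. This is a minimal bash sketch: the function names are hypothetical, the 500,000 default only mirrors the setup described here (it is not a documented limit of the UCSC tools), and the identity printed is a simplified (matches + repMatches) / (matches + repMatches + misMatches), not the exact formula blat applies for -minIdentity. It assumes the standard 21-column PSL layout, with qName in column 10 and tName in column 14.

```shell
#!/usr/bin/env bash
# Print "<records> <total_bases>" for a FASTA file; header lines start with ">".
fasta_stats() {
  awk '/^>/ { n++; next }
       { gsub(/[ \t\r]/, ""); len += length($0) }
       END { print n + 0, len + 0 }' "$1"
}

# Return success only if the FASTA holds exactly one sequence shorter than
# the limit (hypothetical default of 500000 bp, mirroring the post's setup).
is_small_single_seq() {
  local fasta="$1" limit="${2:-500000}" n len
  read -r n len < <(fasta_stats "$fasta")
  [ "$n" -eq 1 ] && [ "$len" -lt "$limit" ]
}

# Summarise PSL output: one line per alignment with qName, tName and a
# simplified percent identity. Header lines are skipped because they have
# fewer than 21 fields or a non-numeric first field.
psl_summary() {
  awk 'NF >= 21 && $1 ~ /^[0-9]+$/ {
         same = $1 + $3
         aligned = same + $2
         pid = (aligned > 0) ? 100 * same / aligned : 0
         printf "%s\t%s\t%.1f\n", $10, $14, pid
       }' "$1"
}
```

For example, is_small_single_seq "$ID1/$ID1.fa" || echo "check $ID1 input" before running blat, and psl_summary "${ID1}to${ID2}.psl" | sort -k3,3n | head afterwards to see the lowest-identity alignments that will be chained.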
I was also able to run CrossMap (installed using pip3 install CrossMap) to confirm that the .chain file can be used without generating any error messages:
CrossMap.py gff $CHAIN.gz $GFFIN $GFFOUT
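One rough way to judge the result is to compare how many features went in with how many came out. A minimal bash sketch, assuming GFF files without an embedded ##FASTA section; the function names are hypothetical, and matching counts say nothing about whether the new coordinates are actually correct.

```shell
#!/usr/bin/env bash
# Count feature lines in a GFF file, ignoring comment/directive lines ("#")
# and blank lines. Assumes no embedded ##FASTA section.
count_features() {
  grep -v -e '^#' -e '^[[:space:]]*$' "$1" | wc -l | tr -d ' '
}

# Print "<input features> <output features>" as a coarse check of how many
# features survived the conversion. Coordinates are not checked.
compare_feature_counts() {
  echo "$(count_features "$1") $(count_features "$2")"
}
```

With the variables above, compare_feature_counts "$GFFIN" "$GFFOUT" prints both counts on one line.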
Whether CrossMap provided the best conversion is open to debate, and I am not sure whether the parameters used to generate the .chain file should be changed in some circumstances.
However, I think this is enough to show that the custom .chain file generation was successful.