I would like to get a list of amino acid mutations that occurred in the course of evolution from early primates to current humans.
I found the homo_sapiens_ancestor_GRCh38_e86.tar.gz file on the Ensembl FTP site (ftp://ftp.ensembl.org/pub/release-86/fasta/ancestral_alleles/), which, as I understand it, is the inferred genome of the primate ancestor. This file contains a FASTA sequence for every chromosome.
I can also download the FASTA sequence of the human reference genome: ftp://ftp.ensembl.org/pub/release-86/fasta/homo_sapiens/dna/.
My question is: what tool should I use to align the reference and ancestral genomes and get a VCF file with all the SNPs? Sorry if this has been answered a million times already. Most of the information I found concerned mapping FASTQ files to the reference genome.
Once I have a VCF file with a list of SNPs, I can run TransVar or SnpEff to convert SNPs to amino acid changes.
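
To make the goal concrete, here is a rough sketch of the comparison I have in mind. It rests on an assumption I have not verified beyond the README in the ancestral_alleles directory: that each ancestral sequence uses the same coordinates as the GRCh38 chromosome, with uppercase letters for high-confidence calls, lowercase for low-confidence calls, and `-`, `.` or `N` where no call is available. If that holds, no alignment step may be needed at all. The function names (`snp_records`, `to_vcf_lines`) are made up for illustration, and the sequences are assumed to be already loaded as plain strings.

```python
from typing import Iterable, Iterator, Tuple

VALID_BASES = set("ACGT")

Record = Tuple[str, int, str, str]  # (chrom, 1-based position, ref base, ancestral base)


def snp_records(chrom: str, ref_seq: str, anc_seq: str,
                high_conf_only: bool = True) -> Iterator[Record]:
    """Yield positions where the reference and ancestral bases differ.

    Assumes both sequences share one coordinate system (same length,
    same offsets). Positions where either side is not A/C/G/T are skipped.
    """
    for pos, (ref_base, anc_base) in enumerate(zip(ref_seq, anc_seq), start=1):
        # Assumed convention: lowercase ancestral base = low-confidence call.
        if high_conf_only and not anc_base.isupper():
            continue
        ref_up, anc_up = ref_base.upper(), anc_base.upper()
        if ref_up in VALID_BASES and anc_up in VALID_BASES and ref_up != anc_up:
            yield chrom, pos, ref_up, anc_up


def to_vcf_lines(records: Iterable[Record]) -> Iterator[str]:
    """Format records as minimal VCF text.

    REF is the human reference base and ALT is the ancestral base, so the
    direction of each change is ancestral-to-human read backwards.
    """
    yield "##fileformat=VCFv4.2"
    yield "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"
    for chrom, pos, ref_base, anc_base in records:
        yield f"{chrom}\t{pos}\t.\t{ref_base}\t{anc_base}\t.\t.\t."


if __name__ == "__main__":
    # Toy sequences, not real data.
    reference = "ACGTNACGT"
    ancestor = "ACTTNAt-T"
    for line in to_vcf_lines(snp_records("1", reference, ancestor)):
        print(line)
```

With the toy sequences above, only position 3 is reported (G in the reference, T in the ancestor); the lowercase `t` at position 7 is skipped unless `high_conf_only=False` is passed. Because REF has to be the human base for downstream tools to accept the file, the amino acid changes reported by TransVar or SnpEff would need to be inverted to read as ancestor-to-human mutations.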