Question: Align human reference genome and ancestral genome
2.5 years ago, ostrokach wrote:

I would like to get a list of the amino acid mutations that occurred in the course of evolution from early primates to present-day humans.

I found the homo_sapiens_ancestor_GRCh38_e86.tar.gz file on the Ensembl FTP site, which, as I understand it, is the inferred genome of the primate ancestor. This file contains a FASTA sequence for every chromosome.

I can also download the FASTA sequence of the human reference genome.

My question is: what tool should I use to align the reference and ancestral genomes and get a VCF file with all the SNPs? Sorry if this has been answered a million times already. Most of the information I found concerns mapping FASTQ files to the reference genome.

Once I have a VCF file with a list of SNPs, I can run TransVar or SnpEff to convert SNPs to amino acid changes.
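
One point that may simplify this step: as far as I know, the Ensembl ancestral FASTA files are laid out in the coordinates of the human assembly they are named after, so each ancestral chromosome can be compared with the reference base by base rather than realigned. The Python sketch below assumes that, and also assumes the usual convention in those files (uppercase for confidently inferred bases, lowercase for low-confidence ones, and N, - or . where no usable call exists). Both assumptions should be checked against the README shipped in the archive. The function names are made up for illustration.

```python
CONFIDENT_BASES = set("ACGT")  # assumption: uppercase means a confident ancestral call


def snps_from_aligned(chrom, ref_seq, anc_seq):
    """Yield (CHROM, POS, REF, ALT) tuples for positions where a confident
    ancestral base differs from the reference base.

    REF is the human reference base and ALT is the ancestral base, so the
    evolutionary change runs from ALT to REF.
    """
    if len(ref_seq) != len(anc_seq):
        raise ValueError("sequences must be in the same coordinates (equal length)")
    for pos, (ref, anc) in enumerate(zip(ref_seq, anc_seq), start=1):
        if anc not in CONFIDENT_BASES:
            continue  # low-confidence, missing, or gap position in the ancestor
        ref = ref.upper()  # reference FASTA files are often soft-masked (lowercase)
        if ref not in CONFIDENT_BASES:
            continue  # e.g. N in the reference
        if ref != anc:
            yield (chrom, pos, ref, anc)


def to_vcf_line(record):
    """Format one record as a minimal VCF data line (eight fixed columns)."""
    chrom, pos, ref, alt = record
    return f"{chrom}\t{pos}\t.\t{ref}\t{alt}\t.\t.\t."


if __name__ == "__main__":
    # Toy sequences; real input would be one chromosome from each FASTA file.
    for rec in snps_from_aligned("1", "ACGTACGT", "ACcTAGN-"):
        print(to_vcf_line(rec))
```

Keeping the human base in the REF column matters if the output goes to SnpEff or TransVar with a GRCh38 annotation, since those tools expect REF to match the reference genome.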


2.5 years ago, Jean-Karim Heriche (EMBL Heidelberg, Germany) wrote:

I am not sure you're going about this in the right way. If you're interested in amino-acid changes in proteins, you can get this information from a phylogenetic tree. You can find gene trees in Ensembl.


Thanks very much for your input! I was following the methods for CADD, but I guess they had to do the alignments at the genome level in order to be able to analyse non-coding variants. I looked at your Ensembl link, but could not find any pairwise alignments that I could download. A quick Google search led me to TreeFam, which allows you to download protein-protein mappings between two species. I guess I can use that mapping and perform pairwise amino acid alignments myself?

(comment written 2.5 years ago by ostrokach)

You don't need pairwise alignments; you need the multiple sequence alignments used to build the trees. Ensembl adapted the TreeFam pipeline for their Compara database. You should be able to get alignments, HMMs and trees from both resources. For Ensembl, it may be easier to use the Perl API.

(comment written 2.5 years ago by Jean-Karim Heriche)
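
To illustrate how amino acid changes can be read off such an alignment, here is a minimal Python sketch. It assumes you have already extracted two rows of the same multiple sequence alignment as gapped strings of equal length, for example the human protein and the protein of another primate. The function name and the toy sequences are hypothetical, and fetching the alignment from Ensembl Compara or TreeFam is not shown.

```python
def aa_substitutions(other_row, human_row, gap="-"):
    """List substitutions between two rows of one protein alignment.

    Each change is reported as <other residue><human position><human residue>,
    with positions counted along the ungapped human sequence (1-based).
    Columns where either row has a gap are skipped.
    """
    if len(other_row) != len(human_row):
        raise ValueError("rows must come from the same alignment (equal length)")
    changes = []
    human_pos = 0
    for other, human in zip(other_row, human_row):
        if human != gap:
            human_pos += 1  # advance only on real human residues
        if other == gap or human == gap:
            continue  # insertion or deletion, not a substitution
        if other != human:
            changes.append(f"{other}{human_pos}{human}")
    return changes


if __name__ == "__main__":
    # Toy alignment rows, not real data.
    print(aa_substitutions("MKT-LV", "MRTALI"))
```

Numbering along the human sequence keeps the reported positions comparable with annotations made against the human protein, whatever gaps the alignment introduces.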