Align human reference genome and ancestral genome
7.6 years ago
ostrokach ▴ 350

I would like to get a list of amino acid mutations that occurred in the course of evolution from early primates to present-day humans.

I found the homo_sapiens_ancestor_GRCh38_e86.tar.gz file on the Ensembl FTP site (ftp://ftp.ensembl.org/pub/release-86/fasta/ancestral_alleles/), which, as I understand it, is the inferred genome of the primate ancestor. This file contains a FASTA sequence for every chromosome.

I can also download the FASTA sequence of the human reference genome: ftp://ftp.ensembl.org/pub/release-86/fasta/homo_sapiens/dna/.

My question is: what tool should I use to align the reference and ancestral genomes and get a VCF file with all the SNPs? Sorry if this has been answered a million times already. Most of the information I found concerned mapping FASTQ files to the reference genome.

Once I have a VCF file with a list of SNPs, I can run TransVar or SnpEff to convert SNPs to amino acid changes.
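
For illustration, here is a minimal sketch of the comparison step, under the assumption that the ancestral FASTA uses the same coordinates as the GRCh38 reference (worth confirming in the README shipped with the file), in which case each chromosome can be compared position by position without an aligner. It also assumes that uppercase A/C/G/T marks a high-confidence ancestral call and that lowercase letters, `N`, `-` and `.` should be skipped. The function names `iter_snps` and `to_vcf_lines` are hypothetical, and the sequences are passed in as plain strings.

```python
def iter_snps(chrom, ancestral, reference):
    """Yield (chrom, pos, ancestral_base, reference_base) where the two differ.

    Assumes both strings cover the same coordinates (1-based positions).
    """
    if len(ancestral) != len(reference):
        raise ValueError("sequences must share one coordinate system")
    for pos, (anc, ref) in enumerate(zip(ancestral, reference), start=1):
        ref = ref.upper()  # the reference may be soft-masked (lowercase repeats)
        # Assumption: only uppercase A/C/G/T ancestral calls are trusted;
        # lowercase calls, N, '-' and '.' are skipped.
        if anc in "ACGT" and ref in "ACGT" and anc != ref:
            yield chrom, pos, anc, ref


def to_vcf_lines(records):
    """Format records as minimal VCF text lines.

    REF is the human reference base and ALT is the ancestral base, because
    annotation tools expect REF to match the reference genome. The
    ancestral base is repeated in the reserved INFO key AA.
    """
    lines = [
        "##fileformat=VCFv4.2",
        '##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral allele">',
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    ]
    for chrom, pos, anc, ref in records:
        lines.append(f"{chrom}\t{pos}\t.\t{ref}\t{anc}\t.\t.\tAA={anc}")
    return lines


if __name__ == "__main__":
    # Toy sequences, not real genome data.
    for line in to_vcf_lines(iter_snps("1", "ACgT-N.A", "AGTTCAAC")):
        print(line)
```

Because REF is the human base here, an annotation tool would report each change in the human-to-ancestor direction, so the resulting amino acid changes would need to be read in reverse.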

Thanks!

genome alignment • 2.7k views
7.6 years ago

I am not sure you're going about this in the right way. If you're interested in amino-acid changes in proteins, you can get this information from a phylogenetic tree. You can find gene trees in Ensembl.


Thanks very much for your input! I was following the methods for CADD, but I guess they had to do the alignments at the genome level in order to be able to analyse non-coding variants. I looked at your Ensembl link, but could not find any pairwise alignments that I could download. A quick Google search led me to TreeFam, which allows you to download protein-protein mappings between two species (http://www.treefam.org/download#tabview=tab1). I guess I can use that mapping and perform pairwise amino acid alignments myself?


You don't need pairwise alignments; you need the multiple sequence alignments used to build the trees. Ensembl adapted the TreeFam pipeline for its Compara database. You should be able to get alignments, HMMs and trees from both resources. For Ensembl, it may be easier to use the Perl API.
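
As a rough sketch of what can be done once such an alignment is in hand: given two rows of the same multiple sequence alignment, one for the human protein and one for an ancestral sequence, the substitutions can be read off column by column. The ancestral row is an assumption here, since it would have to come from an ancestral sequence reconstruction on the gene tree rather than from the alignment itself. The function name `aa_substitutions` and the toy sequences are hypothetical.

```python
def aa_substitutions(ancestral_row, human_row):
    """List substitutions between two rows of one alignment.

    Each entry looks like 'A2V': ancestral residue, position in the
    ungapped human protein (1-based), human residue. Columns with a gap
    in either row, or an unknown residue 'X', are skipped.
    """
    if len(ancestral_row) != len(human_row):
        raise ValueError("rows must come from the same alignment")
    substitutions = []
    human_pos = 0
    for anc, hum in zip(ancestral_row.upper(), human_row.upper()):
        if hum != "-":
            human_pos += 1  # count only real human residues
        if "-" in (anc, hum) or "X" in (anc, hum):
            continue
        if anc != hum:
            substitutions.append(f"{anc}{human_pos}{hum}")
    return substitutions


if __name__ == "__main__":
    # Toy aligned rows, not real protein sequences.
    print(aa_substitutions("MA-KLR", "MVQK-L"))
```

Numbering by the ungapped human sequence keeps the positions comparable with protein-level annotations; insertions and deletions are ignored in this sketch and would need separate handling.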

