How to get gene sequences from non-model species for phylogenetic tree construction?
2.2 years ago
ges29 ▴ 50

Hi,

I'm trying to create a phylogenetic tree to describe the relationship between 8 different yeast species (listed below).

My preferred method for creating the tree is to align the same 6 gene sequences from each species using a programme like MEGA. I'm not able to use protein sequences, as I do not have this information for all of the species.

The issue I have is that some of the published annotations for these species aren't very thorough (i.e. most annotated genes are called "hypothetical protein"). Therefore, I can't find the sequences for the 6 genes I'd like to align.

What methods can I use to isolate the sequences of my preferred genes?

I tried using blastn to query the COX1 sequence from C. lusitaniae against the M. pulcherrima genome, but my concern is that the sequence I pulled out may be truncated or incomplete.

My other worry is that by using a known gene from a species that's part of the analysis, I'm introducing a bias towards that species (i.e. by searching for C. lusitaniae's COX1, I'm going to find the closest thing to C. lusitaniae's COX1 and not necessarily the real COX1 for that species).

Yeast Species:

  • C. lusitaniae
  • C. auris
  • C. albicans
  • M. pulcherrima
  • M. persimmonesis
  • M. bicuspidata
  • M. borealis
  • M. orientalis
yeast blast annotation phylogenetics • 1.0k views

You could take whatever sequence data is available for these species and see if you have any of the OrthoDB/BUSCO conserved sequences available in all data sets. BUSCO should actually come in pretty handy here, as it can also score sequences on the basis of completeness. You could then just take whatever BUSCO gene(s) is/are present in all species and use them for the phylogenetic analysis.
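A minimal sketch of the "present in all species" step, assuming BUSCO has already been run on each genome and has written its tab-separated `full_table.tsv` (comment lines start with `#`; the first two columns are the BUSCO ID and its status). The function names, species keys and BUSCO IDs below are made up for illustration and are not part of BUSCO itself.

```python
from typing import Dict, Iterable, Set


def complete_buscos(table_lines: Iterable[str]) -> Set[str]:
    """Return BUSCO IDs with status 'Complete' (single-copy) in one full_table.tsv."""
    ids = set()
    for line in table_lines:
        # Skip blank lines and the '#' header/comment lines
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # 'Duplicated', 'Fragmented' and 'Missing' entries are deliberately ignored
        if len(fields) >= 2 and fields[1] == "Complete":
            ids.add(fields[0])
    return ids


def shared_buscos(tables: Dict[str, Iterable[str]]) -> Set[str]:
    """Intersect the complete single-copy BUSCO IDs across all species."""
    per_species = [complete_buscos(lines) for lines in tables.values()]
    return set.intersection(*per_species) if per_species else set()


# Toy tables with hypothetical IDs; in practice pass open file handles instead
toy_tables = {
    "C_albicans": [
        "# Busco id\tStatus\tSequence\tGene Start\tGene End\tStrand\tScore\tLength\n",
        "100at4751\tComplete\tchr1\t100\t900\t+\t512.3\t267\n",
        "200at4751\tComplete\tchr2\t50\t650\t-\t430.1\t200\n",
        "300at4751\tMissing\n",
    ],
    "M_pulcherrima": [
        "100at4751\tComplete\tscaffold_3\t2000\t2800\t+\t498.7\t266\n",
        "200at4751\tFragmented\tscaffold_9\t10\t300\t+\t120.0\t97\n",
        "300at4751\tComplete\tscaffold_1\t400\t1300\t-\t610.2\t300\n",
    ],
}
print(sorted(shared_buscos(toy_tables)))
```

Restricting to "Complete" is what gives the completeness filter mentioned above: fragmented or duplicated hits in any one species drop that gene from the shared set.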


Your best bet is to query the NCBI Gene database with the name of the gene and species you are looking for. Here is one example.


Thanks for your response. That's what I have tried, but most of the species I'm working with (barring C. lusitaniae, C. auris and C. albicans) haven't had these genes annotated, and the BLAST results come back with partial matches.


If the annotations do not already exist in GenBank, then this is not going to be straightforward. You may actually need to BLAST against the individual genomes (which may themselves be incomplete), identify the sequence of interest, extract it, and then create alignments.

It just depends on how much work/time you are willing to invest and on the quality of the public data.
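The identify-and-extract step could be sketched roughly as follows, assuming the search was run with BLAST's default tabular output (`-outfmt 6`, whose columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) and that the genome is already loaded as a dict of contig name to sequence. The helper names and the toy genome are hypothetical. The coverage value is one simple way to flag a truncated hit.

```python
from typing import Dict, NamedTuple

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


class Hit(NamedTuple):
    qseqid: str
    sseqid: str
    qstart: int
    qend: int
    sstart: int
    send: int


def parse_hit(line: str) -> Hit:
    """Parse one line of default BLAST tabular output (-outfmt 6)."""
    f = line.rstrip("\n").split("\t")
    return Hit(f[0], f[1], int(f[6]), int(f[7]), int(f[8]), int(f[9]))


def query_coverage(hit: Hit, query_length: int) -> float:
    """Fraction of the query spanned by this single hit (1.0 means full length)."""
    return (hit.qend - hit.qstart + 1) / query_length


def extract_subject(hit: Hit, genome: Dict[str, str]) -> str:
    """Cut the hit region out of the genome, in the orientation of the query."""
    contig = genome[hit.sseqid]
    # BLAST coordinates are 1-based and inclusive
    start, end = sorted((hit.sstart, hit.send))
    region = contig[start - 1:end]
    # blastn reports minus-strand hits with sstart > send
    if hit.sstart > hit.send:
        region = region[::-1].translate(_COMPLEMENT)
    return region


# Toy data, invented for illustration
toy_genome = {"contig1": "AAACCCGGGTTTACGT"}
plus_hit = parse_hit("COX1\tcontig1\t95.0\t6\t0\t0\t1\t6\t4\t9\t1e-10\t50.0")
minus_hit = parse_hit("COX1\tcontig1\t95.0\t5\t0\t0\t1\t5\t8\t4\t1e-08\t40.0")

print(extract_subject(plus_hit, toy_genome))
print(extract_subject(minus_hit, toy_genome))
print(query_coverage(plus_hit, query_length=12))
```

This handles one hit at a time. A gene that BLAST returns as several separate hits (for example, because of introns or a contig break) would need those hits merged before extraction, which the sketch does not attempt.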

2.2 years ago

The easiest way would be just to upload your FASTA sequences to MOSGA: MOSGA Comparative Genomics

It will identify all single-copy genes and create the phylogenetic tree as it was demonstrated here (those are, in fact, yeast strains): Yeast example

The outcome is identical to that reported in DOI: 10.1038/s41586-018-0030-5

In your case, you should select "Fungi" as the lineage dataset in the BUSCO "Settings" menu.


Just a tiny addition: after your MOSGA run has completed, you can open the log viewer and see the identifiers of all common BUSCOs. With those identifiers, you can get the gene sequences (via MOSGA) of each identified single-copy gene. The benefit is that you do not have to install anything.
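Once the per-species sequences for the common identifiers are in hand, they need to be grouped gene by gene before alignment. Here is a rough sketch of that regrouping. It assumes the sequences have already been read into a nested dict (species, then gene ID, then sequence). How they are exported from MOSGA is not covered here, and the names and sequences are invented.

```python
from typing import Dict, Iterable


def per_gene_fasta(
    sequences: Dict[str, Dict[str, str]], gene_ids: Iterable[str]
) -> Dict[str, str]:
    """Build one multi-FASTA string per gene, with one record per species.

    `sequences` maps species name -> {gene ID -> nucleotide sequence}.
    A KeyError is raised if a gene is missing from any species, so an
    incomplete gene set is noticed rather than silently aligned.
    """
    out = {}
    for gene in sorted(gene_ids):
        records = [
            f">{species}\n{seqs[gene]}\n"
            for species, seqs in sorted(sequences.items())
        ]
        out[gene] = "".join(records)
    return out


# Toy input with a hypothetical BUSCO ID
toy_sequences = {
    "M_pulcherrima": {"100at4751": "ATGGCC"},
    "C_auris": {"100at4751": "ATGGCA"},
}
fasta_by_gene = per_gene_fasta(toy_sequences, {"100at4751"})
print(fasta_by_gene["100at4751"], end="")
```

Each resulting string can be written to its own file and loaded into an aligner such as MEGA, with species names as the record labels so the per-gene alignments stay comparable.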
