Add reference genomes as an outgroup to a phylogenetic tree
3 days ago
Álvaro • 0

Hello,

I've built a Neighbour-Joining tree using a VCF file containing ~40 samples aligned to a reference genome. I used plink and a custom R script. Now I want to add the reference genomes of different species as an outgroup for the tree. How can I do it? Do I need to align those genomes to the genome of the species that I'm working with? If so, how?

Thank you in advance :)

phylogenetic tree • 824 views

May I ask for a few more details:

  • Which species and outgroup?
  • How similar are the species at the DNA, CDS, and protein levels?
  • What kind of raw data do you have?

There are some general problems with your approach when including cross-species data. You don't really want to do cross-species variant calling on whole genomes, as it will inflate branch lengths and doesn't account for structural differences. Further, if you don't have that many samples, you should move from NJ to more robust methods, such as maximum likelihood (ML) or Bayesian analysis.


I'm working with Salmo trutta and wanted to use Salmo salar as an outgroup. I have whole-genome .fastq data of the trout samples. Sorry, I'm not too sure how similar the species are on the DNA level. Thank you :)


I'm sure there are phylogenetics papers out there comparing species within this genus. A quick Google search seems to suggest that they split up to 15 million years ago, followed by significant chromosomal rearrangements. You haven't said what data you have for salar, but I would be wary about mapping salar reads to the trutta reference. I guess it comes down to why you want to include a different species in your tree and whether that merits the additional hassle this might become.


That's interesting from a Norwegian perspective. You also may find SalmoBase useful.


I agree with the other comment about some of the problems this could introduce. But if it's a sister taxon and you want to root your tree, for example, then there are a few approaches that would suffice. One solution could be to extract a handful of genes commonly used in phylogenetics in your system and create a simple alignment to feed into an ML tree inference (RAxML has good documentation).

Otherwise, it's whole-genome alignment or cross-species variant calling, which are problematic or computationally demanding.
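
As a rough illustration of the gene-based route, here is a minimal Python sketch that concatenates a few per-gene alignments (each already aligned, one FASTA per gene, outgroup included) into a single supermatrix, plus partition lines in the `DNA, name = start-end` style that RAxML accepts. The gene names, sample names, and sequences are made up for the example, and taxa missing from a gene are simply padded with gaps; treat it as a starting point under those assumptions, not a tested pipeline.

```python
def parse_fasta(text):
    """Parse FASTA text into a {name: sequence} dict (insertion order kept)."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def concatenate(gene_alignments):
    """Concatenate per-gene alignments into one supermatrix.

    gene_alignments: list of (gene_name, {taxon: aligned_sequence}).
    Returns (matrix, partitions) where partitions holds
    (gene_name, start, end) with 1-based inclusive coordinates.
    """
    # Collect every taxon seen in any gene, in first-seen order.
    taxa = []
    for _, aln in gene_alignments:
        for taxon in aln:
            if taxon not in taxa:
                taxa.append(taxon)

    matrix = {t: [] for t in taxa}
    partitions = []
    pos = 0
    for gene, aln in gene_alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not aligned to equal length")
        length = lengths.pop()
        for t in taxa:
            # Taxa without this gene are padded with gaps (an assumption).
            matrix[t].append(aln.get(t, "-" * length))
        partitions.append((gene, pos + 1, pos + length))
        pos += length
    return {t: "".join(parts) for t, parts in matrix.items()}, partitions


def to_fasta(matrix):
    """Serialise the supermatrix as FASTA text."""
    return "".join(f">{t}\n{s}\n" for t, s in matrix.items())


def to_partition_lines(partitions):
    """Format partitions as 'DNA, gene = start-end' lines."""
    return [f"DNA, {g} = {s}-{e}" for g, s, e in partitions]


# Hypothetical toy data: two genes, two trout samples, one salmon outgroup.
gene_a = ">trutta_01\nACGTAC\n>trutta_02\nACGTAT\n>salar_ref\nACCTAT\n"
gene_b = ">trutta_01\nGGA\n>salar_ref\nGGC\n"

matrix, partitions = concatenate(
    [("geneA", parse_fasta(gene_a)), ("geneB", parse_fasta(gene_b))]
)
print(to_fasta(matrix), end="")
print("\n".join(to_partition_lines(partitions)))
```

The resulting alignment and partition file can then go to the ML program of your choice, with the salar sequence named as the outgroup; check the program's documentation for the exact option.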


Thank you, I will try this!


Traditionally, mtDNA sequences have been used for this species. Several rounds of whole-genome duplication and autotetraploidy make whole nuclear genome phylogenetic inference demanding.
