Question: How to construct phylogeny based on multiple sequence alignment of orthologs without assembling the genomes
Asked 3.9 years ago by abbhinay:

I have two phylogenies:

1) Species phylogeny (in black): Species B to D have published genomes, and I have assembled a genome for Species A. I constructed the phylogeny from a multiple sequence alignment of protein orthologs across Species A to D (OrthoMCL -> MUSCLE -> trimAl -> MrBayes).

2) Subspecies phylogeny (in red): I also have sequencing data for different subspecies and isolates of Species A. I mapped these onto the Species A genome, identified SNPs (using GATK), and drew a SNP-based phylogeny.

My question now is: what is the best way to integrate both these phylogenies into one?

I do not want to assemble the genomes for all the subspecies (tedious for 20 isolates), and I do not want to map the Species B-D reads onto Species A (they are very divergent, and I think inferring through an MSA is best).

I can infer nucleic acid/protein sequences of the subspecies' orthologs from the variant calls and add them to the multiple sequence alignment used for the species phylogeny. But I find the output of tools like vcf2fq and FastaAlternateReferenceMaker complicated (see the post "New Fasta Sequence From Reference Fasta And Variant Calls File?"). In this case, how to deal with SNPs in repetitive regions that we usually exclude from analysis?
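The variant-call route described above can be pictured with a small sketch. It assumes only single-base substitutions (no indels), so the isolate sequence keeps the same coordinates as the Species A reference gene and can reuse the gap pattern that the reference already has in a nucleotide alignment. The function names `apply_snps` and `thread_onto_alignment` are hypothetical and are not part of vcf2fq, GATK, or any other tool mentioned here.

```python
def apply_snps(reference, snps):
    """Return a copy of `reference` with the given SNPs substituted in.

    `snps` is an iterable of (position, ref_allele, alt_allele) tuples with
    1-based positions relative to `reference` (an assumption of this sketch).
    """
    seq = list(reference)
    for pos, ref, alt in snps:
        if len(ref) != 1 or len(alt) != 1:
            raise ValueError(f"only single-base substitutions are supported: {pos}")
        if seq[pos - 1].upper() != ref.upper():
            raise ValueError(f"reference allele mismatch at position {pos}")
        seq[pos - 1] = alt
    return "".join(seq)


def thread_onto_alignment(aligned_reference, isolate_sequence):
    """Give the isolate the same gap pattern as the aligned reference row.

    Valid only because the isolate has exactly the same length as the
    ungapped reference (substitutions only).
    """
    ungapped_length = len(aligned_reference) - aligned_reference.count("-")
    if ungapped_length != len(isolate_sequence):
        raise ValueError("isolate and ungapped reference differ in length")
    bases = iter(isolate_sequence)
    return "".join("-" if ch == "-" else next(bases) for ch in aligned_reference)


if __name__ == "__main__":
    reference_gene = "ATGGCCTAA"            # toy Species A ortholog
    isolate_snps = [(4, "G", "A"), (7, "T", "C")]
    isolate_gene = apply_snps(reference_gene, isolate_snps)
    # Row of the reference as it appears in an existing nucleotide alignment.
    aligned_ref = "ATG-GCC--TAA"
    print(thread_onto_alignment(aligned_ref, isolate_gene))
```

Because the isolate row inherits the reference's gaps, the existing alignment does not need to be recomputed for this toy case. Indels, heterozygous calls, and low-coverage sites would all need separate handling.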

Is there any other way to achieve this?

[Figure: example phylogeny]

snp alignment phylogeny • 1.7k views

"assemble the genomes ... tedious for 20 isolates"

"I find the output of tools like vcf2fq and FastaAlternateReferenceMaker complicated"

Which approach is more efficient may depend on genome size and ploidy. For bacteria I would recommend assembling the reads de novo with SPAdes, which is fast and very easy to use. For bacteria, de novo assembly is not at all "tedious".

Reply written 3.9 years ago by piet (modified 3.9 years ago)

The genome size is 20 Mb and the organism is haploid, so de novo assembly is tedious (ordering, filling gaps, annotating genes).

Reply written 3.9 years ago by abbhinay

"how to deal with SNPs in repetitive regions"

You should not do phylogeny on repetitive regions. Repeats are formed by recombination and recombination events will distort the phylogenetic signal.

Reply written 3.9 years ago by piet

In addition, highly repetitive regions are prone to sequencing errors, so variant calls in them are unreliable.

Reply written 3.9 years ago by WouterDeCoster

Thanks @piet @WouterDeCoster. Will keep that in mind! For now, I have discarded all SNPs in regions predicted by DustMasker.

Reply written 3.9 years ago by abbhinay
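The filtering step mentioned in the last reply (dropping SNPs that fall inside low-complexity regions) could look roughly like the sketch below. It assumes the masked regions have already been parsed into 1-based, inclusive `(start, end)` pairs for a single contig. That coordinate convention and the function name `filter_snps` are assumptions of this example, not something taken from DustMasker or GATK.

```python
from bisect import bisect_right


def merge_intervals(intervals):
    """Merge overlapping or touching (start, end) intervals; returns a sorted list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def filter_snps(snp_positions, masked_intervals):
    """Keep only SNP positions that lie outside every masked interval."""
    merged = merge_intervals(masked_intervals)
    starts = [start for start, _ in merged]
    kept = []
    for pos in snp_positions:
        # Index of the last interval that starts at or before `pos`.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos <= merged[i][1]:
            continue  # inside a masked region, so discard
        kept.append(pos)
    return kept


if __name__ == "__main__":
    masked = [(10, 20), (15, 30), (100, 120)]   # toy low-complexity regions
    snps = [5, 10, 25, 30, 31, 110, 200]        # toy SNP positions
    print(filter_snps(snps, masked))
```

Merging the intervals first lets each SNP be checked with one binary search, which keeps the filter fast even with many masked regions on a 20 Mb genome.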