genome mapping help requested
4.7 years ago

Hi everybody,

I'm struggling heavily with the problem below and could use some in-depth help, perhaps over message or chat (I'd be happy to pay you for your time).

I have a dataset of whole genomes for 134 individuals from two species, one of which has two subspecies. I want to build a phylogeny from these data, but I need an outgroup. My plan is to download the Anna's Hummingbird reference genome from NCBI, download raw reads for an Anna's Hummingbird individual from NCBI as well, and align those reads to the reference. Since my current dataset was already aligned to the Anna's genome, I want to combine the BAM file for this new Anna's individual with my current dataset and produce the phylogeny. Please message me if you are interested in helping out. I've been on this for several days now. Thanks!

alignment • 670 views
4.7 years ago
Brice Sarver ★ 3.8k

What you're asking for is possible but requires an intermediate-level bioinformatics skillset. No application that estimates phylogenies takes BAMs as inputs. The starting place is generally a FASTA (or Phylip or Nexus) alignment because you need homologous sequences to calculate single-site likelihoods.

I would recommend calling variants and keeping only the confidently called genotypes, injecting those back into the reference genome (see the GATK's FastaAlternateReferenceMaker, or use your programming language of choice), extracting the regions you want based on an annotation, combining them, aligning them (in the case of indels), and inferring phylogenies from them using your frequentist or Bayesian application of choice.
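To make the "inject genotypes back into the reference" step concrete, here is a minimal sketch in plain Python. It is not how FastaAlternateReferenceMaker works internally; it only illustrates the idea. The function names (`build_consensus`, `to_fasta`), the sample names, and the toy reference are all made up for illustration. It assumes SNPs only, 0-based positions, and that sites without a confident call are masked as N.

```python
from typing import Dict, Optional


def build_consensus(reference: str, calls: Dict[int, Optional[str]]) -> str:
    """Return a copy of `reference` with per-site calls applied.

    `calls` maps 0-based positions to a single base (a confident SNP call)
    or to None (no confident call at that site; it is masked as 'N').
    Indels are deliberately not handled, so every output has the same
    length as the reference and the sequences stay aligned.
    """
    seq = list(reference)
    for pos, base in calls.items():
        if not 0 <= pos < len(seq):
            raise ValueError(f"position {pos} is outside the reference")
        seq[pos] = "N" if base is None else base.upper()
    return "".join(seq)


def to_fasta(records: Dict[str, str], width: int = 60) -> str:
    """Format name -> sequence pairs as FASTA text, wrapped at `width`."""
    lines = []
    for name, seq in records.items():
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Toy data (hypothetical): one short "reference" and three samples.
reference = "ACGTACGTAC"
samples = {
    "sample_A": {1: "T", 7: "A"},   # two confident SNPs
    "sample_B": {1: "T", 4: None},  # one SNP, one low-confidence site
    "outgroup": {},                 # identical to the reference
}

consensus = {name: build_consensus(reference, calls)
             for name, calls in samples.items()}
fasta_text = to_fasta(consensus)
print(fasta_text)
```

Because no indels are applied, the per-sample sequences are already aligned to one another. Once indels are included, you would need the alignment step mentioned above.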

If you're only dealing with two species, you're looking at more population-level trees. You can probably get away with using a distance-based optimality criterion, like neighbor joining, instead. You may even be able to just use SNPs themselves, especially if you do joint variant calling.
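As a rough illustration of the distance-based route, the sketch below computes pairwise p-distances from concatenated SNP strings and joins taxa with a bare-bones neighbor-joining loop. Everything here is an assumption for illustration: the function names, the taxon names, and the toy SNP strings are hypothetical, branch lengths are omitted, and for real data you would want an established implementation rather than this.

```python
from itertools import combinations
from typing import Dict, List


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, skipping sites where either is N or a gap."""
    compared = diffs = 0
    for x, y in zip(a, b):
        if x in "N-" or y in "N-":
            continue
        compared += 1
        diffs += x != y
    return diffs / compared if compared else 0.0


def neighbor_joining(names: List[str], d: Dict[frozenset, float]) -> str:
    """Return an unrooted Newick topology (no branch lengths).

    `d` maps frozenset({name1, name2}) to the distance between the two.
    Taxa are joined until three nodes remain, which form the central
    trifurcation of the unrooted tree.
    """
    d = dict(d)
    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        # Sum of distances from each node to all other nodes.
        total = {i: sum(d[frozenset((i, k))] for k in nodes if k != i)
                 for i in nodes}
        # Join the pair that minimises the NJ Q criterion.
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - total[p[0]] - total[p[1]],
        )
        merged = f"({i},{j})"
        # Distance from the new internal node to every remaining node.
        for k in nodes:
            if k not in (i, j):
                d[frozenset((merged, k))] = 0.5 * (
                    d[frozenset((i, k))] + d[frozenset((j, k))] - d[frozenset((i, j))]
                )
        nodes = [k for k in nodes if k not in (i, j)] + [merged]
    return "(" + ",".join(nodes) + ");"


# Toy concatenated SNP genotypes (hypothetical).
snps = {
    "sp1_ind1": "AAAAAAAAAA",
    "sp1_ind2": "AAAAAAAAAT",
    "sp2_ind1": "CCAAAAAAAA",
    "outgroup": "CCGGGGAAAA",
}
dist = {frozenset((a, b)): p_distance(snps[a], snps[b])
        for a, b in combinations(snps, 2)}
tree = neighbor_joining(list(snps), dist)
print(tree)
```

The result is an unrooted topology. Rooting it on the outgroup is a separate step, typically done in whatever tree viewer or library you use downstream.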

I did this at the species level for mice and published the general approach in GBE in 2017. You can find a description of how I estimated phylogenies there, as well.
