Question: genome mapping help requested
11 months ago, truebeliever24 wrote:

Hi everybody,

I'm struggling heavily with the following and could use some in-depth help, perhaps over message or chat (I'd be happy to pay you for your time):

I have a dataset of whole genomes for 134 individuals from two species, one of which has two subspecies. I want to build a phylogeny from my data, but I need an outgroup. My plan is to take the Anna's Hummingbird reference genome from NCBI, download raw reads for an Anna's individual from NCBI, and align those reads to the genome. Since my current dataset was aligned to the Anna's genome, I want to combine the BAM file for this new Anna's individual with my current dataset and produce the phylogeny. Please message me if you are interested in helping out. I've been at this for several days now. Thanks!

11 months ago, Brice Sarver (United States) wrote:

What you're asking for is possible but requires an intermediate-level bioinformatics skillset. No application that estimates phylogenies takes BAMs as input. The usual starting point is a FASTA (or PHYLIP or NEXUS) alignment, because you need homologous sequences to calculate single-site likelihoods.
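To make "homologous sequences" concrete, here is a minimal Python sketch of what an alignment gives you: every sequence has the same length, so column i is the same site in every sample. The function names and the toy sequences (`sample_A`, `sample_B`, `outgroup_anna`) are made up for illustration and are not from any particular package.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict of name -> sequence (insertion order kept)."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def alignment_columns(records):
    """Return the alignment as a list of columns; fail if lengths differ."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError(f"not an alignment, sequence lengths differ: {sorted(lengths)}")
    return list(zip(*records.values()))


def variable_sites(records):
    """0-based indices of columns with more than one called base (N and - ignored)."""
    return [
        i
        for i, column in enumerate(alignment_columns(records))
        if len(set(column) - {"N", "-"}) > 1
    ]


# Toy alignment (hypothetical data): three samples, six homologous sites.
example = """>sample_A
ACGTAC
>sample_B
ACGTTC
>outgroup_anna
ACCTTC
"""

records = parse_fasta(example)
print(variable_sites(records))
```

A BAM has no such column structure across samples, which is why the reads have to be turned into per-sample sequences first.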

I would recommend calling variants and keeping only confidently called genotypes, injecting those back into the reference genome (see GATK's FastaAlternateReferenceMaker, or do it in your programming language of choice), extracting the regions you want based on an annotation, combining them, aligning them (needed if you include indels), and inferring phylogenies from them using your frequentist or Bayesian application of choice.
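As a rough illustration of the "inject genotypes back into the reference" step, here is a small Python sketch. It is not how GATK's FastaAlternateReferenceMaker works internally. The function name `inject_snps`, the tuple layout of a call, the quality cutoff of 30, and the choice to mask low-quality sites as N and encode heterozygotes with IUPAC codes are all assumptions made for the example.

```python
# IUPAC ambiguity codes for heterozygous biallelic SNPs.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def inject_snps(reference, calls, min_qual=30):
    """Build one sample's sequence from a reference and its SNP calls.

    reference: reference sequence for one region, as a string.
    calls: iterable of (pos, ref, alt, qual, genotype) tuples, where pos is
           1-based (as in VCF) and genotype is "0/0", "0/1" or "1/1".
    min_qual: calls below this quality are masked as N (an assumed policy).
    """
    seq = list(reference)
    for pos, ref, alt, qual, genotype in calls:
        if len(ref) != 1 or len(alt) != 1:
            continue  # indels are skipped here; they would need realignment
        idx = pos - 1
        if seq[idx].upper() != ref.upper():
            raise ValueError(f"REF allele mismatch at position {pos}")
        if qual < min_qual:
            seq[idx] = "N"  # not confidently called, so mask it
        elif genotype == "1/1":
            seq[idx] = alt
        elif genotype in ("0/1", "1/0"):
            seq[idx] = IUPAC[frozenset((ref.upper(), alt.upper()))]
        # "0/0" keeps the reference base
    return "".join(seq)


# Hypothetical calls for one sample over a six-base region.
calls = [
    (3, "G", "C", 50, "1/1"),  # confident homozygous alternate
    (5, "A", "T", 50, "0/1"),  # confident heterozygote
    (6, "C", "T", 10, "1/1"),  # low quality, gets masked
]
print(inject_snps("ACGTAC", calls))
```

Because every sample is built on the same reference coordinates, the resulting sequences are already aligned to each other as long as indels are left out.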

If you're only dealing with two species, you're looking at more population-level trees. You can probably get away with using a distance-based method, like neighbor joining, instead. You may even be able to just use the SNPs themselves, especially if you do joint variant calling.
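For the distance-based route, a bare-bones Python sketch might look like the following. It assumes genotypes are coded as alternate-allele counts (0, 1, 2, or None for missing) at jointly called SNPs, uses a simple allele-sharing distance, and returns only the tree topology without branch lengths. The names `snp_distance` and `neighbor_joining` and the sample labels are made up for the example; a real analysis would use an established package.

```python
from itertools import combinations


def snp_distance(g1, g2):
    """Mean allele difference per site, over sites called in both samples."""
    diffs = [abs(a - b) / 2 for a, b in zip(g1, g2) if a is not None and b is not None]
    if not diffs:
        raise ValueError("no sites called in both samples")
    return sum(diffs) / len(diffs)


def neighbor_joining(names, dist):
    """Neighbor joining on a distance dict {(a, b): d}; returns a Newick topology."""
    d = {frozenset(pair): value for pair, value in dist.items()}
    nodes = list(names)
    while len(nodes) > 2:
        n = len(nodes)
        # Sum of distances from each node to all other current nodes.
        total = {a: sum(d[frozenset((a, b))] for b in nodes if b != a) for a in nodes}
        # Join the pair that minimises the standard NJ Q criterion.
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - total[p[0]] - total[p[1]],
        )
        merged = f"({i},{j})"
        for k in nodes:
            if k not in (i, j):
                d[frozenset((merged, k))] = (
                    d[frozenset((i, k))] + d[frozenset((j, k))] - d[frozenset((i, j))]
                ) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [merged]
    return f"({nodes[0]},{nodes[1]});"


# Hypothetical genotypes at six SNPs for four samples.
genotypes = {
    "sp1_a": [0, 0, 2, 2, 0, 1],
    "sp1_b": [0, 0, 2, 2, 0, None],
    "sp2_a": [2, 2, 0, 2, 0, 1],
    "outgroup_anna": [2, 2, 0, 0, 2, 1],
}
dist = {
    (a, b): snp_distance(genotypes[a], genotypes[b])
    for a, b in combinations(genotypes, 2)
}
print(neighbor_joining(list(genotypes), dist))
```

The output is an unrooted topology in Newick form; rooting it on the outgroup sample is a separate step in whatever tree viewer or package you use.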

I did this at the species level for mice and published the general approach in GBE in 2017. You can find a description of how I estimated phylogenies there, as well.
