Question: Build Phylogenetic Trees From Bam Files
6.8 years ago by
Luca Beltrame220
IRCCS Istituto di Ricerche Farmacologiche "Mario Negri", Milano, Italy
Luca Beltrame220 wrote:


More and more publications show phylogenetic trees (often unrooted) to represent similarities between samples from the same origin (e.g. tumors). I set out to do the same, but my searches turned up nothing: neither a way to do this directly nor a way to do it via conversion. The only program I found, POPBAM, supposedly generates trees, but it doesn't work for me (it segfaults immediately).

Given a series of BAM files, what would be the best way to use/convert them to a format that can be used to build trees?

bam sequencing • 3.8k views
modified 6.6 years ago by Fwip490 • written 6.8 years ago by Luca Beltrame220
6.6 years ago by
United States
Fwip490 wrote:

I've had more success with the POPBAM source from GitHub. (I used commit 0cdbacc2fd869e0b65a64bf5ff38ca1c21f41657.)

The important thing to note, though, is the header adjustment you need to do in order for POPBAM to recognize your samples. From

To enable POPBAM to perform population-level analyses, it is first necessary to modify the input BAM file header. Users must add the "PO" tag to the header line for each read group. The "PO" tag can be any string, as long as the string is identical between samples from the same population. One example may be that a BAM file has three read groups (R21, R22, and R25). The R22 and R25 read groups are from two different lines of Drosophila melanogaster called "MEL01" and "MEL02", while the third read group, R21, is from a single line of D. simulans called "SIM01". Below is an example of the BAM header including the "PO" tag:


First, be sure to include readgroup information:

samtools merge -rh group1.header.txt group1.bam CD3674.bam CD3688.bam CD3692.bam CD3700.bam CD3719.bam


 @HD VN:1.3  SO:coordinate                       
 @SQ SN:NC_009089  LN:4290252  AS:NC_009089      
 @RG ID:CD3674 SM:CD3674 PO:CD3674               
 @RG ID:CD3688 SM:CD3688 PO:CD3688               
 @RG ID:CD3692 SM:CD3692 PO:CD3692               
 @RG ID:CD3700 SM:CD3700 PO:CD3700               
 @RG ID:CD3719 SM:CD3719 PO:CD3719
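The @RG lines above can be typed by hand, but for more than a few samples a small shell helper may be less error-prone. This is only a sketch: `make_rg_lines` is a hypothetical name, and it assumes each sample is its own population (ID, SM and PO all equal), as in the header shown. Note that the SAM format expects header fields to be separated by tabs, not spaces.

```shell
# Hypothetical helper: print one tab-separated @RG line per sample name.
# Assumption: every sample is treated as its own population, so the
# ID, SM and PO tags all carry the sample name.
make_rg_lines() {
  for sample in "$@"; do
    printf '@RG\tID:%s\tSM:%s\tPO:%s\n' "$sample" "$sample" "$sample"
  done
}

# Example: emit the read-group lines for two of the samples above.
make_rg_lines CD3674 CD3688
```

The output can be appended to the @HD and @SQ lines to form the header file that is passed to samtools merge with -h.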

Finally, run it like so:

popbam tree -f ref.fasta NC_009089:1-42000000 -o group1.txt > group1.tree

(As far as I can tell, the region is required, the -o output file is ignored, and output is written to stdout.)

modified 9 months ago by RamRS30k • written 6.6 years ago by Fwip490
6.8 years ago by
Fabio Marroni2.6k
Fabio Marroni2.6k wrote:

You may:

  1. Post a sample of the input file and the exact error message from POPBAM so that someone might be able to help you (I have never used POPBAM).
  2. Use the BAM files to build a consensus sequence for each sample, then compare the consensus sequences (which will be in FASTA format) to build a phylogenetic tree. I don't know how phylogenetic software packages will behave if you give them sequences that may be gigabases in size.
  3. Use BAM to obtain SNPs, then use SNPs to represent genetic distance between any two samples and then use the distance matrix as input in a phylogeny inference package.
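Option 3 can be sketched in a few lines of shell and awk. This is a minimal illustration, not a recommended pipeline: `snp_distance_matrix` is a hypothetical helper, and it assumes the SNP calls have already been exported to a tab-separated table with one column per sample (sample names in the first row), one row per site, and `.` for a missing call. The distance used here is simply the fraction of jointly called sites at which two samples differ; a phylogeny package may expect a different distance or input layout.

```shell
# Hypothetical helper: pairwise distance matrix from a genotype table.
# Input (file arguments or stdin): tab-separated, header row of sample
# names, one row per SNP site, "." marking a missing call.
# Output: one row per sample with the fraction of differing sites.
snp_distance_matrix() {
  awk 'BEGIN { FS = OFS = "\t" }
       NR == 1 { n = NF; for (i = 1; i <= n; i++) name[i] = $i; next }
       {
         for (i = 1; i <= n; i++)
           for (j = i + 1; j <= n; j++) {
             if ($i == "." || $j == ".") continue   # skip missing calls
             total[i, j]++                          # sites called in both
             if ($i != $j) diff[i, j]++             # sites that differ
           }
       }
       END {
         for (i = 1; i <= n; i++) {
           line = name[i]
           for (j = 1; j <= n; j++) {
             a = (i < j) ? i : j; b = (i < j) ? j : i
             d = (i == j || total[a, b] == 0) ? 0 : diff[a, b] / total[a, b]
             line = line OFS sprintf("%.3f", d)
           }
           print line
         }
       }' "$@"
}

# Example with three samples and four sites (made-up genotypes).
printf 'S1\tS2\tS3\nA\tA\tG\nC\tT\tT\nG\tG\tG\nT\t.\tT\n' | snp_distance_matrix
```

The resulting square matrix is the kind of input that distance-based methods such as neighbor joining work from, though the exact file format depends on the package.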
modified 11 months ago by RamRS30k • written 6.8 years ago by Fabio Marroni2.6k

Thanks for the answer. I'm using retargeted sequencing on a small number of targets (30 genes), so I'm assuming sequence size won't be a big problem.

When you mention a consensus sequence, you mean using stuff like pileup to generate it?

written 6.8 years ago by Luca Beltrame220

Pileup might work. There are plenty of tools that go from pileup to consensus (VarScan, GATK...).

written 6.8 years ago by Fabio Marroni2.6k