Which formula for the Bray-Curtis beta diversity index should be used for microbiome beta diversity analysis?
16 months ago
dpc ▴ 240

Hi community! I am seeing two different formulae for the Bray-Curtis beta diversity index. Wikipedia shows a different formula from this website. Can anyone please tell me which one should be used in microbiome diversity analysis?

diversity • 1.5k views
16 months ago
antonioggsousa ★ 2.5k

Hi @dpc,

Which software will you use to calculate the Bray-Curtis distance (beta diversity)?

I think this may matter more than the difference between the formulae shown on the wiki and on the website you highlighted.

I did not go through both websites, but there are both dissimilarity and similarity versions of the Bray-Curtis index (and probably other variations of it).

If you use for instance the phyloseq R package to estimate the Bray-Curtis distance, I think you will be fine.



Yes sir. I am using the phyloseq::distance function for the calculation, with the option method = "bray". My question is: which formula is used in this case? I am confused between the following two:

  1. [formula image not available], and

  2. [formula image not available]


Take one row, compute it yourself, and you'll immediately know which formula is used.

You can also consult the original publication for phyloseq.


I tested this, and both formulas seem equivalent to me, or at least they yield the same result for the dummy OTU table used.

Below is my R implementation of the Wikipedia formula, applied to the dummy OTU table from the other site. The resulting Bray-Curtis dissimilarity matrix matches both the other site and phyloseq, so although the formulas look different, I think they are equivalent. This is my opinion based on the test. The formula on the site does not use the sum of the per-taxon (OTU) minimum counts across the two samples, nor does it subtract the ratio from one, whereas the Wikipedia formula does both.
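
A short derivation suggests why the two results agree. This is only a sketch under an assumption: the formula images are not available here, so I am assuming the other site uses the common absolute-difference form (the one given in the vegan `vegdist` documentation for `method = "bray"`). The symbols $x_{ik}$ (count of taxon $k$ in sample $i$), $S_i$ and $C_{ij}$ are notation introduced for this sketch, chosen to match the Wikipedia formula.

$$
\begin{aligned}
S_i &= \sum_k x_{ik}, \qquad C_{ij} = \sum_k \min(x_{ik}, x_{jk}), \\
|a - b| &= a + b - 2\min(a, b) \quad \text{for any } a, b \ge 0, \\
\sum_k |x_{ik} - x_{jk}| &= S_i + S_j - 2\,C_{ij}, \\
\frac{\sum_k |x_{ik} - x_{jk}|}{\sum_k (x_{ik} + x_{jk})} &= \frac{S_i + S_j - 2\,C_{ij}}{S_i + S_j} = 1 - \frac{2\,C_{ij}}{S_i + S_j}.
\end{aligned}
$$

For samples A = (1, 3, 0, 1, 0) and B = (0, 2, 0, 4, 4) of the dummy table, the absolute differences sum to 1 + 1 + 0 + 3 + 4 = 9 and the totals to 5 + 10 = 15, giving 9/15 = 0.6. The shared minima sum to 0 + 2 + 0 + 1 + 0 = 3, giving 1 - 6/15 = 0.6 as well.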

bray-curtis implementation

# OTU count matrix: taxa (OTUs) in rows, samples in columns
otu_tbl <- matrix(
  c(1, 3, 0, 1, 0, 0, 2, 0,
    4, 4, 0, 0, 6, 2, 1),
  ncol = 3,
  byrow = FALSE
)
colnames(otu_tbl) <- LETTERS[1:3]
rownames(otu_tbl) <- paste0("OTU", 1:5)

bray_curtis <- function(mtx) { # assumes that samples are in columns
  # construct an empty sample-by-sample distance mtx
  samples <- colnames(mtx)
  dist_mtx <- matrix(NA, nrow = length(samples), ncol = length(samples),
                     dimnames = list(samples, samples))
  # loop over the distance mtx cells i,j
  for (i in seq_len(nrow(dist_mtx))) { # loop over rows
    row_samp <- rownames(dist_mtx)[i] # row sample
    for (j in seq_len(ncol(dist_mtx))) { # loop over cols
      col_samp <- colnames(dist_mtx)[j] # col sample
      sub_mtx <- mtx[, c(row_samp, col_samp)] # sub mtx for the samples compared
      # sum, over taxa, of the smaller count between the two samples
      Cij <- sum(apply(sub_mtx, 1, min))
      # sum of the counts of sample i
      Si <- sum(mtx[, row_samp])
      # sum of the counts of sample j
      Sj <- sum(mtx[, col_samp])
      # bray-curtis: https://en.wikipedia.org/wiki/Bray%E2%80%93Curtis_dissimilarity
      dist_mtx[i, j] <- 1 - (2 * Cij / (Si + Sj))
    }
  }
  return(dist_mtx)
}

# test the function
bray_curtis(mtx = otu_tbl)
#        A         B         C
# A 0.0000000 0.6000000 0.8571429
# B 0.6000000 0.0000000 0.6842105
# C 0.8571429 0.6842105 0.0000000

# test in phyloseq
library(phyloseq)
distance(physeq = otu_table(otu_tbl, taxa_are_rows = TRUE), method = "bray")
#       A         B
# B 0.6000000          
# C 0.8571429 0.6842105

Anyway, using the phyloseq implementation of Bray-Curtis, which relies on vegan, is a safe option.
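
The equivalence can also be checked numerically for every pair of samples. The sketch below rebuilds the same dummy OTU table and compares two forms of the index. It assumes the other site's formula is the absolute-difference form, which I cannot confirm from the missing image. `bc_absdiff` and `bc_shared_min` are hypothetical helper names made up for this illustration, not functions from phyloseq or vegan.

```r
# Dummy OTU table from this thread: taxa in rows, samples in columns
otu_tbl <- matrix(
  c(1, 3, 0, 1, 0,
    0, 2, 0, 4, 4,
    0, 0, 6, 2, 1),
  ncol = 3, byrow = FALSE,
  dimnames = list(paste0("OTU", 1:5), LETTERS[1:3])
)

# Hypothetical helper: absolute-difference form (assumed formula of the other site)
bc_absdiff <- function(x, y) sum(abs(x - y)) / sum(x + y)

# Hypothetical helper: shared-minimum form (the Wikipedia formula)
bc_shared_min <- function(x, y) 1 - 2 * sum(pmin(x, y)) / (sum(x) + sum(y))

# Evaluate both forms on every pair of samples
pairs <- combn(colnames(otu_tbl), 2)
res <- data.frame(
  pair       = apply(pairs, 2, paste, collapse = "-"),
  absdiff    = apply(pairs, 2, function(p) bc_absdiff(otu_tbl[, p[1]], otu_tbl[, p[2]])),
  shared_min = apply(pairs, 2, function(p) bc_shared_min(otu_tbl[, p[1]], otu_tbl[, p[2]]))
)
print(res)

# Stops with an error if the two forms disagree on any pair
stopifnot(isTRUE(all.equal(res$absdiff, res$shared_min)))

# Optional cross-check with vegan, if installed.
# vegdist() expects samples in rows, hence the transpose.
if (requireNamespace("vegan", quietly = TRUE)) {
  print(vegan::vegdist(t(otu_tbl), method = "bray"))
}
```

Agreement on one small table is not a proof, but it is consistent with the two formulas being algebraically the same for non-negative counts.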

I hope this helps,


