Which formula for the Bray-Curtis beta diversity index should be used for microbiome beta diversity analysis?
6 weeks ago
dpc ▴ 170

Hi community! I am seeing two different formulae for the Bray-Curtis beta diversity index: Wikipedia shows a different formula from this website. Can anyone please tell me which one should be used in microbiome diversity analysis?

diversity • 304 views
6 weeks ago
antonioggsousa ★ 2.1k

Hi @dpc,

Which software will you use to calculate the Bray-Curtis distance (beta diversity)?

I think this matters more than the difference between the formulae on Wikipedia and on the website you linked.

I did not go through both websites, but there are both dissimilarity and similarity versions of the Bray-Curtis index (and probably other variations as well).

If you use, for instance, the phyloseq R package to estimate the Bray-Curtis distance, I think you will be fine.

António

Yes, sir. I am using the phyloseq::distance function with the option method = "bray". My question is: which formula is used for the calculation in this case? I am confused between the following two:

1. and,

Take one row, compute it yourself, and you'll immediately know which formula is used.

You can also consult the original publication for phyloseq.

I tested both formulas, and they appear to be equivalent; at least, they yield the same result for the dummy OTU table used.

Below is my R implementation of the Wikipedia formula, applied to the dummy OTU table from the other site. The resulting Bray-Curtis dissimilarity matrix is equal to the one on the other site and to the one from phyloseq. So, although the formulas look different, I think they are equivalent. This is my opinion based on this test. The formula on the other site does not sum the minimum count of each taxon/OTU across the two samples, nor does it subtract the ratio from one, whereas the Wikipedia formula does both.
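
As a sketch of why the two forms can agree: assuming the other site's formula is the absolute-difference form (an assumption here; it is the form given in vegan's vegdist documentation for method = "bray"), the identity |a - b| = a + b - 2 min(a, b) links it to the Wikipedia form. In the notation below, x_{ik} is the count of taxon k in sample i, S_i is the total count of sample i, and C_{ij} is the sum over taxa of the smaller of the two counts. The result holds whenever S_i + S_j > 0.

```latex
\begin{aligned}
\sum_k |x_{ik} - x_{jk}|
  &= \sum_k \bigl( x_{ik} + x_{jk} - 2\min(x_{ik}, x_{jk}) \bigr)
   = S_i + S_j - 2C_{ij}, \\
\frac{\sum_k |x_{ik} - x_{jk}|}{\sum_k (x_{ik} + x_{jk})}
  &= \frac{S_i + S_j - 2C_{ij}}{S_i + S_j}
   = 1 - \frac{2C_{ij}}{S_i + S_j}.
\end{aligned}
```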

# bray-curtis implementation

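# dummy OTU table: 5 OTUs (rows) x 3 samples (columns), filled column by column
otu_tbl <- matrix(
  c(1, 3, 0, 1, 0,   # sample A
    0, 2, 0, 4, 4,   # sample B
    0, 0, 6, 2, 1),  # sample C
  ncol = 3,
  byrow = FALSE
)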
colnames(otu_tbl) <- LETTERS[1:3]
rownames(otu_tbl) <- paste0("OTU", 1:5)

bray_curtis <- function(mtx) { # assume that samples are in columns
  # construct an empty sample-by-sample distance matrix
  samples <- colnames(mtx)
  dist_mtx <- matrix(NA, nrow = length(samples), ncol = length(samples),
                     dimnames = list(samples, samples))
  # loop over dist_mtx[i, j]:
  for (i in seq_len(nrow(dist_mtx))) { # loop over rows
    row_samp <- rownames(dist_mtx)[i] # row sample
    for (j in seq_len(ncol(dist_mtx))) { # loop over cols
      col_samp <- colnames(dist_mtx)[j] # col sample
      sub_mtx <- mtx[, c(row_samp, col_samp)] # sub-matrix for the two samples compared
      # sum, over taxa, of the minimum count across the two samples
      Cij <- sum(apply(sub_mtx, 1, min))
      # sum of the counts of sample i
      Si <- sum(mtx[, row_samp])
      # sum of the counts of sample j
      Sj <- sum(mtx[, col_samp])
      # bray-curtis: https://en.wikipedia.org/wiki/Bray%E2%80%93Curtis_dissimilarity
      dist_mtx[i, j] <- 1 - (2 * Cij / (Si + Sj))
    }
  }
  return(dist_mtx)
}

# test the function
bray_curtis(mtx = otu_tbl)
#        A         B         C
# A 0.0000000 0.6000000 0.8571429
# B 0.6000000 0.0000000 0.6842105
# C 0.8571429 0.6842105 0.0000000

# test in phyloseq
library("phyloseq")
distance(physeq = otu_table(otu_tbl, taxa_are_rows = TRUE), method = "bray")
#       A         B
# B 0.6000000
# C 0.8571429 0.6842105
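
For comparison, here is a minimal sketch of the absolute-difference form, which is how vegan's vegdist documentation writes the "bray" index. The helper name bray_abs is hypothetical, and samples are again assumed to be in columns. The dummy table is redefined so the block runs on its own. The vegan check at the end is optional and only runs if the package is installed.

```r
# Sketch: Bray-Curtis dissimilarity as sum(|x_i - x_j|) / sum(x_i + x_j).
# `bray_abs` is a hypothetical helper name; samples are assumed to be in columns.
# Note: two samples with all-zero counts give 0/0 = NaN in this sketch.
bray_abs <- function(mtx) {
  samples <- colnames(mtx)
  n <- length(samples)
  d <- matrix(0, nrow = n, ncol = n, dimnames = list(samples, samples))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      num <- sum(abs(mtx[, i] - mtx[, j]))  # summed absolute differences
      den <- sum(mtx[, i] + mtx[, j])       # total counts of both samples
      d[i, j] <- num / den
    }
  }
  d
}

# Same dummy OTU table as above (5 OTUs x 3 samples)
otu_tbl <- matrix(
  c(1, 3, 0, 1, 0,
    0, 2, 0, 4, 4,
    0, 0, 6, 2, 1),
  ncol = 3, byrow = FALSE,
  dimnames = list(paste0("OTU", 1:5), LETTERS[1:3])
)

d_abs <- bray_abs(otu_tbl)
print(round(d_abs, 7))

# Optional cross-check against vegan, which expects samples in rows
if (requireNamespace("vegan", quietly = TRUE)) {
  d_vegan <- as.matrix(vegan::vegdist(t(otu_tbl), method = "bray"))
  print(all.equal(d_abs, d_vegan, check.attributes = FALSE))
}
```

For the pair A and B, the numerator is 1 + 1 + 0 + 3 + 4 = 9 and the denominator is 5 + 10 = 15, giving 0.6, the same value as the minimum-based form above.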


Anyway, using the phyloseq implementation of Bray-Curtis, which relies on vegan, is a safe option.

I hope this helps,

António