Question

Ordering admixture stacked barplot based on multiple values

4.1 years ago

msul • 0

I have a dataset from which I am constructing a stacked barplot in R, and I want to know how to arrange the bars so that "similar" individuals cluster together. The dataset is a matrix Q of admixture proportions with d rows and n columns. In this toy example there are d = 10 ancestral populations (rows) and n = 5 individuals (columns).

Here is the dataset:

> a
            V1          V2          V3           V4           V5
1  0.534410243 0.009358740 0.011295181 0.2141751740 0.0030129254
2  0.026653603 0.372426720 0.447847534 0.0179177507 0.4072904477
3  0.193317915 0.003605024 0.003186611 0.4832114736 0.0007095471
4  0.111881585 0.000000000 0.000000000 0.2296213741 0.0119233461
5  0.089696570 0.591163629 0.509774416 0.0032542030 0.5535847030
6  0.007543558 0.000000000 0.000000000 0.0364907757 0.0013148362
7  0.004862942 0.000000000 0.002123909 0.0146682272 0.0004053690
8  0.009276195 0.011710457 0.014367894 0.0000000000 0.0000000000
9  0.006903171 0.004314528 0.011404455 0.0000000000 0.0126889937
10 0.015454219 0.007420903 0.000000000 0.0006610215 0.0090698319

I create a stacked barplot like so:

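library(dplyr)
library(tidyr)
library(ggplot2)
# keep the population (row name) as a column, then reshape to long format
pop <- rownames(a)
a <- a %>% mutate(pop = rownames(a))
a_long <- gather(a, key, value, -pop)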

# trying to create a similarity index
a_long <- a_long %>% group_by(key) %>% 
  mutate(mean = mean(value)) %>%
  arrange(desc(mean))

# looking at some of the expanded dataset
> a_long[1:20,]
# A tibble: 20 x 4
# Groups:   key [2]
   pop   key      value  mean
   <chr> <chr>    <dbl> <dbl>
 1 1     V2    0.00936    0.1
 2 2     V2    0.372      0.1
 3 3     V2    0.00361    0.1
 4 4     V2    0          0.1
 5 5     V2    0.591      0.1
 6 6     V2    0          0.1
 7 7     V2    0          0.1
 8 8     V2    0.0117     0.1
 9 9     V2    0.00431    0.1
10 10    V2    0.00742    0.1
11 1     V4    0.214      0.1
12 2     V4    0.0179     0.1
13 3     V4    0.483      0.1
14 4     V4    0.230      0.1
15 5     V4    0.00325    0.1
16 6     V4    0.0365     0.1
17 7     V4    0.0147     0.1
18 8     V4    0          0.1
19 9     V4    0          0.1
20 10    V4    0.000661   0.1

# colors
v_colors <- c("#440154FF", "#443B84FF", "#34618DFF", "#404588FF", "#1FA088FF", "#40BC72FF",
              "#67CC5CFF", "#A9DB33FF", "#DDE318FF", "#FDE725FF")

plot <- ggplot(a_long, aes(x = key, y = value, fill = pop)) 
plot + geom_bar(position="stack", stat="identity") +  scale_fill_manual(values = v_colors)

The output looks like this:

[image: barplot]

How can I make the output neater, e.g. so that the individuals with a higher proportion of population 5 ancestry sit next to each other on the x-axis? So far, I have tried computing the mean of value for each individual, but that does not work as an ordering criterion: each individual's proportions sum to 1 across the 10 populations, so the mean is 0.1 for everyone. How can I create a similarity index that tells me how similar individual 1 is to individual 2, and how do I then order the individuals on the x-axis so that they look well clustered (e.g. like the barplots in this figure)?
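To make the simplest version of this concrete, here is a minimal sketch that orders the individuals by a single component (population 5) by setting the factor levels of the x-axis variable. It rebuilds the toy data frame a from this post so that it runs on its own; the names q5 and ind_order are made up for illustration, and sorting on one population is only one possible criterion.

```r
library(ggplot2)

# Toy admixture proportions from this post: rows = populations, columns = individuals
a <- data.frame(
  V1 = c(0.534410243, 0.026653603, 0.193317915, 0.111881585, 0.089696570,
         0.007543558, 0.004862942, 0.009276195, 0.006903171, 0.015454219),
  V2 = c(0.009358740, 0.372426720, 0.003605024, 0.000000000, 0.591163629,
         0.000000000, 0.000000000, 0.011710457, 0.004314528, 0.007420903),
  V3 = c(0.011295181, 0.447847534, 0.003186611, 0.000000000, 0.509774416,
         0.000000000, 0.002123909, 0.014367894, 0.011404455, 0.000000000),
  V4 = c(0.2141751740, 0.0179177507, 0.4832114736, 0.2296213741, 0.0032542030,
         0.0364907757, 0.0146682272, 0.0000000000, 0.0000000000, 0.0006610215),
  V5 = c(0.0030129254, 0.4072904477, 0.0007095471, 0.0119233461, 0.5535847030,
         0.0013148362, 0.0004053690, 0.0000000000, 0.0126889937, 0.0090698319)
)

# Each column sums to 1, so the per-individual mean is 1/10 for everyone
col_means <- colMeans(a)

# Proportion of population 5 (row 5) per individual, as a named vector
q5 <- unlist(a[5, ])

# Individuals sorted from highest to lowest population-5 ancestry
ind_order <- names(sort(q5, decreasing = TRUE))

# Long format; unlist() walks the data frame column by column
a_long <- data.frame(
  pop   = factor(rep(seq_len(nrow(a)), times = ncol(a))),
  key   = factor(rep(colnames(a), each = nrow(a)), levels = ind_order),
  value = unlist(a, use.names = FALSE)
)

# ggplot2 draws a discrete x-axis in factor-level order
p <- ggplot(a_long, aes(x = key, y = value, fill = pop)) +
  geom_col(position = "stack")
p
```

The key point is that ggplot2 places bars in the order of the factor levels of key, so any ordering rule only needs to produce a vector of individual names.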

Thanks!

In case you want to recreate the data frame a in the example above:

v1 = c(0.534410243, 0.026653603, 0.193317915, 0.111881585, 0.089696570, 0.007543558, 0.004862942, 0.009276195, 0.006903171, 0.015454219)
v2 = c(0.009358740, 0.372426720, 0.003605024, 0.000000000, 0.591163629, 0.000000000, 0.000000000, 0.011710457, 0.004314528, 0.007420903)
v3 = c(0.011295181, 0.447847534, 0.003186611, 0.000000000, 0.509774416, 0.000000000, 0.002123909, 0.014367894, 0.011404455, 0.000000000) 
v4 = c(0.2141751740, 0.0179177507, 0.4832114736, 0.2296213741, 0.0032542030, 0.0364907757, 0.0146682272, 0.0000000000, 0.0000000000, 0.0006610215)
v5 = c(0.0030129254, 0.4072904477, 0.0007095471, 0.0119233461, 0.5535847030, 0.0013148362, 0.0004053690, 0.0000000000, 0.0126889937, 0.0090698319)
a = data.frame(V1 = v1, V2 = v2, V3 = v3, V4 = v4, V5 = v5)
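With this data frame, one possible way to define a similarity index (a sketch under assumptions, not an established recipe for admixture plots) is the Euclidean distance between the individuals' proportion vectors, followed by hierarchical clustering with base R's dist() and hclust(). The leaf order of the resulting dendrogram can then be used as the x-axis order. The distance metric, the "average" linkage, and the names d, hc and ind_order are arbitrary choices for illustration.

```r
library(ggplot2)

# Same toy data as above, rebuilt so the snippet runs on its own
a <- data.frame(
  V1 = c(0.534410243, 0.026653603, 0.193317915, 0.111881585, 0.089696570,
         0.007543558, 0.004862942, 0.009276195, 0.006903171, 0.015454219),
  V2 = c(0.009358740, 0.372426720, 0.003605024, 0.000000000, 0.591163629,
         0.000000000, 0.000000000, 0.011710457, 0.004314528, 0.007420903),
  V3 = c(0.011295181, 0.447847534, 0.003186611, 0.000000000, 0.509774416,
         0.000000000, 0.002123909, 0.014367894, 0.011404455, 0.000000000),
  V4 = c(0.2141751740, 0.0179177507, 0.4832114736, 0.2296213741, 0.0032542030,
         0.0364907757, 0.0146682272, 0.0000000000, 0.0000000000, 0.0006610215),
  V5 = c(0.0030129254, 0.4072904477, 0.0007095471, 0.0119233461, 0.5535847030,
         0.0013148362, 0.0004053690, 0.0000000000, 0.0126889937, 0.0090698319)
)

# dist() compares rows, so transpose to get one row per individual
d <- dist(t(as.matrix(a)), method = "euclidean")

# Agglomerative clustering; hc$order is the left-to-right leaf order
hc <- hclust(d, method = "average")
ind_order <- hc$labels[hc$order]

# Long format with the individuals as a factor in dendrogram order
a_long <- data.frame(
  pop   = factor(rep(seq_len(nrow(a)), times = ncol(a))),
  key   = factor(rep(colnames(a), each = nrow(a)), levels = ind_order),
  value = unlist(a, use.names = FALSE)
)

p <- ggplot(a_long, aes(x = key, y = value, fill = pop)) +
  geom_col(position = "stack")
p
```

On this toy data V1 and V4 end up adjacent and V2, V3, V5 form one contiguous block. With many individuals, a different distance or linkage may give a visually cleaner order, so it is worth trying a few.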

R barplot stacked-barplot ordering genome • 1.9k views
