How to create a graph that shows pairwise Fst values at single SNP loci across the all chromosomes in R
1
2
Entering edit mode
11 weeks ago
Curls ▴ 40

Hi all,

I am trying to obtain a similar graph as shown in the figure with my data. My data includes Fst values for each SNP based the populations (pop1 vs pop2 etc) from admixture analysis. I would appreciate any help.

R Fst graph • 379 views
2
Entering edit mode
10 weeks ago
cmdcolin ★ 3.0k

this is not a complete answer, but it may help to empower you to create your own graphs.

i try to keep track of a lot of different visualization related tools (https://cmdcolin.github.io/awesome-genome-visualization/), but indeed, just hand-coding the solution in R is a good approach. I get my mind blown by ggplot2 capabilities

this blogpost was very enlightening for me for creating plots in R with proper e.g. chromosome labels along the bottom, and plotting data according to a 'cumulative' bp position

https://danielroelfs.com/blog/how-i-create-manhattan-plots-using-ggplot/

here is a quick approach at doing something for your problem

first generate 5 random regions (10Mbp long to make them easily visible) with bedtools random on command line, you'll have real data

bedtools random -g hg19.fa.gz.fai -n 5 -l 10000000 | sort -k1,1 -k2,2n > out1.bed
bedtools random -g hg19.fa.gz.fai -n 5 -l 10000000 | sort -k1,1 -k2,2n > out2.bed
bedtools random -g hg19.fa.gz.fai -n 5 -l 10000000 | sort -k1,1 -k2,2n > out3.bed

library(ggplot2)
library(tidyverse)

colnames(chrom_sizes) <- c("chr", "size")
colnames(x1) <- c("chr", "start", "end", "name", "score", "strand")
colnames(x2) <- c("chr", "start", "end", "name", "score", "strand")
colnames(x3) <- c("chr", "start", "end", "name", "score", "strand")

# sometimes read.table interprets chromosome names as integers, sometimes
# characters, force to characters
x1$chr <- as.character(x1$chr)
x2$chr <- as.character(x2$chr)
x3$chr <- as.character(x3$chr)

x1$score <- runif(n = nrow(x1), min = 1, max = 100) x2$score <- runif(n = nrow(x1), min = 1, max = 100)
x3$score <- runif(n = nrow(x1), min = 1, max = 100) data_cum <- chrom_sizes %>% group_by(chr) %>% summarise(max_bp = max(size)) %>% arrange(-max_bp) %>% mutate(bp_add = lag(cumsum(as.numeric(max_bp)), default = 0)) process <- function(df) { df %>% inner_join(data_cum, by = "chr") %>% mutate(bp_start_cum = start + bp_add) %>% mutate(bp_end_cum = end + bp_add) } data1 <- process(x1) data2 <- process(x2) data3 <- process(x3) data1$id <- "id1"
data2$id <- "id2" data3$id <- "id3"

df <- rbind(data1, data2, data3)
print(df)

axis_set <- data_cum %>%
group_by(chr) %>%

p <- ggplot(data = df, aes(color = score, xmin = bp_start_cum, xmax = bp_end_cum, ymin = 0, ymax = 1)) +
geom_rect() +
scale_x_continuous(label = axis_set$chr, breaks = axis_set$center) +
facet_grid(rows = vars(id)) +
theme(
axis.text.x = element_text(angle = 60, size = 8, vjust = 0.5),
axis.ticks.y = element_blank(),
axis.text.y = element_blank()
)

ggsave("out.png", p, width = 11, height = 3)


generates a figure like this

as you can see it's not really done, and not identical to yours, but might be a starting point. updated slightly with facet_grid after posting originally

example todos

• chromosomes not sorted by name, but by length
• labels on left instead of right