Avoid repetition in script
0
0
Entering edit mode
2.2 years ago
aroso491 • 0

Hello!

I have a little script that allows me to create overlapped Manhattan plots with ggplot2 between 2 different dataframes (SNPs and CpGs). I have interesting columns in 22 chromosomes and intend to get 22 plots. However, I was thinking that repeating the same code over and over 22 times was a bit of an overkill, but I do not know if it is possible to do it a bit more compact with R.

This is the code fragment I will have to repeat over and over if I want this to work:

#PLOT OF CHROMOSOME 1
chr1_SCPG_computed <- ch1_SCPG %>%

# Compute chromosome size
group_by(CHR) %>%
summarise(chr_len=max(POSITION))%>%

# Calculate cumulative position of each chromosome
mutate(tot=cumsum(as.numeric(chr_len))-chr_len) %>%
dplyr::select(-chr_len) %>%

# Add this info to the initial dataset
left_join(ch1_SCPG, ., by=c("CHR"="CHR")) %>%

# Add a cumulative position of each SNP
arrange(CHR, POSITION) %>%
mutate( BPcum=POSITION+tot)

axisC1_2 = chr1_SCPG_computed %>% group_by(CHR) %>% summarize(center=(max(BPcum) + min(BPcum) ) / 2 )
axisC1_2$CHR <- as.numeric(axisC1_2$CHR)

ggplot()+
geom_point(data=chr1_SSNP_computed, aes(x=BPcum, y=-log10(P)), colour = "red", size=1.3, alpha=0.5, shape=1)+
geom_point(data=chr1_SCPG_computed, aes(x=BPcum, y=-log10(P)), colour = "dark green", size=1.3, alpha=0.5, shape=1)+
geom_hline(yintercept = -log10(sig), color ="red", linetype = "dashed")+
scale_x_continuous(label= axisC1$CHR, breaks = axisC1$center)+
scale_y_continuous(expand = c(0, 0))+
theme_bw() +
theme(
legend.position="none",
panel.border = element_blank(),
panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank()
)


Any ideas would be appreciated, thanks!

R SNP cpg manhattan plot plotting • 856 views
3
Entering edit mode

Writing for loops in R

Important tip for using ggplot in a for loop

Also look at the *apply family of functions in R, they are sometimes faster and easier to use then writing a loop.

0
Entering edit mode

Do you have any object containing 'ch1_SCPG'-like pieces, such as dataframe or list? Also, will something else change in iterations in addition to the 'ch1_SCPG'-like piece?

0
Entering edit mode

Yes, this is my dataframe ch1_SCPG. The columns are (ignoring the first column): ID CHR POSITION P STUDY_ID and the idea is that I will be able to do this for all chromosomes (1 to 22). I just don't see how can I loop it as it is a chunky piece of code and it confuses me.

1   cg16145216  1   42385662    6.7e-48 JoehannesCN
2   cg19406367  1   66999929    7.0e-44 JoehannesCN
3   cg05603985  1   2161049 1.8e-43 JoehannesCN
8   cg26856289  1   24307516    8.6e-35 JoehannesCN
29  cg26764244  1   68299511    1.8e-28 JoehannesCN

0
Entering edit mode

I meant actually whether there is one object (list or table) holding the data for different chromosomes. As I understand, ch1_SCPG contains the data for the first chromosome - how did you get this table, by loading some separate file or by filtering some large table with data for all chromosomes?

Also, left_join(ch1_SCPG, ., by=c("CHR"="CHR")) actually performs right join - to make it clear, right_join(ch1_SCPG, "CHR") may be used explicitly.