Question: Avoid repetition in script
gravatar for aroso491
8 months ago by
United Kingdom
aroso4910 wrote:


I have a little script that allows me to create overlapped Manhattan plots with ggplot2 between 2 different dataframes (SNPs and CpGs). I have interesting columns in 22 chromosomes and intend to get 22 plots. However, I was thinking that repeating the same code over and over 22 times was a bit of an overkill, but I do not know if it is possible to do it a bit more compact with R.

This is the code fragment I will have to repeat over and over if I want this to work:

chr1_SCPG_computed <- ch1_SCPG %>% 

  # Compute chromosome size
  group_by(CHR) %>% 

  # Calculate cumulative position of each chromosome
  mutate(tot=cumsum(as.numeric(chr_len))-chr_len) %>%
  dplyr::select(-chr_len) %>%

  # Add this info to the initial dataset
  left_join(ch1_SCPG, ., by=c("CHR"="CHR")) %>%

  # Add a cumulative position of each SNP
  arrange(CHR, POSITION) %>%
  mutate( BPcum=POSITION+tot)

axisC1_2 = chr1_SCPG_computed %>% group_by(CHR) %>% summarize(center=(max(BPcum) + min(BPcum) ) / 2 )
axisC1_2$CHR <- as.numeric(axisC1_2$CHR)

  geom_point(data=chr1_SSNP_computed, aes(x=BPcum, y=-log10(P)), colour = "red", size=1.3, alpha=0.5, shape=1)+
  geom_point(data=chr1_SCPG_computed, aes(x=BPcum, y=-log10(P)), colour = "dark green", size=1.3, alpha=0.5, shape=1)+
  geom_hline(yintercept = -log10(sig), color ="red", linetype = "dashed")+
  scale_x_continuous(label= axisC1$CHR, breaks = axisC1$center)+
  scale_y_continuous(expand = c(0, 0))+
  theme_bw() +
    panel.border = element_blank(),
    panel.grid.major.x = element_blank(),
    panel.grid.minor.x = element_blank()

Any ideas would be appreciated, thanks!

cpg manhattan plot snp R plotting • 257 views
ADD COMMENTlink written 8 months ago by aroso4910

Writing for loops in R

Important tip for using ggplot in a for loop

Also look at the *apply family of functions in R, they are sometimes faster and easier to use then writing a loop.

ADD REPLYlink modified 8 months ago • written 8 months ago by ashish570

Do you have any object containing 'ch1_SCPG'-like pieces, such as dataframe or list? Also, will something else change in iterations in addition to the 'ch1_SCPG'-like piece?

ADD REPLYlink modified 7 months ago • written 7 months ago by Alex Nesmelov190

Yes, this is my dataframe ch1_SCPG. The columns are (ignoring the first column): ID CHR POSITION P STUDY_ID and the idea is that I will be able to do this for all chromosomes (1 to 22). I just don't see how can I loop it as it is a chunky piece of code and it confuses me.

1   cg16145216  1   42385662    6.7e-48 JoehannesCN
2   cg19406367  1   66999929    7.0e-44 JoehannesCN
3   cg05603985  1   2161049 1.8e-43 JoehannesCN
8   cg26856289  1   24307516    8.6e-35 JoehannesCN
29  cg26764244  1   68299511    1.8e-28 JoehannesCN
ADD REPLYlink modified 7 months ago • written 7 months ago by aroso4910

I meant actually whether there is one object (list or table) holding the data for different chromosomes. As I understand, ch1_SCPG contains the data for the first chromosome - how did you get this table, by loading some separate file or by filtering some large table with data for all chromosomes?

Also, left_join(ch1_SCPG, ., by=c("CHR"="CHR")) actually performs right join - to make it clear, right_join(ch1_SCPG, "CHR") may be used explicitly.

ADD REPLYlink modified 7 months ago • written 7 months ago by Alex Nesmelov190
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 985 users visited in the last hour