Question

Avoid repetition in script

0

Entering edit mode

3.8 years ago

aroso491 • 0

Hello!

I have a little script that allows me to create overlapped Manhattan plots with ggplot2 between 2 different dataframes (SNPs and CpGs). I have interesting columns in 22 chromosomes and intend to get 22 plots. However, I was thinking that repeating the same code over and over 22 times was a bit of an overkill, but I do not know if it is possible to do it a bit more compact with R.

This is the code fragment I will have to repeat over and over if I want this to work:

#PLOT OF CHROMOSOME 1 
chr1_SCPG_computed <- ch1_SCPG %>% 

  # Compute chromosome size
  group_by(CHR) %>% 
  summarise(chr_len=max(POSITION))%>% 

  # Calculate cumulative position of each chromosome
  mutate(tot=cumsum(as.numeric(chr_len))-chr_len) %>%
  dplyr::select(-chr_len) %>%

  # Add this info to the initial dataset
  left_join(ch1_SCPG, ., by=c("CHR"="CHR")) %>%

  # Add a cumulative position of each SNP
  arrange(CHR, POSITION) %>%
  mutate( BPcum=POSITION+tot)

axisC1_2 = chr1_SCPG_computed %>% group_by(CHR) %>% summarize(center=(max(BPcum) + min(BPcum) ) / 2 )
axisC1_2$CHR <- as.numeric(axisC1_2$CHR)

ggplot()+
  geom_point(data=chr1_SSNP_computed, aes(x=BPcum, y=-log10(P)), colour = "red", size=1.3, alpha=0.5, shape=1)+
  geom_point(data=chr1_SCPG_computed, aes(x=BPcum, y=-log10(P)), colour = "dark green", size=1.3, alpha=0.5, shape=1)+
  geom_hline(yintercept = -log10(sig), color ="red", linetype = "dashed")+
  scale_x_continuous(label= axisC1$CHR, breaks = axisC1$center)+
  scale_y_continuous(expand = c(0, 0))+
  theme_bw() +
  theme( 
    legend.position="none",
    panel.border = element_blank(),
    panel.grid.major.x = element_blank(),
    panel.grid.minor.x = element_blank()
  )

Any ideas would be appreciated, thanks!

R SNP cpg manhattan plot plotting • 1.4k views

ADD COMMENT • link 3.8 years ago by aroso491 • 0

3

Entering edit mode

Writing for loops in R

Important tip for using ggplot in a for loop

Also look at the *apply family of functions in R, they are sometimes faster and easier to use then writing a loop.

ADD REPLY • link 3.8 years ago by ashish ▴ 680

0

Entering edit mode

Do you have any object containing 'ch1_SCPG'-like pieces, such as dataframe or list? Also, will something else change in iterations in addition to the 'ch1_SCPG'-like piece?

ADD REPLY • link 3.8 years ago by Alex Nesmelov ▴ 200

0

Entering edit mode

Yes, this is my dataframe ch1_SCPG. The columns are (ignoring the first column): ID CHR POSITION P STUDY_ID and the idea is that I will be able to do this for all chromosomes (1 to 22). I just don't see how can I loop it as it is a chunky piece of code and it confuses me.

1   cg16145216  1   42385662    6.7e-48 JoehannesCN
2   cg19406367  1   66999929    7.0e-44 JoehannesCN
3   cg05603985  1   2161049 1.8e-43 JoehannesCN
8   cg26856289  1   24307516    8.6e-35 JoehannesCN
29  cg26764244  1   68299511    1.8e-28 JoehannesCN

ADD REPLY • link 3.8 years ago by aroso491 • 0

0

Entering edit mode

I meant actually whether there is one object (list or table) holding the data for different chromosomes. As I understand, ch1_SCPG contains the data for the first chromosome - how did you get this table, by loading some separate file or by filtering some large table with data for all chromosomes?

Also, left_join(ch1_SCPG, ., by=c("CHR"="CHR")) actually performs right join - to make it clear, right_join(ch1_SCPG, "CHR") may be used explicitly.

ADD REPLY • link 3.8 years ago by Alex Nesmelov ▴ 200