Question: Avoid repetition in script
0
gravatar for aroso491
4 weeks ago by
aroso4910
United Kingdom
aroso4910 wrote:

Hello!

I have a little script that allows me to create overlapped Manhattan plots with ggplot2 between 2 different dataframes (SNPs and CpGs). I have interesting columns in 22 chromosomes and intend to get 22 plots. However, I was thinking that repeating the same code over and over 22 times was a bit of an overkill, but I do not know if it is possible to do it a bit more compact with R.

This is the code fragment I will have to repeat over and over if I want this to work:

#PLOT OF CHROMOSOME 1 
chr1_SCPG_computed <- ch1_SCPG %>% 

  # Compute chromosome size
  group_by(CHR) %>% 
  summarise(chr_len=max(POSITION))%>% 

  # Calculate cumulative position of each chromosome
  mutate(tot=cumsum(as.numeric(chr_len))-chr_len) %>%
  dplyr::select(-chr_len) %>%

  # Add this info to the initial dataset
  left_join(ch1_SCPG, ., by=c("CHR"="CHR")) %>%

  # Add a cumulative position of each SNP
  arrange(CHR, POSITION) %>%
  mutate( BPcum=POSITION+tot)

axisC1_2 = chr1_SCPG_computed %>% group_by(CHR) %>% summarize(center=(max(BPcum) + min(BPcum) ) / 2 )
axisC1_2$CHR <- as.numeric(axisC1_2$CHR)

ggplot()+
  geom_point(data=chr1_SSNP_computed, aes(x=BPcum, y=-log10(P)), colour = "red", size=1.3, alpha=0.5, shape=1)+
  geom_point(data=chr1_SCPG_computed, aes(x=BPcum, y=-log10(P)), colour = "dark green", size=1.3, alpha=0.5, shape=1)+
  geom_hline(yintercept = -log10(sig), color ="red", linetype = "dashed")+
  scale_x_continuous(label= axisC1$CHR, breaks = axisC1$center)+
  scale_y_continuous(expand = c(0, 0))+
  theme_bw() +
  theme( 
    legend.position="none",
    panel.border = element_blank(),
    panel.grid.major.x = element_blank(),
    panel.grid.minor.x = element_blank()
  )

Any ideas would be appreciated, thanks!

cpg manhattan plot snp R plotting • 136 views
ADD COMMENTlink written 4 weeks ago by aroso4910
3

Writing for loops in R

Important tip for using ggplot in a for loop

Also look at the *apply family of functions in R, they are sometimes faster and easier to use then writing a loop.

ADD REPLYlink modified 4 weeks ago • written 4 weeks ago by ashish480

Do you have any object containing 'ch1_SCPG'-like pieces, such as dataframe or list? Also, will something else change in iterations in addition to the 'ch1_SCPG'-like piece?

ADD REPLYlink modified 4 weeks ago • written 4 weeks ago by Alex Nesmelov110

Yes, this is my dataframe ch1_SCPG. The columns are (ignoring the first column): ID CHR POSITION P STUDY_ID and the idea is that I will be able to do this for all chromosomes (1 to 22). I just don't see how can I loop it as it is a chunky piece of code and it confuses me.

1   cg16145216  1   42385662    6.7e-48 JoehannesCN
2   cg19406367  1   66999929    7.0e-44 JoehannesCN
3   cg05603985  1   2161049 1.8e-43 JoehannesCN
8   cg26856289  1   24307516    8.6e-35 JoehannesCN
29  cg26764244  1   68299511    1.8e-28 JoehannesCN
ADD REPLYlink modified 4 weeks ago • written 4 weeks ago by aroso4910

I meant actually whether there is one object (list or table) holding the data for different chromosomes. As I understand, ch1_SCPG contains the data for the first chromosome - how did you get this table, by loading some separate file or by filtering some large table with data for all chromosomes?

Also, left_join(ch1_SCPG, ., by=c("CHR"="CHR")) actually performs right join - to make it clear, right_join(ch1_SCPG, "CHR") may be used explicitly.

ADD REPLYlink modified 4 weeks ago • written 4 weeks ago by Alex Nesmelov110
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1658 users visited in the last hour