Avoid repetition in script
0
0
Entering edit mode
3.8 years ago
aroso491 • 0

Hello!

I have a little script that allows me to create overlapped Manhattan plots with ggplot2 between 2 different dataframes (SNPs and CpGs). I have interesting columns in 22 chromosomes and intend to get 22 plots. However, I was thinking that repeating the same code over and over 22 times was a bit of an overkill, but I do not know if it is possible to do it a bit more compact with R.

This is the code fragment I will have to repeat over and over if I want this to work:

#PLOT OF CHROMOSOME 1 
chr1_SCPG_computed <- ch1_SCPG %>% 

  # Compute chromosome size
  group_by(CHR) %>% 
  summarise(chr_len=max(POSITION))%>% 

  # Calculate cumulative position of each chromosome
  mutate(tot=cumsum(as.numeric(chr_len))-chr_len) %>%
  dplyr::select(-chr_len) %>%

  # Add this info to the initial dataset
  left_join(ch1_SCPG, ., by=c("CHR"="CHR")) %>%

  # Add a cumulative position of each SNP
  arrange(CHR, POSITION) %>%
  mutate( BPcum=POSITION+tot)

axisC1_2 = chr1_SCPG_computed %>% group_by(CHR) %>% summarize(center=(max(BPcum) + min(BPcum) ) / 2 )
axisC1_2$CHR <- as.numeric(axisC1_2$CHR)

ggplot()+
  geom_point(data=chr1_SSNP_computed, aes(x=BPcum, y=-log10(P)), colour = "red", size=1.3, alpha=0.5, shape=1)+
  geom_point(data=chr1_SCPG_computed, aes(x=BPcum, y=-log10(P)), colour = "dark green", size=1.3, alpha=0.5, shape=1)+
  geom_hline(yintercept = -log10(sig), color ="red", linetype = "dashed")+
  scale_x_continuous(label= axisC1$CHR, breaks = axisC1$center)+
  scale_y_continuous(expand = c(0, 0))+
  theme_bw() +
  theme( 
    legend.position="none",
    panel.border = element_blank(),
    panel.grid.major.x = element_blank(),
    panel.grid.minor.x = element_blank()
  )

Any ideas would be appreciated, thanks!

R SNP cpg manhattan plot plotting • 1.4k views
ADD COMMENT
3
Entering edit mode

Writing for loops in R

Important tip for using ggplot in a for loop

Also look at the *apply family of functions in R, they are sometimes faster and easier to use then writing a loop.

ADD REPLY
0
Entering edit mode

Do you have any object containing 'ch1_SCPG'-like pieces, such as dataframe or list? Also, will something else change in iterations in addition to the 'ch1_SCPG'-like piece?

ADD REPLY
0
Entering edit mode

Yes, this is my dataframe ch1_SCPG. The columns are (ignoring the first column): ID CHR POSITION P STUDY_ID and the idea is that I will be able to do this for all chromosomes (1 to 22). I just don't see how can I loop it as it is a chunky piece of code and it confuses me.

1   cg16145216  1   42385662    6.7e-48 JoehannesCN
2   cg19406367  1   66999929    7.0e-44 JoehannesCN
3   cg05603985  1   2161049 1.8e-43 JoehannesCN
8   cg26856289  1   24307516    8.6e-35 JoehannesCN
29  cg26764244  1   68299511    1.8e-28 JoehannesCN
ADD REPLY
0
Entering edit mode

I meant actually whether there is one object (list or table) holding the data for different chromosomes. As I understand, ch1_SCPG contains the data for the first chromosome - how did you get this table, by loading some separate file or by filtering some large table with data for all chromosomes?

Also, left_join(ch1_SCPG, ., by=c("CHR"="CHR")) actually performs right join - to make it clear, right_join(ch1_SCPG, "CHR") may be used explicitly.

ADD REPLY

Login before adding your answer.

Traffic: 1663 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6