Question

How to plot chipseq peak relative to TSS for some genes ?

0

Entering edit mode

3.0 years ago

Sarah ▴ 10

I have a file with with start and end peak position according to TSS. My file contains 4 coumns (gene, start_peak, end_peak, strand). I would create a distribution plot. On the x-axis is the distance from the TSS and On the y-axis is number of peak found at that distance. I want to bin the distances from the TSS, say every 500 bp.nIt is not read density but Chipseq peak density How to plot ChIP-seq Density vs Distance from TSS using my file and R? How to use these start and end peak position, because when I plot this ggplot(data, aes(x=start_peak)) + geom_density() it is only according to start peak ! And why my data are not centered on the TSS ? Could you help me ? Thanks

chipseq R plotwithR TSS peak • 1.9k views

ADD COMMENT • link updated 3.0 years ago by Carambakaracho ★ 3.2k • written 3.0 years ago by Sarah ▴ 10

1

Entering edit mode

If I understood correctly, you have peak position and I guess you need the distance from TSS.

Let's assume you have peak information in bed file format: chromosome peakStart peakEnd, then you can annotate them using R package such as ChIPpeakAnno OR you can manually calculate the distance from TSS using gtf file (in case of a non-model organism).

Finally, you can use distance from TSS information from peak annotation and calculate the frequency.

ADD REPLY • link 3.0 years ago by Nitin Narwade ★ 1.6k

0

Entering edit mode

Thanks for you answer. I have already calculate each peak position relative from the TSS, so the postitions I had were the start position relative to TSS and end position relative to TSS.
How do I calculate the frequency, The frequencey is the number of gene having a given peak? For example :

so the frequency (column 3) is

10 20 2 
30 40 2
50 60 1

ADD REPLY • link 3.0 years ago by Sarah ▴ 10

0

Entering edit mode

Hmmmmmm...!!

I am not sure how did you calculate the relative distance from TSS for the peaks. But I assume the second and third column is the distance of start and end from TSS, and then take each column and calculate the frequency separately.

table(data$start)
table(data$end)

ADD REPLY • link 3.0 years ago by Nitin Narwade ★ 1.6k

0

Entering edit mode

dplyr's group_by + count are your friends:

library(tidyverse)
test <- tibble(
  gene = c(1,1,2,2,2), 
  peak_ldist_tss = c(10, 30, 10, 30, 50), 
  peak_rdist_tss = c(20, 40, 20, 40, 60))
test %>% 
  group_by(peak_ldist_tss, peak_rdist_tss) %>%
  count()

ADD REPLY • link 3.0 years ago by Carambakaracho ★ 3.2k