How to plot chipseq peak relative to TSS for some genes ?
0
0
Entering edit mode
2.9 years ago
Sarah ▴ 10

I have a file with with start and end peak position according to TSS. My file contains 4 coumns (gene, start_peak, end_peak, strand). I would create a distribution plot. On the x-axis is the distance from the TSS and On the y-axis is number of peak found at that distance. I want to bin the distances from the TSS, say every 500 bp.nIt is not read density but Chipseq peak density How to plot ChIP-seq Density vs Distance from TSS using my file and R? How to use these start and end peak position, because when I plot this ggplot(data, aes(x=start_peak)) + geom_density() it is only according to start peak ! And why my data are not centered on the TSS ? Could you help me ? Thanks

chipseq R plotwithR TSS peak • 1.8k views
ADD COMMENT
1
Entering edit mode

If I understood correctly, you have peak position and I guess you need the distance from TSS.

Let's assume you have peak information in bed file format: chromosome peakStart peakEnd, then you can annotate them using R package such as ChIPpeakAnno OR you can manually calculate the distance from TSS using gtf file (in case of a non-model organism).

Finally, you can use distance from TSS information from peak annotation and calculate the frequency.

ADD REPLY
0
Entering edit mode

Thanks for you answer. I have already calculate each peak position relative from the TSS, so the postitions I had were the start position relative to TSS and end position relative to TSS.
How do I calculate the frequency, The frequencey is the number of gene having a given peak? For example :

1 10 20
1 30 40
2 10 20
2 30 40
2 50 60

so the frequency (column 3) is

10 20 2 
30 40 2
50 60 1
ADD REPLY
0
Entering edit mode

Hmmmmmm...!!

I am not sure how did you calculate the relative distance from TSS for the peaks. But I assume the second and third column is the distance of start and end from TSS, and then take each column and calculate the frequency separately.

table(data$start)
table(data$end)
ADD REPLY
0
Entering edit mode

dplyr's group_by + count are your friends:

library(tidyverse)
test <- tibble(
  gene = c(1,1,2,2,2), 
  peak_ldist_tss = c(10, 30, 10, 30, 50), 
  peak_rdist_tss = c(20, 40, 20, 40, 60))
test %>% 
  group_by(peak_ldist_tss, peak_rdist_tss) %>%
  count()
ADD REPLY

Login before adding your answer.

Traffic: 2274 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6