How to plot chipseq peak relative to TSS for some genes ?
0
0
Entering edit mode
5 weeks ago
Sarah ▴ 10

I have a file with with start and end peak position according to TSS. My file contains 4 coumns (gene, start_peak, end_peak, strand). I would create a distribution plot. On the x-axis is the distance from the TSS and On the y-axis is number of peak found at that distance. I want to bin the distances from the TSS, say every 500 bp.nIt is not read density but Chipseq peak density How to plot ChIP-seq Density vs Distance from TSS using my file and R? How to use these start and end peak position, because when I plot this ggplot(data, aes(x=start_peak)) + geom_density() it is only according to start peak ! And why my data are not centered on the TSS ? Could you help me ? Thanks

chipseq R plotwithR TSS peak • 216 views
1
Entering edit mode

If I understood correctly, you have peak position and I guess you need the distance from TSS.

Let's assume you have peak information in bed file format: chromosome peakStart peakEnd, then you can annotate them using R package such as ChIPpeakAnno OR you can manually calculate the distance from TSS using gtf file (in case of a non-model organism).

Finally, you can use distance from TSS information from peak annotation and calculate the frequency.

0
Entering edit mode

Thanks for you answer. I have already calculate each peak position relative from the TSS, so the postitions I had were the start position relative to TSS and end position relative to TSS.
How do I calculate the frequency, The frequencey is the number of gene having a given peak? For example :

1 10 20
1 30 40
2 10 20
2 30 40
2 50 60


so the frequency (column 3) is

10 20 2
30 40 2
50 60 1

0
Entering edit mode

Hmmmmmm...!!

I am not sure how did you calculate the relative distance from TSS for the peaks. But I assume the second and third column is the distance of start and end from TSS, and then take each column and calculate the frequency separately.

table(data$start) table(data$end)

0
Entering edit mode

dplyr's group_by + count are your friends:

library(tidyverse)
test <- tibble(
gene = c(1,1,2,2,2),
peak_ldist_tss = c(10, 30, 10, 30, 50),
peak_rdist_tss = c(20, 40, 20, 40, 60))
test %>%
group_by(peak_ldist_tss, peak_rdist_tss) %>%
count()