Calculating overlaps with peak calls for each exon
18 months ago
pt.taklifi ▴ 60

I have a list of exons and a list of peak calls, both in .txt format.

Exons

 structure(list(chr1 = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "chr1", class = "factor"),
X3857280 = c(3858717L, 3865811L, 3867973L, 3869604L, 3872471L
), X3857717 = c(3858844L, 3866000L, 3868053L, 3869775L, 3872572L
), ENST00000378209.7_exon_0_0_chr1_3857281_f = structure(1:5, .Label = c("ENST00000378209.7_exon_1_0_chr1_3858718_f",
"ENST00000378209.7_exon_2_0_chr1_3865812_f", "ENST00000378209.7_exon_3_0_chr1_3867974_f",
"ENST00000378209.7_exon_4_0_chr1_3869605_f", "ENST00000378209.7_exon_5_0_chr1_3872472_f"
), class = "factor"), X0 = c(0L, 0L, 0L, 0L, 0L), X. = structure(c(1L,
1L, 1L, 1L, 1L), .Label = "+", class = "factor")), class = "data.frame", row.names = c(NA,
-5L))


Peak Calls

structure(list(seqnames = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "chr1", class = "factor"),
start = c(975451L, 1014228L, 1290080L, 1291099L, 1291742L,
1327977L), end = c(975952L, 1014729L, 1290581L, 1291600L,
1292243L, 1328478L), name = structure(c(5L, 6L, 1L, 2L, 3L,
4L), .Label = c("BRCA_123", "BRCA_124", "BRCA_125", "BRCA_143",
"BRCA_39", "BRCA_55"), class = "factor"), score = c(1.87842575038562,
4.07469686212787, 2.44358820293876, 3.18019908767794, 8.26783029566134,
1.08246502080444), annotation = structure(c(1L, 1L, 1L, 1L,
1L, 1L), .Label = "3' UTR", class = "factor"), percentGC = c(0.6187624750499,
0.62874251497006, 0.678642714570858, 0.702594810379242, 0.640718562874252,
0.676646706586826), percentAT = c(0.3812375249501, 0.37125748502994,
0.321357285429142, 0.297405189620758, 0.359281437125749,
0.323353293413174)), class = "data.frame", row.names = c(NA,
-6L))


For each exon, I want to determine whether it overlaps any of the peaks and, if it does, what percentage of the exon overlaps the peak. If an exon overlaps more than one peak, I want to report that as well. I then want to store the results in a new table or data frame. I can't think of anything other than a for loop, and since my data is rather large I'm looking for efficient code. I'm currently working in R, but I can also do some coding in the Ubuntu terminal.
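One loop-free way to approach this is sketched below, assuming the Bioconductor package GenomicRanges is installed. It is not necessarily the only or best approach. The toy tables and their column names (`chr`, `start`, `end`, `name`) are hypothetical, chosen so that some overlaps actually exist; the posted sample exons and peaks do not overlap each other. The posted exon table also appears to have had its first row read in as column names, so the real file may need re-reading with `header = FALSE` first. The sketch further assumes exon starts are BED-style 0-based, which is a guess from the exon names.

```r
# Sketch only: assumes Bioconductor's GenomicRanges is installed.
suppressPackageStartupMessages(library(GenomicRanges))

# Hypothetical toy data (not the poster's), with explicit column names.
exons <- data.frame(
  chr   = c("chr1", "chr1", "chr1"),
  start = c(100L, 500L, 900L),      # assumed BED-style 0-based starts
  end   = c(200L, 600L, 1000L),
  name  = c("exon_1", "exon_2", "exon_3")
)
peaks <- data.frame(
  seqnames = c("chr1", "chr1", "chr1"),
  start    = c(151L, 181L, 551L),   # assumed 1-based starts
  end      = c(170L, 250L, 700L),
  name     = c("peak_A", "peak_B", "peak_C")
)

# Convert both tables to GRanges; extra columns are kept as metadata.
exon_gr <- makeGRangesFromDataFrame(exons, keep.extra.columns = TRUE,
                                    starts.in.df.are.0based = TRUE)
peak_gr <- makeGRangesFromDataFrame(peaks, keep.extra.columns = TRUE)

# All exon/peak pairs that overlap, found without an explicit loop.
hits <- findOverlaps(exon_gr, peak_gr, ignore.strand = TRUE)

# Intersection of each overlapping pair, used to get the overlap width.
ov <- pintersect(exon_gr[queryHits(hits)], peak_gr[subjectHits(hits)])

# Number of peaks hitting each exon (0 for exons with no overlap).
n_peaks <- countOverlaps(exon_gr, peak_gr, ignore.strand = TRUE)

# One row per exon/peak pair.
result <- data.frame(
  exon       = exon_gr$name[queryHits(hits)],
  peak       = peak_gr$name[subjectHits(hits)],
  overlap_bp = width(ov),
  pct_exon   = 100 * width(ov) / width(exon_gr)[queryHits(hits)],
  n_peaks    = n_peaks[queryHits(hits)]
)
print(result)
```

Exons that overlap more than one peak show up as several rows with `n_peaks > 1`. Note that if the peaks themselves overlap one another, summing `pct_exon` per exon would double-count bases; in that case applying `reduce()` to the peaks first may be more appropriate.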

R overlap bioconductor exon • 387 views

FYI, if you have data in R and want to share it in an easy copy/paste fashion, use dput() on the object. It creates an ASCII representation of the data that you can share here, so users can quickly load your example data rather than typing it in. You can use edit to add content to your post.
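As a minimal illustration of the dput() suggestion, assuming a small hypothetical data frame named `exons` standing in for the real table:

```r
# Hypothetical small data frame standing in for the real exon table.
exons <- data.frame(
  chr   = c("chr1", "chr1", "chr1"),
  start = c(100L, 500L, 900L),
  end   = c(200L, 600L, 1000L)
)

# Print a copy/paste-able text representation of the first two rows.
# The printed structure(...) call can be pasted into a post, and a
# reader can rebuild the object by assigning it to a variable.
dput(head(exons, 2))
```

Sharing only `head()` of a large object keeps the post short while still giving a reproducible example.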


OK, thanks for the advice. I converted my data to ASCII format.