Question: pythonic equivalent to reduce() in R GRanges - how to collapse ranged data?
2
gravatar for christopher medway
23 months ago by
Cardiff, UK
christopher medway440 wrote:

I posted this question on stackoverflow, but it did not get a response. It is a bioinformatics query, so perhaps this is a better forum: Is there are pythonic equivalent to the R ranges operation below?

In R (albeit longwinded):

Here is a test data.frame

df <- data.frame(
  "CHR" = c(1,1,1,2,2),
  "START" = c(100, 200, 300, 100, 400),
  "STOP" = c(150,350,400,500,450)
  )

First I make GRanges object:

gr <- GenomicRanges::GRanges(
  seqnames = df$CHR,
  ranges = IRanges(start = df$START, end = df$STOP)
  )

Then I reduce the intervals to collapse into new granges object:

reduced <- reduce(gr)

Now append a new column to original dataframe which confirms which rows belong to the same contiguous 'chunk'.

subjectHits(findOverlaps(gr, reduced))

Output:

> df
  CHR START STOP locus
1   1   100  150     1
2   1   200  350     2
3   1   300  400     2
4   2   100  500     3
5   2   400  450     3

How do I do this in Python?

python R • 671 views
ADD COMMENTlink modified 9 months ago by endrebak800 • written 23 months ago by christopher medway440
1

in what structure is your data stored in python?

ADD REPLYlink written 23 months ago by Martombo2.6k

The data is stored as a CSV to disk. As a python newbie, I guess I would load as a pandas table, but I am open to suggestions.

ADD REPLYlink written 23 months ago by christopher medway440
4
gravatar for endrebak
9 months ago by
endrebak800
github.com/endrebak
endrebak800 wrote:

https://github.com/biocore-ntnu/pyranges

import pyranges as pr
chromosomes = [1] * 3 + [2] * 2
starts = [100, 200, 300, 100, 400]
ends = [150, 350, 400, 500, 450]
gr = pr.PyRanges(chromosomes=chromosomes, starts=starts, ends=ends)
print(gr.cluster())
# +--------------+-----------+-----------+-----------+
# |   Chromosome |     Start |       End |   Cluster |
# |       (int8) |   (int32) |   (int32) |   (int64) |
# |--------------+-----------+-----------+-----------|
# |            1 |       100 |       150 |         1 |
# |            1 |       200 |       350 |         2 |
# |            1 |       300 |       400 |         2 |
# |            2 |       100 |       500 |         3 |
# |            2 |       400 |       450 |         3 |
# +--------------+-----------+-----------+-----------+

It will be out in 0.0.21. Thanks for the idea!

ADD COMMENTlink written 9 months ago by endrebak800
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 772 users visited in the last hour