Question: Handling Ranged Data In Python
Asked 7.3 years ago by Abhi (United States):

Hi Guys

I am wondering if there is something I could use to handle ranged data in Python, similar to GenomicRanges/IRanges in R?

A specific example:

My input data is gene coordinates and chromosomes. For each gene I would like to know the end coordinate of the gene just before it to calculate the intergenic distance.
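Here is a minimal sketch of the lookup described above. It assumes "the gene just before it" means the preceding gene by start position on the same chromosome, and it stores genes as hypothetical (name, chromosome, start, end) tuples. With overlapping or nested genes the preceding gene's end may not be the largest end seen so far, so the definition may need refining for real annotations.

```python
from collections import defaultdict

# Hypothetical input: (name, chromosome, start, end) tuples, deliberately unsorted.
genes = [
    ("geneA", "chr1", 100, 200),
    ("geneC", "chr1", 401, 450),
    ("geneB", "chr1", 300, 400),
    ("geneD", "chr2", 100, 200),
]

def previous_gene_end(genes):
    """Map each gene name to the end coordinate of the gene just before it
    on the same chromosome (None for the first gene on a chromosome)."""
    by_chrom = defaultdict(list)
    for name, chrom, start, end in genes:
        by_chrom[chrom].append((start, end, name))
    result = {}
    for chrom, entries in by_chrom.items():
        entries.sort()  # order genes along the chromosome by start position
        prev_end = None
        for start, end, name in entries:
            result[name] = prev_end
            prev_end = end
    return result

prev = previous_gene_end(genes)
print(prev)  # {'geneA': None, 'geneB': 200, 'geneC': 400, 'geneD': None}
```

The intergenic distance for a gene is then its start minus `prev[name]`, whenever that value is not None.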

Thanks! -Abhi

Tags: python, biopython, R
(Question last modified 14 months ago by endrebak.)

There are several Python packages and other alternatives mentioned here:

(Comment by Michael Dondrup, 7.3 years ago)

I remember that package was called 'pygr', but I haven't used it myself.

(Comment by Michael Dondrup, 7.3 years ago)

I think it would help if you pasted a small portion of your input file into your question.

(Comment by Geparada, 7.3 years ago)

Hi Abhi. Can you post your input file?

(Comment by Tommy Carstensen, 7.3 years ago)
Answer by Weronika, 7.3 years ago:

Michael Dondrup's comment is probably the best solution if you want to do anything complex with ranges, but for the problem as stated (calculating intergenic distances) you only need a few lines of Python:

# I don't know what your data looks like, so I'll start with a list of
#  name,chromosome,start_pos,end_pos tuples
gene_positions = [('geneA', 'chr1', 100,200), ('geneB', 'chr1', 300,400), 
                  ('geneC', 'chr1', 401, 450), ('geneD', 'chr2', 100,200)]
# sorting genes by the chromosome and start_position fields so they're in order
gene_positions.sort(key = lambda x: (x[1],x[2]))

# start with intergenic distances as an empty list
intergenic_distances = []

# using zip to get a list of pairs of adjacent genes
for gene1,gene2 in zip(gene_positions, gene_positions[1:]):
    (gene1_name,gene1_chr,gene1_start,gene1_end) = gene1
    (gene2_name,gene2_chr,gene2_start,gene2_end) = gene2
    # only take gene pairs on the same chromosome
    if gene1_chr == gene2_chr:
        # add the distance and both gene names to intergenic_distances
        #  (the distance math may need adjusting depending on
        #   whether the positions are end-inclusive)
        intergenic_distances.append((gene2_start - gene1_end, 
                                     gene1_name, gene2_name))

I don't know what you want to do with the data afterward, but you end up with a list of (intergenic_distance, gene1_name, gene2_name) tuples, which you can print like this:

for distance, gene1, gene2 in intergenic_distances:
    print("Distance between %s and %s is %s" % (gene1, gene2, distance))
# For the example data above, this prints the following:
#  Distance between geneA and geneB is 100
#  Distance between geneB and geneC is 1

Of course, you probably already knew that.
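The code comment about end-inclusive positions matters in practice, because the same numbers give different gaps under different coordinate conventions. The sketch below assumes either BED-style 0-based half-open coordinates or GFF/GTF-style 1-based closed coordinates. The two function names are only for illustration.

```python
def gap_half_open(end1, start2):
    """Bases strictly between two features in 0-based, half-open
    coordinates (BED style): a feature covers start .. end-1."""
    return start2 - end1

def gap_closed(end1, start2):
    """Bases strictly between two features in 1-based, closed
    coordinates (GFF/GTF style): a feature covers start .. end."""
    return start2 - end1 - 1

# In the example data, geneB ends at 400 and geneC starts at 401.
print(gap_half_open(400, 401))  # 1: base 400 lies between them
print(gap_closed(400, 401))     # 0: the two genes are directly adjacent
```

So `gene2_start - gene1_end` in the loop above counts the bases between genes for half-open coordinates. For 1-based inclusive coordinates, subtract 1.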

Answer by endrebak, 14 months ago:

You now can:

It is still pre-alpha, but I will keep updating it, since I use it all the time.

Edit: in beta now. To my knowledge it is faster and more widely tested than any other alternative.
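Whichever package you choose, the main operation range libraries provide is an overlap query (for example, `findOverlaps` in IRanges/GenomicRanges). The code below is a toy, stdlib-only sketch of that query. It assumes 0-based half-open coordinates. `SimpleRanges` and `overlapping` are hypothetical names and are not the API of any package mentioned in this thread.

```python
import bisect
from collections import defaultdict

class SimpleRanges:
    """Toy per-chromosome interval index (0-based, half-open).
    Hypothetical illustration only, not any package's API."""

    def __init__(self, intervals):
        # intervals: iterable of (chrom, start, end, name)
        self._by_chrom = defaultdict(list)
        for chrom, start, end, name in intervals:
            self._by_chrom[chrom].append((start, end, name))
        self._starts = {}
        self._max_len = {}
        for chrom, items in self._by_chrom.items():
            items.sort()  # sort by start so we can bisect on starts
            self._starts[chrom] = [s for s, _, _ in items]
            self._max_len[chrom] = max(e - s for s, e, _ in items)

    def overlapping(self, chrom, qstart, qend):
        """Names of intervals on chrom that overlap [qstart, qend)."""
        items = self._by_chrom.get(chrom)
        if not items:
            return []
        starts = self._starts[chrom]
        # An overlapping interval must start before qend ...
        hi = bisect.bisect_left(starts, qend)
        # ... and, being at most max_len long, anything starting at or before
        # qstart - max_len ends by qstart, so those intervals can be skipped.
        lo = bisect.bisect_right(starts, qstart - self._max_len[chrom])
        return [name for s, e, name in items[lo:hi] if e > qstart]

idx = SimpleRanges([
    ("chr1", 100, 200, "geneA"),
    ("chr1", 300, 400, "geneB"),
    ("chr1", 401, 450, "geneC"),
    ("chr2", 100, 200, "geneD"),
])
print(idx.overlapping("chr1", 150, 350))  # ['geneA', 'geneB']
```

Bounding the scan by the longest interval keeps this sketch short. With a few very long intervals it degrades toward a linear scan, which is why dedicated libraries use structures such as interval trees or nested containment lists.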

(Answer last modified 3 months ago.)
Powered by Biostar version 2.3.0