Question: Handling Ranged Data In Python
6.6 years ago by
Abhi1.5k
United States
Abhi1.5k wrote:

Hi Guys

I am wondering if there is something I could use to handle ranged data in Python, similar to GenomicRanges/IRanges in R?

A specific example:

My input data is gene coordinates and chromosomes. For each gene I would like to know the end coordinate of the gene just before it to calculate the intergenic distance.

Thanks! -Abhi

python biopython R • 2.4k views
ADD COMMENTlink modified 5 months ago by endrebak680 • written 6.6 years ago by Abhi1.5k
4

There are several python packages and other alternatives mentioned here: http://biostar.stackexchange.com/questions/2245/what-is-the-quickest-algorithm-for-range-overlap

ADD REPLYlink written 6.6 years ago by Michael Dondrup44k

I remember that package was called 'pygr', but I haven't used it myself.

ADD REPLYlink written 6.6 years ago by Michael Dondrup44k

I think it would help if you pasted a small portion of your input file into your question.

ADD REPLYlink written 6.6 years ago by Geparada1.3k

Hi Abhi. Can you post your input file?

ADD REPLYlink written 6.6 years ago by Tommy Carstensen40
6.6 years ago by
Weronika300
Stanford
Weronika300 wrote:

Michael Dondrup's comment is probably the best solution if you want to do anything complex with ranges, but for the problem stated (intergenic distance calculation) you really only need a few lines of Python code:

# I don't know what your data looks like, so I'll start with a list of
#  name,chromosome,start_pos,end_pos tuples
gene_positions = [('geneA', 'chr1', 100,200), ('geneB', 'chr1', 300,400), 
                  ('geneC', 'chr1', 401, 450), ('geneD', 'chr2', 100,200)]
# sorting genes by the chromosome and start_position fields so they're in order
gene_positions.sort(key = lambda x: (x[1],x[2]))

# start with intergenic distances as an empty list
intergenic_distances = []

# using zip to get a list of pairs of adjacent genes
for gene1,gene2 in zip(gene_positions, gene_positions[1:]):
    (gene1_name,gene1_chr,gene1_start,gene1_end) = gene1
    (gene2_name,gene2_chr,gene2_start,gene2_end) = gene2
    # only take gene pairs on the same chromosome
    if gene1_chr == gene2_chr:
        # add the distance and both gene names to intergenic_distances
        #  (may need to adjust the distance math depending on
        #   whether the positions are end-inclusive)
        intergenic_distances.append((gene2_start - gene1_end, 
                                     gene1_name, gene2_name))

I don't know what you want to do with the data afterward, but you get a list of (intergenic_distance, gene1_name, gene2_name) tuples, like this:

for distance, gene1, gene2 in intergenic_distances:
    print("Distance between %s and %s is %s" % (gene1, gene2, distance))
# For the example data above, this prints the following:
#  Distance between geneA and geneB is 100
#  Distance between geneB and geneC is 1

Of course, you probably already knew that.
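
Since the question asks for the end coordinate of the preceding gene for each gene, here is a minimal sketch that reports it directly and makes the end-inclusive adjustment explicit. The function name `previous_gene_ends` and the `end_inclusive` flag are made up for illustration, and the input is assumed to be the same (name, chromosome, start, end) tuples as above:

```python
from itertools import groupby

def previous_gene_ends(genes, end_inclusive=True):
    """Return (name, previous_end, distance) for each gene.

    genes: iterable of (name, chromosome, start, end) tuples.
    The first gene on each chromosome has no predecessor, so it
    gets (name, None, None).
    """
    results = []
    # sort by chromosome, then start, so neighbours are adjacent
    ordered = sorted(genes, key=lambda g: (g[1], g[2]))
    for _chrom, group in groupby(ordered, key=lambda g: g[1]):
        prev_end = None
        for name, _chrom_name, start, end in group:
            if prev_end is None:
                results.append((name, None, None))
            else:
                # with end-inclusive coordinates the base at prev_end
                # belongs to the previous gene, so subtract one more
                gap = start - prev_end - (1 if end_inclusive else 0)
                results.append((name, prev_end, gap))
            prev_end = end
    return results

gene_positions = [('geneA', 'chr1', 100, 200), ('geneB', 'chr1', 300, 400),
                  ('geneC', 'chr1', 401, 450), ('geneD', 'chr2', 100, 200)]
for name, prev_end, gap in previous_gene_ends(gene_positions):
    print(name, prev_end, gap)
```

Note that "the gene just before" here means the previous gene by start position. If genes overlap or are nested, the distance comes out negative, and you may prefer to track the largest end seen so far instead.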

ADD COMMENTlink written 6.6 years ago by Weronika300
5 months ago by
endrebak680
endrebak680 wrote:

You now can: https://github.com/endrebak/pyranges

It is still in pre-alpha, but I will keep updating it since I use it all the time.

ADD COMMENTlink written 5 months ago by endrebak680