Find overlapping sequences with pyranges (equivalent of mergeByOverlaps)
McClain • 2.6 years ago

I am trying to replicate the mergeByOverlaps function from R/Bioconductor in Python using the pyranges package. In R the code would be:

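```r
library(GenomicRanges)

# gr.snp starts as a data.frame with chr, start, end and rsid columns
gr.snp <- with(gr.snp, GRanges(chr, IRanges(start, end), rsid = gr.snp$rsid))
# gencode: GRanges holding the GENCODE annotation, loaded beforehand
snp.annotated <- data.frame(mergeByOverlaps(gr.snp, gencode, maxgap = 2000, type = "start"))
```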
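As far as I can tell from the IRanges `findOverlaps` documentation, with `type = "start"` the `maxgap` argument bounds the difference between the two start coordinates rather than the gap between the ranges. Under that reading, the call above keeps a pair (q from gr.snp, s from gencode) when the condition below holds, provided the strands are compatible (an unstranded `*` SNP is compatible with either strand):

$$
\mathrm{seqnames}(q) = \mathrm{seqnames}(s) \;\wedge\; \lvert \mathrm{start}(q) - \mathrm{start}(s) \rvert \le \mathrm{maxgap} = 2000
$$

In GRanges, start is the leftmost coordinate regardless of strand, so for minus-strand features this is not the transcription start site.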

which returns:

```r
> nrow(snp.annotated)
[1] 34

> colnames(snp.annotated)
 [1] "gr.snp.seqnames"                  "gr.snp.start"                    
 [3] "gr.snp.end"                       "gr.snp.width"                    
 [5] "gr.snp.strand"                    "gr.snp.rsid"                     
 [7] "rsid"                             "gencode.seqnames"                
 [9] "gencode.start"                    "gencode.end"                     
[11] "gencode.width"                    "gencode.strand"                  
[13] "gencode.source"                   "gencode.type"                    
[15] "gencode.score"                    "gencode.phase"                   
[17] "gencode.ID"                       "gencode.gene_id"                 
[19] "gencode.gene_type"                "gencode.gene_name"               
[21] "gencode.level"                    "gencode.hgnc_id"                 
[23] "gencode.havana_gene"              "gencode.Parent"                  
[25] "gencode.transcript_id"            "gencode.transcript_type"         
[27] "gencode.transcript_name"          "gencode.transcript_support_level"
[29] "gencode.tag"                      "gencode.havana_transcript"       
[31] "gencode.exon_number"              "gencode.exon_id"                 
[33] "gencode.ont"                      "gencode.protein_id"
```

Here, gr.snp holds the SNPs I want to annotate and gencode is the annotation file.

The closest I have gotten in Python is with `cluster` or `merge`, but neither gives quite the right result:

```python
>>> temp = gr_snp.cluster(genecode_genes, slack = 2000)
>>> len(temp)
469
>>> temp.columns
Index(['Chromosome', 'Start', 'End', 'rsid', 'Strand', 'Cluster'], dtype='object')
```

`merge` behaves much the same way but keeps even fewer metadata columns. I need the metadata from both tables to be preserved in the result. Does anyone know how to do this?

Tags: mergebyoverlap python overlap r pyranges