Question: Newbie Question: How to find overlaps across multiple cell line datasets
15 months ago, skim wrote:

Hello. I have a BED file of about 90 MB, and I need to use Python to find the overlaps between it and multiple other BED files (about 800 MB in total), each containing sequences. I have enough processing power, but I need to simplify this process. I suspected that an interval tree would be a good choice and found this: https://pypi.python.org/pypi/intervaltree_bio but I could not get any further.

I have about 60 cell line datasets, each with a BED file of about 6-10 MB. I have a directory containing subdirectories named after the .bed and .pk (peak) files, and each of these subdirectories has one BED file.

Could anyone give me specific advice on how to do this task? Thank you very much.

Main .bed file example queries:

chr20 30053341 30053368 DEFB124 70.6955419 +

chr20 30053397 30053424 DEFB124 63.90851928 +

.pk cell line file example queries:

chr1 713835 714424 chr1.1 1000 . 0.1621 10.6 -1 253

chr1 752775 753050 chr1.2 567 . 0.0365 2.09 -1 124

.bed cell line file example queries:

chr1 91425 91575 id-4576 9

chr1 714005 714155 id-35705 186.000000

ngs crispr overlap

Is it possible for anyone to give me a specific advice on how to do this task?

bedtools intersect

modified 15 months ago • written 15 months ago by Pierre Lindenbaum

Thank you. A very short yet powerful reply. So I just have to use pybedtools and Python to search the files and feed them to this: https://daler.github.io/pybedtools/autodocs/pybedtools.bedtool.BedTool.intersect.html ?

written 15 months ago by skim
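For readers who want to see what a pairwise overlap query involves before reaching for bedtools or pybedtools, here is a minimal standard-library sketch. It is not how bedtools is implemented; the function names `parse_bed_lines` and `find_overlaps` are hypothetical, and it assumes whitespace-separated lines with half-open BED coordinates (chrom, start, end in the first three columns), as in the examples in the question.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def parse_bed_lines(lines):
    """Group (start, end, extra_fields) tuples by chromosome, sorted by start."""
    by_chrom = defaultdict(list)
    for line in lines:
        fields = line.split()
        # Skip blank lines, comments, and track/browser header lines.
        if len(fields) < 3 or fields[0].startswith(("#", "track", "browser")):
            continue
        by_chrom[fields[0]].append((int(fields[1]), int(fields[2]), fields[3:]))
    for intervals in by_chrom.values():
        intervals.sort(key=lambda iv: (iv[0], iv[1]))
    return by_chrom


def find_overlaps(queries, targets):
    """Return (chrom, (q_start, q_end), (t_start, t_end)) for each overlapping pair."""
    hits = []
    for chrom, q_list in queries.items():
        t_list = targets.get(chrom)
        if not t_list:
            continue
        starts = [t[0] for t in t_list]
        # No target is longer than max_len, so a target starting at or before
        # q_start - max_len must already have ended at or before q_start.
        max_len = max(t[1] - t[0] for t in t_list)
        for q_start, q_end, _ in q_list:
            lo = bisect_right(starts, q_start - max_len)
            hi = bisect_left(starts, q_end)  # targets must start before q_end
            for t_start, t_end, _ in t_list[lo:hi]:
                if t_end > q_start:  # half-open intervals: touching is not overlap
                    hits.append((chrom, (q_start, q_end), (t_start, t_end)))
    return hits
```

Reading each file with `parse_bed_lines(open(path))` and calling `find_overlaps(main, cell_line)` once per cell line directory would cover the 60-file case. The binary search keeps each query cheap when target intervals are short, as peaks usually are; for heavy use, bedtools or an interval tree is likely the better choice.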

Can you give an example of the kind of query you are trying to run between the input BED files? Are you trying to find all elements that mutually overlap across a set of N BED files, for instance? In that case a straight-up intersection will not work, because of overlaps within a single input, among other issues, so a more sophisticated approach is needed.

written 4 months ago by Alex Reynolds
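One possible way to handle the N-file case raised above, offered only as a sketch and not necessarily the approach the commenter had in mind: merge the overlapping intervals within each file first, so that a file never counts twice at the same position, then sweep over start and end events and report the regions covered by at least a given number of files. The names `merge_intervals` and `regions_shared_by` are hypothetical, and each input is assumed to be already parsed into a dict mapping chromosome to a list of (start, end) tuples.

```python
from collections import defaultdict


def merge_intervals(intervals):
    """Collapse overlapping or touching intervals from a single file."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def regions_shared_by(files, min_files):
    """Return {chrom: [(start, end, n_files), ...]} covered by >= min_files inputs.

    `files` is a list with one {chrom: [(start, end), ...]} dict per BED file.
    """
    events = defaultdict(list)
    for per_file in files:
        for chrom, intervals in per_file.items():
            # Merging first means each file adds at most 1 to the depth.
            for start, end in merge_intervals(intervals):
                events[chrom].append((start, 1))
                events[chrom].append((end, -1))

    shared = {}
    for chrom, evs in events.items():
        evs.sort()  # at equal positions, ends (-1) sort before starts (+1)
        depth, prev, out = 0, None, []
        for pos, delta in evs:
            # Emit the segment since the previous event if enough files cover it.
            if prev is not None and depth >= min_files and pos > prev:
                out.append((prev, pos, depth))
            depth += delta
            prev = pos
        if out:
            shared[chrom] = out
    return shared
```

Each reported segment has a constant number of covering files, so neighbouring segments may touch; they can be merged afterwards if only the shared regions matter and not the per-segment counts.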

Thank you for your answer, but I finished this task 9 months ago :)

written 4 months ago by skim