Question: Locate positions in a large bed file
0
gravatar for DVA
2.2 years ago by
DVA530
United States
DVA530 wrote:

I have a ~20G bed file for a few samples, and the bed file contains read depth for all positions of the human genome. Also, I have a list of chromosome positions (~30,000 to 50,000) for each sample, and I would like to know the read depth at these positions. What would be the most efficient method please?

I'm currently thinking to split my large bed file and the list per chromosome or block (via grep or BEDOPS), and work from there, but I would love to know better ways to do this. Thank you very much.

bed • 982 views
ADD COMMENTlink modified 2.2 years ago by Alex Reynolds29k • written 2.2 years ago by DVA530
2
gravatar for Alex Reynolds
2.2 years ago by
Alex Reynolds29k
Seattle, WA USA
Alex Reynolds29k wrote:

You don't need to split the file.

Sort your BED file with BEDOPS sort-bed, if unsorted:

$ sort-bed reads.unsorted.bed > reads.bed

Then pass the chromosome name to bedops or bedmap or closest-features via --chrom <chromosome> to do work only on that chromosome:

$ bedops --chrom chrN --operations ...

Or:

$ bedmap --chrom chrN --operations ...

Etc.

If you want a list of all operations, take a look at the BEDOPS documentation.

If you want a fast list of chromosomes:

$ bedextract --list-chr reads.bed

You can pipe this list to a script loop, to do work over each chromosome on a computational cluster. BEDOPS enables parallelization by chromosome pretty easily.

These operations are very fast, because this uses the sort order in sort-bed sorted BED files to do a binary search that determines chromosome bounds. Operations that take minutes or hours can go down to seconds.

If your reads are single-end, you should be able to add --faster to BEDOPS commands to make them use similar techniques to speed operations up even further.

Paired-end reads can result in "nested" elements that prevent the use of --faster without some pre-processing tricks. Your data is on the multi-GB scale such that, if you are working with paired-end reads, you may be interested in investigating these tricks if you're doing these operations repeatedly. See the "extra work" note in the bedextract documentation.

ADD COMMENTlink modified 2.2 years ago • written 2.2 years ago by Alex Reynolds29k

Thank you very much for the detailed information. The list of positions I'm talking about is no longer at reads level - they are just chromosome & positions. It looks like I should turn this list into another bed file, and just use "6.1.2.3.3. Retrieving elements which overlap target elements" and I'm going to give it a try.

ADD REPLYlink written 2.2 years ago by DVA530
1
gravatar for genomax
2.2 years ago by
genomax75k
United States
genomax75k wrote:

BEDOPS should have a function equivalent to intesectbed from Bedtools.

ADD COMMENTlink written 2.2 years ago by genomax75k

Thank you so much for your reply. I am looking at this too - do you happen to know how fast these two functions are for large files?

ADD REPLYlink written 2.2 years ago by DVA530
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1887 users visited in the last hour