Question: bedtools complement error
bk11 wrote, 9 days ago:

Hi, I am getting an error from bedtools. What might be happening? I have a BED file and a genome file:

cat A.bed
chr1  100  200
chr1  400  500
chr1  500  800

cat my.genome
chr1  1000
chr2  800

When I run this:

bedtools complement -i A.bed -g your.genome

It gives

Error: The genome file your.genome has no valid entries. Exiting.

ATpoint wrote:

your.genome must be tab-delimited.

I changed it to tab-delimited and it still does not work.

sed 's/ /\t/g' my.genome >my.genome1
cat my.genome1

chr1        1000
chr2        800

Reply written 9 days ago by bk11

If your files were tab-delimited, it would work. You probably substituted the wrong delimiter in your sed command. The original separator is probably a double space or something similar, so after your command the columns are no longer separated by a single tab.
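
One way to check this is to make the delimiters visible. The sketch below is only an illustration: it assumes the original my.genome uses two spaces between columns (the post does not confirm this) and that GNU sed is in use, since `\t` in a sed replacement is a GNU extension.

```shell
# Recreate the genome file as it appears in the question.
# Assumption: the columns are separated by two spaces.
printf 'chr1  1000\nchr2  800\n' > my.genome

# The command from the question: every single space becomes its own tab
# (\t in the replacement assumes GNU sed).
sed 's/ /\t/g' my.genome > my.genome1

# od -c prints each tab as \t, so doubled delimiters become visible.
od -c my.genome1

# Count the tabs on each line; a two-column genome file should have exactly one.
awk -F'\t' '{ print NF - 1 }' my.genome1
```

Under these assumptions each line reports two tabs, which bedtools would read as an empty second column.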

Reply written 9 days ago by ATpoint

Could you please show the command lines you used to generate the BED files? I am still having the problem.

Reply written 9 days ago by bk11

In this case I simply did it manually by typing it in a text editor. What organism are you working on? There are genome.sizes files available for download for most species.

Reply written 9 days ago by ATpoint

Try replacing all [[:space:]]+ with \t. That should work.
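
A minimal sketch of that replacement, assuming GNU sed (for `\t` in the replacement) and a two-space delimiter in the original file; the output name my.genome.tab is only a placeholder:

```shell
# Recreate the genome file from the question.
# Assumption: the columns are separated by two spaces.
printf 'chr1  1000\nchr2  800\n' > my.genome

# Replace each run of whitespace with exactly one tab.
# -E enables extended regular expressions; \t in the replacement assumes GNU sed.
sed -E 's/[[:space:]]+/\t/g' my.genome > my.genome.tab

# Print the number of tabs on each line; every line should now have exactly one.
awk -F'\t' '{ print NF - 1 }' my.genome.tab
```

The cleaned file could then be passed to bedtools complement with -g my.genome.tab. A.bed should be tab-delimited as well, and the same substitution can be applied to it.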

Reply written 9 days ago by RamRS

Alex Reynolds wrote:

Here's a one-liner that should work:

$ bedops --complement <( sort-bed A.bed ) <( awk -v OFS="\t" '{ print $1, "0", $2 }' my.genome | sort-bed - )  > answer.bed

This part is called a process substitution in the bash shell:

... <( awk -v OFS="\t" '{ print $1, "0", $2 }' my.genome | sort-bed - ) ...

It uses awk to turn the file my.genome into a sorted BED file, on which you can do set operations with bedops. Basically, the commands within <( ... ) produce intervals that bash exposes to the bedops process as a readable stream where a file name is expected, so nothing has to be written to disk first.

Here's what the one-liner looks like when broken down into separate commands:

$ sort-bed A.bed > A.sorted.bed
$ awk -v OFS="\t" '{ print $1, "0", $2 }' my.genome | sort-bed - > my.genome.sorted.bed
$ bedops --complement A.sorted.bed my.genome.sorted.bed > answer.bed
$ rm A.sorted.bed my.genome.sorted.bed

Process substitutions might look a little odd, at first, but they help avoid creating intermediate files, which slow down operations on whole-genome scale work. Intermediate files also require disk space and need cleaning up. It's useful to avoid intermediate files, when possible.
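
To see what a process substitution hands to a command, here is a small sketch that needs neither bedops nor sort-bed. It assumes bash is installed and recreates my.genome with two-space delimiters as an assumption; awk splits on any run of blanks by default, so the conversion works whether the file uses spaces or tabs.

```shell
# Process substitution is a bash feature, so the demo runs inside bash explicitly.
bash <<'EOF'
# Recreate the genome file from the question (two-space delimiter is an assumption).
printf 'chr1  1000\nchr2  800\n' > my.genome

# <( ... ) expands to a path that the receiving command opens like a regular file.
echo <( true )

# cat receives that path as its argument and reads the awk output through it.
cat <( awk -v OFS="\t" '{ print $1, "0", $2 }' my.genome )
EOF
```

The first command prints the path bash generates (typically something under /dev/fd), and the second prints the genome file converted to three-column BED intervals.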

I added a space between every awk -v and OFS=. Hope you don't mind :)

Reply written 9 days ago by ATpoint