Question: bedtools intersect on a bedGraph-style file with extra data columns for multiple samples?
dansapozhnikov wrote (11 months ago):

Hi,

I can successfully run bedtools intersect on two 3-column BED files, but I want to perform the same intersect using a "file A" that has many additional columns holding values for a large number of samples, formatted like so:

File A:

(chr) (start) (stop) (sample 1) (sample 2) (sample 3) (sample 4) ...

chr1   1001   1002    10    25   14   25
chr1   1006   1007    12    22   11   50
chr1   1012   1013    44    11   12   30

File B:

chr1   500    1010
chr1   2000   3000

Output:

chr1   1001   1002    10    25   14   25
chr1   1006   1007    12    22   11   50

Does anyone know a way to do this kind of intersect while keeping the data in the additional columns in the output, whether with bedtools intersect alone, bedtools combined with another program, or any other code?

Thanks!

Dan
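For the "any other code" part of the question, here is a minimal awk sketch (not a bedtools or bedops feature) that prints every line of File A, sample columns included, when it overlaps any interval in File B. It assumes BED half-open coordinates, whitespace-delimited columns and a non-empty File B; fileA.bed, fileB.bed and answer.bed are placeholder names. It compares each A line against every B interval on the same chromosome, so it is only practical for modest file sizes; for large inputs a dedicated tool such as bedops or bedtools will scale much better.

```shell
#!/bin/sh
# Sketch: keep every File A line (all sample columns) that overlaps any
# File B interval. Assumes BED half-open coordinates, whitespace-delimited
# columns, and a non-empty fileB.bed (the NR == FNR trick needs that).
workdir=$(mktemp -d)

# Example inputs copied from the question (placeholder file names).
printf 'chr1\t1001\t1002\t10\t25\t14\t25\nchr1\t1006\t1007\t12\t22\t11\t50\nchr1\t1012\t1013\t44\t11\t12\t30\n' > "$workdir/fileA.bed"
printf 'chr1\t500\t1010\nchr1\t2000\t3000\n' > "$workdir/fileB.bed"

awk '
  # First file (B): store each interval, indexed by chromosome.
  NR == FNR { n[$1]++; s[$1, n[$1]] = $2 + 0; e[$1, n[$1]] = $3 + 0; next }
  # Second file (A): print the unmodified line if it overlaps any B interval.
  {
    for (i = 1; i <= n[$1]; i++)
      if ($2 + 0 < e[$1, i] && s[$1, i] < $3 + 0) { print; break }
  }
' "$workdir/fileB.bed" "$workdir/fileA.bed" > "$workdir/answer.bed"

cat "$workdir/answer.bed"
```

On the example data this writes the first two lines of File A, with all four sample columns, to answer.bed, which matches the expected output in the question.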


Have a look at bedtools unionbedg.

Reply by ATpoint, 11 months ago

Actually, File A is the output of unionbedg, so it is a bedGraph with many columns, and I want to intersect it with a BED file of intervals. (File A also has data for each single nucleotide, whereas the BED file has intervals.)

I added an example to the original post.

Reply by dansapozhnikov, 11 months ago
Alex Reynolds (Seattle, WA USA) wrote (11 months ago):

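I'm not sure whether or how your files are sorted, but running them through sort-bed first takes care of that. The -e 1 (--element-of 1) operation reports each element of the first file that overlaps an element of the second file by at least one base: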

$ bedops -e 1 <(sort-bed fileA.bed) <(sort-bed fileB.bed) > answer.bed
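The sort-bed step matters because bedops expects sorted input. If BEDOPS is not available, a rough stand-in is POSIX sort, assuming sort-bed's order is byte-wise lexicographic on the chromosome name and then numeric on start and end (worth checking against sort-bed on your own data). Also note that the <( ) process substitution in the command above needs bash, zsh or ksh rather than plain sh. The input file here is made up for illustration.

```shell
#!/bin/sh
# Sketch: a POSIX-sort stand-in for sort-bed (assumed ordering: byte-wise
# chromosome name, then numeric start, then numeric end).
workdir=$(mktemp -d)
tab=$(printf '\t')

# Hypothetical unsorted input.
printf 'chr2\t5\t9\nchr1\t100\t200\nchr1\t20\t30\nchr10\t1\t2\n' > "$workdir/unsorted.bed"

# LC_ALL=C makes the chromosome comparison byte-wise and locale-independent.
LC_ALL=C sort -t "$tab" -k1,1 -k2,2n -k3,3n "$workdir/unsorted.bed" > "$workdir/sorted.bed"
cat "$workdir/sorted.bed"
```

Here chr10 lands between chr1 and chr2, which is what byte-wise ordering of the names gives.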

This does not work for me. It says "Non-numeric end coordinate. See line 1 in fileB.bed". That isn't even the problematic file, and it worked with bedtools intersect.

Reply by dansapozhnikov, 11 months ago

If you run cat -te on a few lines of your files, what does it show? For instance, what comes out of:

$ head -5 foo.bed | cat -te

Something is up with your files, and it needs fixing.

Reply by Alex Reynolds, 11 months ago
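For files too long to inspect by eye with cat -te, a small awk sketch can count the two problems that come up in this thread: lines ending in a carriage return (shown as ^M) and lines that do not split into at least three tab-separated fields. The test file foo.bed is hypothetical, built with one CRLF line, one space-delimited line and one clean line.

```shell
#!/bin/sh
# Sketch: count carriage-return lines and non-tab-delimited lines.
workdir=$(mktemp -d)

# Hypothetical test file: one CRLF line, one space-delimited line, one clean line.
printf 'chr1\t2572970\t2579715\r\nchr1 1001 1002 10\nchr1\t1006\t1007\n' > "$workdir/foo.bed"

# awk strips the trailing \n, so a CRLF line still ends in \r.
cr_lines=$(awk '/\r$/ { n++ } END { print n + 0 }' "$workdir/foo.bed")
# With a tab separator, a space-delimited line collapses into a single field.
untabbed=$(awk -F '\t' 'NF < 3 { n++ } END { print n + 0 }' "$workdir/foo.bed")

echo "lines ending in CR: $cr_lines"
echo "lines with fewer than 3 tab-separated fields: $untabbed"
```

Both counts should be 0 for a clean BED file; for this test file each is 1.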

chr1^I2572970^I2579715^M$

Reply by dansapozhnikov, 11 months ago

See that ^M at the end? That's a carriage return character, left over from Windows (CRLF) line endings. You need to remove it. You can use tr for this:

$ tr -d '\r' < foo.bed > foo.fixed.bed

Repeat for all afflicted files, then run your commands on the fixed files.

Reply by Alex Reynolds, 11 months ago
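Repeating the tr fix for every afflicted file fits naturally in a loop. This sketch writes NAME.fixed.bed next to each NAME.bed (the naming follows the fileA.fixed.bed convention used later in this thread); the two input files are made-up examples with CRLF endings.

```shell
#!/bin/sh
# Sketch: strip carriage returns from several files in one go.
workdir=$(mktemp -d)

# Hypothetical inputs with Windows (CRLF) line endings.
printf 'chr1\t500\t1010\r\n' > "$workdir/fileA.bed"
printf 'chr1\t2000\t3000\r\n' > "$workdir/fileB.bed"

for f in "$workdir/fileA.bed" "$workdir/fileB.bed"; do
  # Delete every carriage-return byte; the newlines are left alone.
  tr -d '\r' < "$f" > "${f%.bed}.fixed.bed"
done

# Should now show ^I between columns and a plain $ at the end, with no ^M.
cat -te "$workdir/fileA.fixed.bed"
```

tr deletes every carriage return, not only those at line ends, which is harmless for BED data; dos2unix does the same job where it is installed.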

OK, so this was a Windows line ending, and I ran dos2unix to fix it. Then I ran bedops as before, and now it runs (unlike bedtools intersect), but the output does not retain the extra columns.

Reply by dansapozhnikov, 11 months ago

What is the output of this:

$ head -5 fileA.fixed.bed | cat -te

What is the output of this:

$ head -5 fileB.fixed.bed | cat -te

Please post everything you see. The bedops -e command reports elements of the first file exactly as they are found, without modifying them. So either your files are not structured as described, or there is some other problem. If we can see your actual inputs, we can probably figure out what's going on.

Reply by Alex Reynolds, 11 months ago
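One quick way to check whether a file is "structured as described" is to count the tab-separated fields on each line: a well-formed file like File A should report a single field count (7 in the example). The input in this sketch is hypothetical, with two tab-delimited lines and one space-delimited line; the space-delimited line shows up as a single field.

```shell
#!/bin/sh
# Sketch: tabulate lines by number of tab-separated fields.
workdir=$(mktemp -d)

# Hypothetical input: two tab-delimited lines, one space-delimited line.
printf 'chr1\t1001\t1002\t10\t25\t14\t25\nchr1\t1006\t1007\t12\t22\t11\t50\nchr1 1012 1013 44 11 12 30\n' > "$workdir/fileA.fixed.bed"

# Each output row is "<number of lines> <field count>"; a clean file gives one row.
summary=$(awk -F '\t' '{ print NF }' "$workdir/fileA.fixed.bed" | sort -n | uniq -c)
echo "$summary"
```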

Thanks to your suggestions I found the problem! File A was space-delimited, not tab-delimited. Now bedops runs and retains all the columns. Thank you very much for your help.

Reply by dansapozhnikov, 11 months ago
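If a file turns out to be space-delimited, awk can rewrite it with tabs. This is a minimal sketch, assuming no field contains an embedded space (true for chromosome names, coordinates and numeric sample columns). The input mirrors the space-aligned File A example from the question, and fileA.tabs.bed is a placeholder name.

```shell
#!/bin/sh
# Sketch: convert a space-delimited BED/bedGraph file to tab-delimited.
workdir=$(mktemp -d)

# Space-aligned input, as in the question's File A example.
printf 'chr1   1001   1002    10    25   14   25\nchr1   1006   1007    12    22   11   50\n' > "$workdir/fileA.bed"

# Assigning $1 = $1 makes awk rebuild the record with OFS (a tab), so each
# run of spaces/tabs between fields becomes exactly one tab.
awk 'BEGIN { OFS = "\t" } { $1 = $1; print }' "$workdir/fileA.bed" > "$workdir/fileA.tabs.bed"

cat -te "$workdir/fileA.tabs.bed"
```

After conversion every line splits into 7 tab-separated fields, and cat -te shows ^I between the columns instead of spaces.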

Awesome! Glad to help.

Reply by Alex Reynolds, 11 months ago
Powered by Biostar version 2.3.0