Question: bedtools intersect with "bedgraph" style with multiple extra columns of data for multiple samples?
7 months ago, dansapozhnikov wrote:

Hi,

I can successfully perform bedtools intersect for two 3-column BED files, but I want to perform the same intersect using a "file A" that contains multiple additional columns with values for a large number of samples formatted like so:

File A:

(chr) (start) (stop) (sample 1..2..3...4...etc..)

chr1   1001   1002    10    25   14   25
chr1   1006   1007    12    22   11   50
chr1   1012   1013    44    11   12   30

File B:

chr1   500    1010
chr1   2000   3000

Output:

chr1   1001   1002    10    25   14   25
chr1   1006   1007    12    22   11   50

Does anyone know of a way to use bedtools intersect, either alone or together with another program, or any other code at all, to perform this kind of intersect and retain the data in the additional columns in the output?

Thanks!

Dan

modified 7 months ago by Alex Reynolds • written 7 months ago by dansapozhnikov

Have a look at bedtools unionbedg.

written 7 months ago by ATpoint

Actually, File A is the output of unionbedg. So it is a bedgraph with many columns, and I want to intersect it with a BED file of intervals. (Also, File A has data for each single nucleotide, whereas the BED file has intervals.)

I added an example to the original post.

modified 7 months ago • written 7 months ago by dansapozhnikov
7 months ago, Alex Reynolds (Seattle, WA USA) wrote:

Not sure if/how your files are sorted, but this should take care of that:

$ bedops -e 1 <(sort-bed fileA.bed) <(sort-bed fileB.bed) > answer.bed
modified 7 months ago • written 7 months ago by Alex Reynolds
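
For readers without bedops installed, the same idea can be roughly sketched in plain awk: print each line of file A, unchanged, if it overlaps any interval in file B by at least one base. This is only an illustration under some assumptions: both files are tab-delimited, coordinates are half-open as in BED, and file B is small enough to hold in memory. The `demoA.bed`, `demoB.bed` and `demo_answer.bed` names are made up for the example, and the inputs are copied from the question.

```shell
# Recreate the example inputs from the question as tab-delimited files
# (hypothetical file names, used only for this sketch).
printf 'chr1\t1001\t1002\t10\t25\t14\t25\n'  > demoA.bed
printf 'chr1\t1006\t1007\t12\t22\t11\t50\n' >> demoA.bed
printf 'chr1\t1012\t1013\t44\t11\t12\t30\n' >> demoA.bed
printf 'chr1\t500\t1010\nchr1\t2000\t3000\n' > demoB.bed

# First file (demoB.bed): remember every interval.
# Second file (demoA.bed): print the whole line, extra columns included,
# as soon as it overlaps one stored interval on the same chromosome.
awk 'BEGIN { FS = OFS = "\t" }
NR == FNR { n++; bchrom[n] = $1; bstart[n] = $2 + 0; bend[n] = $3 + 0; next }
{
    for (i = 1; i <= n; i++) {
        if ($1 == bchrom[i] && ($2 + 0) < bend[i] && ($3 + 0) > bstart[i]) {
            print
            break
        }
    }
}' demoB.bed demoA.bed > demo_answer.bed

cat demo_answer.bed
```

On the example data this keeps the 1001 and 1006 rows and drops the 1012 row. Unlike the bedops command, this sketch does not need sorted input, but it compares every A line against every B interval, so it will be slow on large files.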

This does not work for me for some reason. It says "Non-numeric end coordinate. See line 1 in fileB.bed". This is not even the problematic file and it worked in bedtools intersect.

written 7 months ago by dansapozhnikov

If you run cat -te on a few lines of your files, what does it say? For instance what comes out of: head -5 foo.bed | cat -te? Something's up with your files, which needs fixing.

modified 7 months ago • written 7 months ago by Alex Reynolds

chr1^I2572970^I2579715^M$

written 7 months ago by dansapozhnikov

See that ^M at the end? That's a Windows carriage return character. You need to remove that. You can use tr for this:

$ tr -d '\r' < foo.bed > foo.fixed.bed

Repeat for all afflicted files, then run your commands on the fixed files.

written 7 months ago by Alex Reynolds
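
To confirm that a file actually has carriage returns before and after the fix, one option is to count the affected lines with grep. The sketch below builds a one-line demo file with a Windows line ending (the `demo_crlf.bed` name is made up), counts the lines containing a carriage return, applies the same `tr` command, and counts again.

```shell
# Build a small demo file with a Windows (CRLF) line ending.
printf 'chr1\t2572970\t2579715\r\n' > demo_crlf.bed

# Hold a literal carriage return in a variable so grep can search for it.
cr=$(printf '\r')

# Count lines that contain a carriage return (prints 1 here).
grep -c "$cr" demo_crlf.bed

# Strip carriage returns, as in the tr command above.
tr -d '\r' < demo_crlf.bed > demo_crlf.fixed.bed

# Count again (prints 0); grep exits non-zero when nothing matches,
# so "|| true" keeps a script running under "set -e".
grep -c "$cr" demo_crlf.fixed.bed || true
```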

OK so this seems to be a Windows line ending, so I ran dos2unix. Then I ran bedops as before, and now (unlike bedtools intersect) it ran, but it does not retain the extra columns.

written 7 months ago by dansapozhnikov

What is the output of this:

$ head -5 fileA.fixed.bed | cat -te

What is the output of this:

$ head -5 fileB.fixed.bed | cat -te

Please post everything you see. The bedops -e command just reports back any elements as they are found, and does not modify them. So either your files are not structured as described, or there is some other problem. If we can see your actual inputs, we can probably figure out what's up.

modified 7 months ago • written 7 months ago by Alex Reynolds

Thanks to your suggestions I found the problem! File A was space-delimited, not tab. It now runs and retains all the columns. Thank you very much for your help.

written 7 months ago by dansapozhnikov
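
The thread does not say how the space-delimited file was converted, but one common way to turn runs of spaces into single tabs is an awk one-liner like the sketch below. It assumes no field contains a space of its own, and the `demo_spaces.bed` and `demo_tabs.bed` names are made up for the example.

```shell
# Build a demo line delimited by runs of spaces, like File A in the question.
printf 'chr1   1001   1002    10    25   14   25\n' > demo_spaces.bed

# Assigning $1 to itself makes awk rebuild the line with OFS (a tab)
# between fields; the default field splitting treats runs of spaces as one separator.
awk 'BEGIN { OFS = "\t" } { $1 = $1; print }' demo_spaces.bed > demo_tabs.bed

# Inspect the result: tabs show up as ^I and line ends as $.
cat -te demo_tabs.bed
```

After the conversion, `cat -te` should show `^I` between every pair of columns, which is the layout the intersect commands above expect.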

Awesome! Glad to help.

written 7 months ago by Alex Reynolds