Question

Determine overlaps of chromosome coordinates from two different files

6.4 years ago

mittu1602 ▴ 200

I have two bed files:

file1 (since it's a compilation of 50 files, the header names are also needed for reference):

==> /home/blade/1-processedData/sample.coverageExon.csv <==

space   start   end V4  avgCoverage
chr10   133525740   133525741   CYP2E1  475
chr10   133527062   133527065   CYP2E1  402
chr10   133527323   133527328   CYP2E1  441
chr10   133531206   133531209   CYP2E1  104
chr10   133534752   133534755   CYP2E1  395
chr10   133535862   133535865   CYP2E1  278
chr10   133537632   133537635   CYP2E1  0
chr10   43572708    43572779    RET 789

file2:

chr10   37868205    37868205
chr10   37880220    37880220
chr10   37880220    37880233
chr10   37880261    37880261
chr10   37880261    37880261
chr10   37881000    37881000
chr10   37881003    37881011
chr10   37881332    37881332
chr10   37881616    37881616
chr6    152415537   152415537

I want to intersect file2 with file1. Below is my expected output:

==> /home/blade/1-processedData/sample.coverageExon.csv <== 

chr10   37868205    37868205    CYP2E1  402
chr10   37880220    37880220    CYP2E1  441
chr10   37880220    37880233    CYP2E1  104
chr10   37880261    37880261    CYP2E1  395
chr10   37880261    37880261    CYP2E1  278
chr10   37881000    37881000    CYP2E1  0
chr10   37881003    37881011    RET 789

I have already tried bedtools intersect, but because the headers are included in the file, it cannot read the file. Is there another way to do this?

bed intersect

updated 6.4 years ago by Alex Reynolds 35k • written 6.4 years ago by mittu1602 ▴ 200

since the headers are included in the file

Remove the headers:

grep -v '^space' file1a.bed | sort -t $'\t' -k1,1 -k2,2n > file1b.bed

6.4 years ago by Pierre Lindenbaum 161k

Sorry, I need those headers. As mentioned in the question, since it's a compilation of 50 files, the header names are also needed for reference.

6.4 years ago by mittu1602 ▴ 200

This is basic Linux; you can always add the header back later (!)

echo -e "space\tstart\tend\tV4\tavgCoverage\tchrom2\tstart2\tend2"  > result2.bed 
cat result.bed >> result2.bed

6.4 years ago by Pierre Lindenbaum 161k

I know how to extract the headers and add them back later, but there are 50 headers in the middle of the file, something like this:

===> file1 <===
chr10 25689 25698 
chr1 256987 569846
===>file2<===
chr6 78965 789577

and so on. It will be difficult to remove the headers from 50-100 files.

6.4 years ago by mittu1602 ▴ 200
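
One way around the scattered headers is to turn each "==> name <==" line into an extra column instead of deleting it, so the file becomes plain BED that bedtools or BEDOPS can read while every row still records which sample it came from. The bash/awk sketch below is a minimal illustration, not a tool from this thread: the function name flatten_with_sample is hypothetical, and it assumes the compiled file looks like the head output in the question (header lines starting with "==>" or "===>", a column-name row starting with "space", and blank separator lines).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the thread): flatten a "head"-style compilation
# of many coverage files into plain BED, moving each "==> name <==" header
# into a trailing sample column instead of deleting it.
flatten_with_sample() {
  awk -v OFS='\t' '
    # Header line such as "==> /path/sample.csv <==" (also matches "===>").
    /^=+>/ {
      sample = $0
      sub(/^=+>[ \t]*/, "", sample)        # drop the leading "==> "
      sub(/[ \t]*<=+[ \t]*$/, "", sample)  # drop the trailing " <=="
      next
    }
    # Skip blank separator lines and the "space start end V4 avgCoverage" row.
    NF == 0 || $1 == "space" { next }
    # Data row: append the current sample name as an extra tab-separated column.
    { $(NF + 1) = sample; print }
  ' "$@" | LC_ALL=C sort -k1,1 -k2,2n   # sort by chromosome, then numeric start
}
```

For example, flatten_with_sample sample.coverageExon.csv > file1.flat.bed followed by bedtools intersect -a file2.bed -b file1.flat.bed -wa -wb would print each file2 interval next to the overlapping file1 row, including its sample column (the file names here are placeholders).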

If you do not remove the headers, these files are not in a format that any existing tools use out-of-the-box. You can always write some custom scripts to process the data any way you like.

6.4 years ago by Sean Davis 26k
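
Along the lines of the custom script suggested above, the awk sketch below keeps the headers in place: it loads file2 into memory, walks the compiled file1, re-prints each "==>" header, and under it lists every file2 interval that overlaps a row of that sample, followed by the row's V4 and avgCoverage values (the layout of the expected output in the question). The function name intersect_keep_headers is hypothetical. It assumes BED-style half-open coordinates and, as a labelled assumption, treats zero-length file2 records (start == end, as in the question) as a single base. It compares every pair of records, so for large files removing the headers and using bedtools or BEDOPS, as suggested in this thread, scales better.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: intersect FILE2 intervals against a compiled FILE1,
# re-printing every "==> name <==" header so each block of results stays
# labelled with its sample. Brute force: every FILE2 interval is checked
# against every FILE1 row, so this only suits modest file sizes.
intersect_keep_headers() {
  awk -v OFS='\t' '
    # First argument (FILE2): load query intervals into memory.
    FILENAME == ARGV[1] {
      if (NF < 3) next
      n++
      qc[n] = $1
      qs[n] = $2 + 0
      # Assumption: a zero-length record (start == end) is treated as the
      # single base [start, start + 1) so it can still overlap something.
      qe[n] = ($3 + 0 > $2 + 0) ? $3 + 0 : $2 + 1
      ql[n] = $1 OFS $2 OFS $3
      next
    }
    # Second argument (compiled FILE1): keep headers, skip column names/blanks.
    /^=+>/ { print; next }
    NF == 0 || $1 == "space" { next }
    # Half-open overlap test: [qs, qe) and [start, end) share at least one base.
    {
      for (i = 1; i <= n; i++)
        if (qc[i] == $1 && qs[i] < $3 + 0 && $2 + 0 < qe[i])
          print ql[i], $4, $5
    }
  ' "$1" "$2"
}
```

For example: intersect_keep_headers file2.bed sample.coverageExon.csv > overlaps.txt (file names are placeholders).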

I don't understand where the headers are in your example. It looks like the output of a head command run on multiple files.

6.4 years ago by Pierre Lindenbaum 161k

score 2 · Answer 1 · 2017-11-08

6.4 years ago

Alex Reynolds 35k

Remove the headers and then use BEDOPS bedmap:

$ bedmap --echo --echo-map-id-uniq file2 file1 > answer.bed

6.4 years ago by Alex Reynolds 35k