Question: determine overlaps of chromosome coordinates from two different files
Asked 20 months ago by mittu1602 (India):

I have two BED files.

file1 (it is a compilation of 50 files, so the header names are also needed for reference):

==> /home/blade/1-processedData/sample.coverageExon.csv <==

space   start   end V4  avgCoverage
chr10   133525740   133525741   CYP2E1  475
chr10   133527062   133527065   CYP2E1  402
chr10   133527323   133527328   CYP2E1  441
chr10   133531206   133531209   CYP2E1  104
chr10   133534752   133534755   CYP2E1  395
chr10   133535862   133535865   CYP2E1  278
chr10   133537632   133537635   CYP2E1  0
chr10   43572708    43572779    RET 789

file2:

chr10   37868205    37868205
chr10   37880220    37880220
chr10   37880220    37880233
chr10   37880261    37880261
chr10   37880261    37880261
chr10   37881000    37881000
chr10   37881003    37881011
chr10   37881332    37881332
chr10   37881616    37881616
chr6    152415537   152415537

I want to intersect file2 with file1. Below is my expected output:

==> /home/blade/1-processedData/sample.coverageExon.csv <== 

chr10   37868205    37868205    CYP2E1  402
chr10   37880220    37880220    CYP2E1  441
chr10   37880220    37880233    CYP2E1  104
chr10   37880261    37880261    CYP2E1  395
chr10   37880261    37880261    CYP2E1  278
chr10   37881000    37881000    CYP2E1  0
chr10   37881003    37881011    RET 789

I have already tried bedtools intersect, but because the headers are included in the file, it is not able to read the file. Is there any other way of doing this?


"since the headers are included in the file"

Remove the headers:

grep -v '^space' file1a.bed | sort -t $'\t' -k1,1 -k2,2n > file1b.bed
written 20 months ago by Pierre Lindenbaum
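The grep above drops only the column-header lines. If the file also carries the `==> path <==` banners shown in the question, a variant can drop both kinds of line (and blank lines) before sorting. This is a minimal sketch: the function name `strip_headers_and_sort` is hypothetical, and it assumes whitespace-separated columns with the chromosome in column 1 and the start in column 2.

```shell
# Hypothetical helper: remove "space ..." column headers, "==> ... <==" banners
# and blank lines from a file, then sort by chromosome and numeric start.
strip_headers_and_sort() {
  grep -v -e '^space' -e '^==*>' -e '^[[:space:]]*$' "$1" |
    LC_ALL=C sort -k1,1 -k2,2n   # byte-wise chromosome order, numeric start
}

# Usage (file names are placeholders):
#   strip_headers_and_sort file1a.bed > file1b.bed
```

The `LC_ALL=C` prefix keeps the chromosome ordering independent of the user's locale.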

Sorry, I need those headers. As mentioned in the question, the file is a compilation of 50 files, so the header names are needed for reference.

written 20 months ago by mittu1602

This is basic Linux: you can always add the header back later (!)

echo -e "space\tstart\tend\tV4\tavgCoverage\tchrom2\tstart2\tend2"  > result2.bed 
cat result.bed >> result2.bed
written 20 months ago by Pierre Lindenbaum

I am aware of extracting headers and adding them back later, but there are 50 headers in the middle of the file, something like this:

===> file1 <===
chr10 25689 25698 
chr1 256987 569846
===>file2<===
chr6 78965 789577

And so on. It will be difficult to remove the headers for 50-100 files.

written 20 months ago by mittu1602

If you do not remove the headers, these files are not in a format that any existing tools use out-of-the-box. You can always write some custom scripts to process the data any way you like.

written 20 months ago by Sean Davis
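One possible shape for such a custom script, sketched in awk: rather than discarding the per-file banners, carry each banner's file name along as an extra last column, so the result is header-free but every row still records which of the 50 files it came from. The function name `flatten_headers` is hypothetical, and the sketch assumes banners look like `==> name <==` (or `===>name<===`) and that column-header lines start with `space`.

```shell
# Hypothetical helper: turn a concatenation of "==> file <==" sections into
# tab-separated rows with the source file name appended as the last column.
# Reads the file given as an argument, or standard input if none is given.
flatten_headers() {
  awk 'BEGIN { OFS = "\t" }
       /^=+>/ {                           # banner line: remember the file name
         src = $0
         gsub(/^=+> */, "", src)
         gsub(/ *<=+ *$/, "", src)
         next
       }
       $1 == "space" || NF == 0 { next }  # skip column headers and blank lines
       { $1 = $1; print $0, src }         # re-join fields with tabs, add source
      ' "$@"
}

# Usage (file names are placeholders):
#   flatten_headers combined.csv | LC_ALL=C sort -k1,1 -k2,2n > combined.bed
```

Because the source name sits after the coordinate columns, the first three columns remain chromosome, start and end.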

I don't understand where the headers are in your example. It looks like the output of a head command run on multiple files.

written 20 months ago by Pierre Lindenbaum
Answer written 20 months ago by Alex Reynolds (Seattle, WA USA):

Remove the headers and then use BEDOPS bedmap:

$ bedmap --echo --echo-map-id-uniq file2 file1 > answer.bed
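For readers who cannot install BEDOPS, the overlap lookup itself can be sketched in plain awk. This is not bedmap and does not reproduce its output format. It is a naive all-pairs comparison with several assumptions: the reference file is header-free with gene and coverage in columns 4 and 5, start and end are treated as inclusive so that single-position rows like those in file2 (start equal to end) can still match, and the function name `overlap_map` is hypothetical.

```shell
# Hypothetical helper: overlap_map REFERENCE QUERY
# For each QUERY interval, print it once per overlapping REFERENCE interval,
# followed by that reference row's gene (column 4) and coverage (column 5).
overlap_map() {
  awk 'BEGIN { OFS = "\t" }
       NR == FNR {                 # first file: load the reference intervals
         n++
         chr[n] = $1; s[n] = $2 + 0; e[n] = $3 + 0
         gene[n] = $4; cov[n] = $5
         next
       }
       {                           # second file: test every query interval
         for (i = 1; i <= n; i++)
           if (chr[i] == $1 && s[i] <= $3 + 0 && $2 + 0 <= e[i])
             print $1, $2, $3, gene[i], cov[i]
       }' "$1" "$2"
}

# Usage (file names are placeholders; file1 must already be header-free):
#   overlap_map file1.noheader.bed file2 > answer.bed
```

The all-pairs loop is fine for small panels like the one in the question, but it scales with the product of the two file sizes, and it assumes a non-empty reference file. For large inputs a sorted sweep, as bedmap performs, is the better choice.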