Question: determine overlaps of chromosome coordinates from two different files
mittu1602 (India) wrote 2.0 years ago:

I have two bed files:

file1: (since it's a compilation of 50 files, the header names are also needed for reference)

==> /home/blade/1-processedData/sample.coverageExon.csv <==

space   start   end V4  avgCoverage
chr10   133525740   133525741   CYP2E1  475
chr10   133527062   133527065   CYP2E1  402
chr10   133527323   133527328   CYP2E1  441
chr10   133531206   133531209   CYP2E1  104
chr10   133534752   133534755   CYP2E1  395
chr10   133535862   133535865   CYP2E1  278
chr10   133537632   133537635   CYP2E1  0
chr10   43572708    43572779    RET 789

file2:

chr10   37868205    37868205
chr10   37880220    37880220
chr10   37880220    37880233
chr10   37880261    37880261
chr10   37880261    37880261
chr10   37881000    37881000
chr10   37881003    37881011
chr10   37881332    37881332
chr10   37881616    37881616
chr6    152415537   152415537

I want to intersect file2 with file1. Below is my expected output:

==> /home/blade/1-processedData/sample.coverageExon.csv <== 

chr10   37868205    37868205    CYP2E1  402
chr10   37880220    37880220    CYP2E1  441
chr10   37880220    37880233    CYP2E1  104
chr10   37880261    37880261    CYP2E1  395
chr10   37880261    37880261    CYP2E1  278
chr10   37881000    37881000    CYP2E1  0
chr10   37881003    37881011    RET 789

I have already tried bedtools intersect, but because the headers are included in the file it cannot read it. Is there any other way of doing this?


"since the headers are included in the file"

Remove the headers:

grep -v '^space' file1a.bed | sort -t $'\t' -k1,1 -k2,2n > file1b.bed
modified 2.0 years ago • written 2.0 years ago by Pierre Lindenbaum (124k)

Sorry, I need those headers, as mentioned in the question (since it's a compilation of 50 files, the header names are also needed for reference).

written 2.0 years ago by mittu1602 (170)

This is basic Linux: you can always add the header back later (!)

echo -e "space\tstart\tend\tV4\tavgCoverage\tchrom2\tstart2\tend2"  > result2.bed 
cat result.bed >> result2.bed
written 2.0 years ago by Pierre Lindenbaum (124k)
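
With many source files, the same add-the-header-afterwards idea can be applied per source. A minimal sketch, assuming (this is not stated in the thread) that every result line carries its source file name in the last tab-separated column and that lines from the same source are adjacent. `restore_markers` is a hypothetical helper name:

```shell
# Re-insert "==> name <==" marker lines in front of each group of records.
# Assumption: tab-separated input, source name in the LAST column,
# records from the same source are adjacent.
restore_markers() {
  awk -F '\t' '
    NF == 0 { next }                       # ignore blank lines
    $NF != prev {                          # source changed: emit a marker
      print "==> " $NF " <=="
      prev = $NF
    }
    {                                      # print the record without its last column
      line = $1
      for (i = 2; i < NF; i++) line = line "\t" $i
      print line
    }
  ' "$1"
}

# Usage (file name is illustrative):
#   restore_markers tagged_result.bed > result_with_markers.txt
```

Because only adjacent lines are grouped, the input should be ordered by source before calling the function.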

I am aware that headers can be extracted and added back later, but there are 50 headers in the middle of the file, something like this:

===> file1 <===
chr10 25689 25698 
chr1 256987 569846
===>file2<===
chr6 78965 789577

and so on. It will be difficult to remove the headers for 50-100 files.

modified 2.0 years ago • written 2.0 years ago by mittu1602 (170)

If you do not remove the headers, these files are not in a format that any existing tools use out-of-the-box. You can always write some custom scripts to process the data any way you like.

written 2.0 years ago by Sean Davis (25k)
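
One possible shape for such a custom script, as a sketch only: drop the marker and column-header lines, but keep the source name by appending it to every record as an extra column. It assumes marker lines look like `==> name <==` (any number of `=`), column-header lines start with `space`, and data lines have at least three whitespace-separated fields. `tag_records` is a hypothetical name:

```shell
# Turn a "head"-style concatenation into tab-separated records,
# each tagged with the file it came from.
tag_records() {
  awk '
    BEGIN { OFS = "\t"; src = "unknown" }
    /^=+>/ {                               # marker line: remember the source name
      src = $0
      gsub(/^=+>[ \t]*|[ \t]*<=+[ \t]*$/, "", src)
      next
    }
    /^space/ || NF < 3 { next }            # skip column headers and blank/short lines
    { $1 = $1; print $0, src }             # rebuild with tabs, append source column
  ' "$1"
}

# Usage (file names are illustrative):
#   tag_records combined.txt | sort -t "$(printf '\t')" -k1,1 -k2,2n > combined.tagged.bed
```

The first three columns of the output are then plain chromosome, start, end, so the header information survives as data instead of as lines that break the format.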

I don't understand where the headers are in your example. It looks like the output of a head command run on multiple files.

written 2.0 years ago by Pierre Lindenbaum (124k)

Alex Reynolds (Seattle, WA USA) wrote 2.0 years ago:

Remove the headers and then use BEDOPS bedmap:

$ bedmap --echo --echo-map-id-uniq file2 file1 > answer.bed
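
If BEDOPS is not available, the overlap itself can be approximated with plain awk. This is not bedmap and does not reproduce its output format: it is a naive all-against-all comparison. The sketch assumes header-free, whitespace-separated input, a non-empty file1 small enough to hold in memory, and intervals treated as closed (inclusive) ranges so that single-position entries like those in file2 can still match. `overlap_naive` is a hypothetical name:

```shell
# For every interval in B (second argument), print its coordinates plus the
# name and coverage columns of each overlapping record in A (first argument).
# Assumption: A has columns chrom, start, end, name, coverage; B has chrom, start, end.
overlap_naive() {
  awk '
    BEGIN { OFS = "\t" }
    NR == FNR {                            # first file: cache all A records
      n++
      chr[n] = $1; s[n] = $2; e[n] = $3; name[n] = $4; cov[n] = $5
      next
    }
    {                                      # second file: compare against every A record
      for (i = 1; i <= n; i++)
        if (chr[i] == $1 && s[i] + 0 <= $3 + 0 && $2 + 0 <= e[i] + 0)
          print $1, $2, $3, name[i], cov[i]
    }
  ' "$1" "$2"
}

# Usage (file names are illustrative):
#   overlap_naive file1.noheader.bed file2 > answer.bed
```

The comparison cost grows with the product of the two file sizes, so for large inputs a sorted-input tool such as bedmap remains the better choice.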