Question: Finding the number of coordinates
15 days ago, hosein_salehi6 wrote:

Dear Biostar members, I have two large files, each with a single column of genomic coordinates. For each coordinate (interval) in the first file, I want to count how many coordinates in the second file fall within it. The result should have two columns: the first contains the coordinates exactly as in the first file, and the second contains the counts.

First file:
chr22:15273-141831              
chr9:19992214-20053813
chr1:220511845-220946924
chr6:51386116-51758466
chr8:64017612-64288853
chr5:7523216-7614366
chr21:49691288-49764730
Second file: 
chr22:15273-132511
chr22:140223-141831
chr22:32345-122987
chr9:19992214-20033814
chr9:20012214-20053813
chr1:220511845-220748925
chr1:220615645-220846924
chr1:220615645-220946924
chr6:51386116-51459367
chr6:51386116-51758466
chr8:64017612-64177753
chr8:64277712-64288853
chr5:7523216-7534366
chr5:7544217-7554469
chr5:7554619-7554963
chr5:7600000-7614366
chr21:49691288-49764730

The result should be like:

chr22:15273-141831          3
chr9:19992214-20053813      2
chr1:220511845-220946924    3
chr6:51386116-51758466      2
chr8:64017612-64288853      2
chr5:7523216-7614366        4
chr21:49691288-49764730     1

Is there an easy way to do this in the Linux shell? Thanks.


Convert both files to BED format, e.g. with awk: replace the : and - with tabs (\t) and subtract 1 from the start coordinate (BED intervals are 0-based and half-open; see the BED format specification). Then use bedtools intersect and have a look at its counting option (-c). Please try it out and come back if you run into problems.
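
A minimal sketch of that approach, not a tested pipeline. It assumes that bedtools is installed, that chromosome names contain neither ':' nor '-', and that the inputs are 1-based inclusive coordinates. The function names (to_bed, from_bed_counts, count_overlaps) and the file names first.txt and second.txt are made up for illustration.

```shell
# Convert "chr:start-end" lines to 3-column BED (0-based start, tab-separated).
# Assumption: chromosome names contain neither ':' nor '-'.
to_bed() {
  awk 'BEGIN { OFS = "\t" }
       { gsub(/[ \t\r]+$/, "") }   # strip trailing whitespace, as seen in the sample data
       NF { split($0, f, /[:-]/); print f[1], f[2] - 1, f[3] }' "$1"
}

# Convert "chr<TAB>start<TAB>end<TAB>count" back to "chr:start-end<TAB>count".
from_bed_counts() {
  awk -F'\t' '{ print $1 ":" ($2 + 1) "-" $3 "\t" $4 }'
}

# For each interval in file 1, count the overlapping intervals in file 2.
count_overlaps() {
  tmpdir=$(mktemp -d) || return 1
  to_bed "$1" > "$tmpdir/a.bed"
  to_bed "$2" > "$tmpdir/b.bed"
  # -c appends, to every -a interval, the number of -b intervals overlapping it
  bedtools intersect -a "$tmpdir/a.bed" -b "$tmpdir/b.bed" -c | from_bed_counts
  rm -rf "$tmpdir"
}

# Example call (placeholder file names):
#   count_overlaps first.txt second.txt > counts.txt
```

Note that -c counts any overlap, not only full containment. For the sample data in the question this should give the same counts, since every interval in the second file lies inside exactly one interval of the first. If only intervals lying entirely inside a first-file interval should count, adding -F 1.0 (minimum overlap as a fraction of the -b interval) to the intersect call is one option to try.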

written 15 days ago by ATpoint