How to remove genome coordinates shared by two BED files?
7.8 years ago
bright602 ▴ 50

Hi, I have two bed files, say file A and file B.


File A:


  • chr3 181879479 181879497
  • chr3 181879496 181879514
  • chr3 181879507 181879525
  • chr3 181879555 181879573

File B:


  • chr3 181879496 181879514
  • chr3 181879507 181879525

How can I remove the entries common to A and B from A, and get BED file C as follows:


  • chr3 181879479 181879497
  • chr3 181879555 181879573

Thanks a lot for your help.

next-gen genome R sequencing • 1.9k views
7.8 years ago
slw287r ▴ 140

Use the following command to remove the intersected `regions' of A and B from A (partially overlapping intervals in A are trimmed, not kept whole):

bedtools subtract -a A.bed -b B.bed > C.bed

Or, if you just want to remove the `lines' that A shares with B:

awk -F'\t' 'BEGIN{OFS="\t"}NR==FNR{a[$0];next}{if(!($0 in a)){print}}' B.bed A.bed > C.bed
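
The whole-line comparison above only matches when the two files agree character for character, including any extra columns or trailing whitespace. As a minimal sketch of a variant, assuming that only the first three columns (chromosome, start, end) should decide whether two records are "the same", the lookup can be keyed on those fields instead. The file names A.bed, B.bed and C.bed follow the thread; the array name `seen` is just illustrative.

```shell
# Recreate the example files from the question (tab-separated).
printf 'chr3\t181879479\t181879497\nchr3\t181879496\t181879514\nchr3\t181879507\t181879525\nchr3\t181879555\t181879573\n' > A.bed
printf 'chr3\t181879496\t181879514\nchr3\t181879507\t181879525\n' > B.bed

# First pass (B.bed): remember each chrom/start/end triple.
# Second pass (A.bed): print only the records whose triple was not seen in B.
awk 'NR==FNR { seen[$1,$2,$3]; next } !(($1,$2,$3) in seen)' B.bed A.bed > C.bed

cat C.bed
```

On the example data this should leave the first and last records of A in C.bed. Because the comparison is an exact match on coordinates, intervals that merely overlap an entry in B are kept untouched.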
7.8 years ago
igor 13k

Try bedtools intersect -v -a A.bed -b B.bed: http://bedtools.readthedocs.io/en/latest/content/tools/intersect.html


Thank you. I tried using bedtools, but it turns out that lots of items are lost. In the case above, it only shows chr3 181879555 181879573.
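
A likely explanation for the lost items is that the overlap-based commands discard any interval in A that touches an interval in B, even by a single base, whereas the question asks for exact coordinate matches. The sketch below is not a bedtools replacement; it is a plain awk check, assuming the standard 0-based, half-open BED convention, that lists which records in A overlap which records in B and by how many bases. The output file name overlaps.txt and the array names are illustrative.

```shell
# Recreate the example files from the question (tab-separated).
printf 'chr3\t181879479\t181879497\nchr3\t181879496\t181879514\nchr3\t181879507\t181879525\nchr3\t181879555\t181879573\n' > A.bed
printf 'chr3\t181879496\t181879514\nchr3\t181879507\t181879525\n' > B.bed

# First pass (B.bed): store every interval.
# Second pass (A.bed): report each A/B pair that overlaps and the overlap length.
# Half-open intervals overlap when a_start < b_end and b_start < a_end.
awk 'NR==FNR { chrom[FNR] = $1; s[FNR] = $2 + 0; e[FNR] = $3 + 0; n = FNR; next }
     {
       for (i = 1; i <= n; i++) {
         if ($1 == chrom[i] && ($2 + 0) < e[i] && s[i] < ($3 + 0)) {
           lo = (($2 + 0) > s[i]) ? $2 + 0 : s[i]
           hi = (($3 + 0) < e[i]) ? $3 + 0 : e[i]
           print $1, $2, $3, "overlaps", chrom[i], s[i], e[i], "by", hi - lo, "bp"
         }
       }
     }' B.bed A.bed > overlaps.txt

cat overlaps.txt
```

On the example data, chr3 181879479 181879497 is reported as overlapping chr3 181879496 181879514 by 1 bp, which would explain why it disappears, while chr3 181879555 181879573 overlaps nothing and survives. If bedtools is preferred, adding `-f 1.0 -r` to `bedtools intersect -v` should restrict removal to reciprocal 100% overlaps, that is, identical intervals; this is worth confirming against the bedtools documentation for the installed version.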
