Question

Merging bed files

1

8.7 years ago

maruthi ▴ 10

Hi,

I have two BED files (file 1 and file 2) that share some SNPs, each listed with its chromosome number and region. File 2 contains a few SNP IDs (along with chromosome number, start and end sites) that are already in file 1.

I am completely new to this field. I am told I can use Bedtools to combine these files by writing a UNIX command that creates one single BED file with no repetitions.

May I please know what commands I should use to combine two bed files but with no common data set in the resulting bed file?

I will eagerly wait for your kind reply.

Thank you,
Maruthi

next-gen • 8.6k views

ADD COMMENT • link updated 18 months ago by Ram 43k • written 8.7 years ago by maruthi ▴ 10

Ram · Answer 1 · 2015-08-19

1

8.7 years ago

Alex Reynolds 35k

May I please know what commands I should use to combine two bed files but with no common data set in the resulting bed file?

This seems like a different question. You can use BEDOPS bedops --not-element-of to remove elements from one BED file that overlap those in a second BED file:

$ bedops --not-element-of 1 first.bed second.bed > answer.bed

In this example, the file answer.bed contains elements exclusive to the file first.bed.
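If BEDOPS is not installed yet, a rough stand-in using only grep may help to see the idea. Note the assumption: this is not what bedops does. bedops --not-element-of works on overlapping coordinates, whereas the sketch below only drops lines of first.bed that are character-for-character identical to a line in second.bed. The sample records and the output name exclusive.bed are made up for illustration.

```shell
# Hypothetical sample files; second.bed repeats one SNP from first.bed.
printf 'chr1\t100\t101\trs1\nchr2\t500\t501\trs2\n' > first.bed
printf 'chr2\t500\t501\trs2\nchr1\t50\t51\trs3\n' > second.bed

# -F: treat patterns as fixed strings, -x: match whole lines only,
# -f second.bed: read the patterns from second.bed,
# -v: print the lines of first.bed that do NOT match any pattern.
grep -vxFf second.bed first.bed > exclusive.bed

cat exclusive.bed
```

Only the rs1 record should remain, since rs2 appears identically in both files.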

I recommend using BEDOPS sort-bed to sort BED files to use with BEDOPS tools. It runs faster than GNU sort and has fewer restrictions than other tools:

$ sort-bed unsorted.bed > sorted.bed

ADD COMMENT • link updated 18 months ago by Ram 43k • written 8.7 years ago by Alex Reynolds 35k

0

Thank you Alex. I will try your suggestion as well. Thank you.

ADD REPLY • link 8.7 years ago by maruthi ▴ 10

Ram · Answer 2 · 2015-08-19

As the Bedtools documentation says: http://bedtools.readthedocs.org/en/latest/content/tools/merge.html

bedtools merge requires that you presort your data by chromosome and then by start position (e.g., sort -k1,1 -k2,2n in.bed > in.sorted.bed for BED files).

Then you can run:

bedtools merge [OPTIONS] -i <BED/GFF/VCF/BAM>
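For intuition about what the merge step does to a sorted file, here is a minimal awk sketch. It is not bedtools itself, only an approximation that assumes the default behaviour of joining overlapping or book-ended intervals on the same chromosome. The file names and sample coordinates are made up.

```shell
# Hypothetical sample input, already sorted by chromosome and then start.
printf 'chr1\t100\t200\nchr1\t150\t250\nchr1\t300\t400\nchr2\t10\t20\n' > in.sorted.bed

# Walk the sorted intervals, extending the current one while the next
# interval starts at or before its end; otherwise print it and start anew.
awk 'BEGIN { OFS = "\t" }
     NR == 1 { chrom = $1; start = $2 + 0; end = $3 + 0; next }
     $1 == chrom && $2 + 0 <= end { if ($3 + 0 > end) end = $3 + 0; next }
     { print chrom, start, end; chrom = $1; start = $2 + 0; end = $3 + 0 }
     END { if (NR > 0) print chrom, start, end }' in.sorted.bed > merged.bed

cat merged.bed
```

The first two chr1 intervals overlap, so they collapse into chr1 100 250. Note that merging joins intervals; it is a different operation from removing duplicate records.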

After merging all the files, you will want to remove duplicates.

I found a discussion about this here: https://groups.google.com/forum/#!topic/bedtools-discuss/2o7oUgBwebw

As suggested there, you must first define what counts as a duplicate.

[1] "Want you to remove entries where _every_ column is identical?" (easy) or

[2] "Want you to remove entries where the coordinates are the same but for example, the names are different?" (not easy)

I suggest you look at the link and read the entire discussion, but it seems that you could use this in the bash shell:

sort -k1,1 -k2,2n -k3,3n -u <BED> > output_sorted_uniq.bed # -u keeps one line per (chromosome, start, end) key, which removes duplicates
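As a small worked example of that sort command, the sketch below concatenates two BED files and keeps one record per coordinate. Each -kN,N names a sort key made of column N only, a trailing n makes the comparison numeric, and -u outputs only the first of any lines whose keys compare equal. The file names and SNP records are made up for illustration.

```shell
# Hypothetical sample files; file2.bed repeats one SNP from file1.bed.
printf 'chr1\t100\t101\trs1\nchr2\t500\t501\trs2\n' > file1.bed
printf 'chr2\t500\t501\trs2\nchr1\t50\t51\trs3\n' > file2.bed

# Concatenate both files, sort by chromosome (column 1), then numerically
# by start (column 2) and end (column 3), and keep one line per key.
cat file1.bed file2.bed | sort -k1,1 -k2,2n -k3,3n -u > combined_uniq.bed

cat combined_uniq.bed
```

Because -u compares only the three key columns, two records with the same coordinates but different names would also be collapsed to one, which corresponds to case [2] above rather than case [1].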

score 0 · Answer 3 · 2015-08-19

0

8.7 years ago

maruthi ▴ 10

Thank you Tiago. I have created a bed file with repeats by combining two. Now, I have to remove the repeats/duplicates. I will go through the link you provided and will see how far I can figure it out. Thank you

ADD COMMENT • link 8.7 years ago by maruthi ▴ 10

0

You're welcome. Just one thing: when posting something that isn't an answer, use the small gray [add comment] button. Save the big green [Add answer] box for answers only.

Good luck.

ADD REPLY • link 8.7 years ago by tiago211287 ★ 1.4k

0

Thank you for letting me know about add comment. Tiago, may I know what u, k1 and k2 in the command line mean?

ADD REPLY • link 8.7 years ago by maruthi ▴ 10