Merging bed files
8.7 years ago
maruthi ▴ 10

Hi,

I have two BED files (file 1 and file 2) that share some SNPs, along with their chromosome numbers and regions. File 2 contains a few SNP IDs (with chromosome number, start and end positions) that are already in file 1.

I am completely new to this field. I am told I can use Bedtools to combine these files by running a UNIX command that creates a single BED file with no repetitions.

May I please know what commands I should use to combine two bed files but with no common data set in the resulting bed file?

I will eagerly wait for your kind reply.

Thank you,
Maruthi

next-gen • 8.6k views
8.7 years ago

May I please know what commands I should use to combine two bed files but with no common data set in the resulting bed file?

This seems like a different question. You can use BEDOPS bedops --not-element-of to remove elements from one BED file that overlap those in a second BED file:

$ bedops --not-element-of 1 first.bed second.bed > answer.bed

In this example, the file answer.bed contains elements exclusive to the file first.bed.
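
For readers without BEDOPS installed, the same idea can be sketched with plain awk for the special case where repeated records have identical chromosome, start and end values (as with SNP records copied between files). This is not what bedops does internally, and it does not detect partial overlaps. The records and rs IDs below are made up for illustration.

```shell
# Made-up, tab-separated sample data (hypothetical SNP IDs).
printf 'chr1\t100\t101\trs1\nchr1\t500\t501\trs2\nchr2\t300\t301\trs3\n' > first.bed
printf 'chr1\t500\t501\trs2\nchr2\t50\t51\trs4\n' > second.bed

# Keep the lines of first.bed whose chrom/start/end do not occur in second.bed.
# Assumes second.bed is not empty (the NR == FNR idiom relies on that).
awk -F'\t' '
  NR == FNR { seen[$1 FS $2 FS $3] = 1; next }  # first file read: second.bed
  !(($1 FS $2 FS $3) in seen)                   # second file read: first.bed
' second.bed first.bed > answer.bed

cat answer.bed
```

Here answer.bed keeps rs1 and rs3, because only rs2 has coordinates that also occur in second.bed.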

I recommend using BEDOPS sort-bed to sort BED files to use with BEDOPS tools. It runs faster than GNU sort and has fewer restrictions than other tools:

$ sort-bed unsorted.bed > sorted.bed

Thank you Alex. I will try your suggestion as well. Thank you.

8.7 years ago
tiago211287 ★ 1.4k

As described on the Bedtools website: http://bedtools.readthedocs.org/en/latest/content/tools/merge.html

bedtools merge requires that you presort your data by chromosome and then by start position (e.g., sort -k1,1 -k2,2n in.bed > in.sorted.bed for BED files).

Then you can run:

bedtools merge [OPTIONS] -i <BED/GFF/VCF/BAM>
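
A minimal sketch of the whole sequence, assuming the two inputs are called file1.bed and file2.bed (hypothetical names, made-up records): concatenate them, sort by chromosome and start, then pass the single sorted file to bedtools merge, which reads one input via -i. By default bedtools merge reports only chromosome, start and end, so the -c 4 -o distinct options are used here to also report the distinct names from column 4.

```shell
# Made-up, tab-separated sample data (hypothetical SNP IDs).
printf 'chr1\t100\t101\trs1\nchr1\t500\t501\trs2\nchr2\t300\t301\trs3\n' > file1.bed
printf 'chr1\t500\t501\trs2\nchr2\t50\t51\trs4\n' > file2.bed

# Concatenate both files, then sort by chromosome (field 1) and
# numerically by start position (field 2), as bedtools merge requires.
cat file1.bed file2.bed | LC_ALL=C sort -k1,1 -k2,2n > combined.sorted.bed

# Run the merge only if bedtools is installed on this machine.
if command -v bedtools >/dev/null 2>&1; then
  bedtools merge -i combined.sorted.bed -c 4 -o distinct > merged.bed
  cat merged.bed
fi
```

The sorted file still contains the repeated rs2 record; it is the merge step (or an explicit duplicate-removal step) that collapses it.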

After merging the files, you will still want to remove duplicates.

I found a discussion about this here: https://groups.google.com/forum/#!topic/bedtools-discuss/2o7oUgBwebw

As they suggest there, you must first define what counts as a duplicate:

[1] "Do you want to remove entries where _every_ column is identical?" (easy) or

[2] "Do you want to remove entries where the coordinates are the same but, for example, the names are different?" (not easy)

I suggest you read the whole discussion at that link, but it seems that you could use this in the bash shell:

sort -k1,1 -k2,2n -k3,3n -u <BED> > output_sorted_uniq.bed # and this can remove duplicates
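
In that command, -k1,1 sorts on field 1 only (the chromosome), -k2,2n and -k3,3n sort on fields 2 and 3 as numbers (start and end), and -u prints only one line for each group of lines whose keys compare equal. The sketch below, on made-up records with hypothetical SNP IDs, contrasts the two definitions of "duplicate" listed above.

```shell
# Made-up combined file: one exact duplicate (rs2) and one record with
# the same coordinates but a different name (rs3 vs rs3_alt).
printf 'chr1\t100\t101\trs1\nchr1\t500\t501\trs2\nchr2\t300\t301\trs3\n' > combined.bed
printf 'chr1\t500\t501\trs2\nchr2\t300\t301\trs3_alt\nchr1\t20\t21\trs5\n' >> combined.bed

# Definition [1]: drop a line only if every column is identical.
# Sorting puts identical lines next to each other; uniq removes the repeats.
LC_ALL=C sort -k1,1 -k2,2n -k3,3n combined.bed | uniq > dedup_whole_line.bed

# Definition [2]: drop a line if chrom/start/end match an earlier line.
# With -u, sort compares only the listed keys, so the name column is ignored.
LC_ALL=C sort -k1,1 -k2,2n -k3,3n -u combined.bed > dedup_by_coords.bed

cat dedup_whole_line.bed
cat dedup_by_coords.bed
```

Because -u compares only the listed keys, the command from the answer corresponds to definition [2]: records with the same coordinates but different names are collapsed to a single line, so check which definition you need before relying on it.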

Sorry, it seems I posted in the wrong comment box. Tiago, may I please know what the ''u, k1 and k2,2n'' in the command line mean? Thank you.

8.7 years ago
maruthi ▴ 10

Thank you Tiago. I have created a BED file with repeats by combining the two files. Now I have to remove the repeats/duplicates. I will go through the link you provided and see how far I can get. Thank you.


You're welcome. Just one thing: when posting something that isn't an answer, use the small gray [add comment] button. Save the big green [Add answer] box for answers only.

Good luck.


Thank you for letting me know about add comment. Tiago, may I know what u, k1 and k2 in the command line mean?
