Difference between sortBed in bedtools and sort in unix
7.6 years ago
mary99 ▴ 80

Hi all,

Do you know the difference between sortBed and sort? These two give different results:

cat file.sort.bed | uniq | wc -l

cat file.bed | sort | uniq | wc -l

Thanks

sequence ChIP-Seq next-gen

There are many differences. If you post the exact commands you used, it would be easier to figure out why they give you different results.


For the first command, I sorted my BED file with sortBed from bedtools, then ran uniq, and then counted the lines:

1) sortBed [OPTIONS] -i <bed/gff/vcf>
2) uniq
3) wc -l

In the second command, I used the BED file directly: I sorted it with Unix sort, then ran uniq, and finally counted the lines.


You probably just ran plain sort rather than sort -k1,1 -k2,2n, which is more similar to what bedtools does.
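A minimal sketch of the difference, assuming GNU or BSD coreutils and a made-up three-line BED file: plain sort compares whole lines as text, so a start of 100 lands before a start of 20 on the same chromosome, while -k1,1 -k2,2n compares the start column numerically. LC_ALL=C is set only to make the ordering independent of the locale.

```shell
#!/bin/sh
# Made-up BED records (tab-separated chrom, start, end).
bed=$(mktemp)
printf 'chr1\t100\t200\nchr1\t20\t50\nchr2\t5\t10\n' > "$bed"

# Plain sort: whole lines compared as text, so "100" sorts before "20".
plain=$(LC_ALL=C sort "$bed" | cut -f2 | paste -s -d ' ' -)

# BED-style: chromosome column as text, then start column numerically.
keyed=$(LC_ALL=C sort -k1,1 -k2,2n "$bed" | cut -f2 | paste -s -d ' ' -)

echo "plain sort starts: $plain"
echo "keyed sort starts: $keyed"
rm -f "$bed"
```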


A .bed file is, in general, a binary file, and running sort or cat on it directly will probably not give you anything meaningful.


Oops, sorry, my bad; I did not read the title/post carefully enough!

But to justify my answer:

http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml#bed
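If it is unclear which kind of .bed file you have, one quick check is the first bytes of the file: PLINK 1 binary .bed files begin with the magic bytes 0x6c 0x1b, whereas a UCSC BED is plain text. This sketch assumes a POSIX od; the helper name bed_kind and the demo file names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print "plink-binary" if the file starts with PLINK's
# magic bytes 0x6c 0x1b, otherwise "text".
bed_kind() {
  # -An: no offsets, -N2: read only 2 bytes, -tx1: print them as hex bytes
  magic=$(od -An -N2 -tx1 "$1" | tr -d ' \n')
  if [ "$magic" = "6c1b" ]; then
    echo "plink-binary"
  else
    echo "text"
  fi
}

tmp=$(mktemp -d)
# Made-up demo files: the PLINK magic bytes plus the SNP-major mode byte 0x01
# (written as octal escapes), and a one-line UCSC-style text BED.
printf '\154\033\001' > "$tmp/demo_plink.bed"
printf 'chr1\t100\t200\n' > "$tmp/demo_ucsc.bed"

plink_kind=$(bed_kind "$tmp/demo_plink.bed")
text_kind=$(bed_kind "$tmp/demo_ucsc.bed")
echo "demo_plink.bed: $plink_kind"
echo "demo_ucsc.bed: $text_kind"
rm -rf "$tmp"
```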


Whoa! Binary ped. I guess there are at least two beds in genomics then.


Then I would recommend reading https://genome.ucsc.edu/FAQ/FAQformat.html: a BED file is a plain-text format whose first three (required) columns are the chromosome name and the start and end positions of the regions of interest.
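Because only the first three columns define a region, two lines can describe the same interval yet differ in later columns (a name, for example), and sort | uniq will count them separately. A minimal sketch with made-up peak names, assuming tab-separated input, that counts distinct lines versus distinct regions:

```shell
#!/bin/sh
bed=$(mktemp)
# Made-up records: the same chr1 interval appears with two different names.
printf 'chr1\t10\t20\tpeak_a\nchr1\t10\t20\tpeak_b\nchr2\t5\t9\tpeak_c\n' > "$bed"

# Distinct whole lines: the name column makes the two chr1 lines differ.
lines=$(LC_ALL=C sort -u "$bed" | wc -l)

# Distinct regions: keep only chrom, start, end before de-duplicating.
regions=$(cut -f1-3 "$bed" | LC_ALL=C sort -u | wc -l)

echo "distinct lines: $lines, distinct regions: $regions"
rm -f "$bed"
```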


Unix's sort will not handle any header lines in the BED file. It also won't properly handle a compressed BED file, such as a bgzip-ed BED. And you have to pass the correct -n and -V options to sort fields numerically or in natural (version) order.
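A minimal coreutils workaround for the first two points, assuming header lines start with #, track or browser (adjust the pattern for your files) and relying on the fact that bgzip output is gzip-compatible, so gzip -dc can read it. The demo file is made up.

```shell
#!/bin/sh
dir=$(mktemp -d)
# Made-up BED file with two header lines.
printf 'track name=demo\n#comment\nchr2\t5\t10\nchr1\t30\t40\nchr1\t7\t9\n' > "$dir/in.bed"
header='^(#|track|browser)'

# Keep the header lines on top and sort only the data lines.
{
  grep -E "$header" "$dir/in.bed"
  grep -vE "$header" "$dir/in.bed" | LC_ALL=C sort -k1,1 -k2,2n
} > "$dir/sorted.bed"

# Compressed input: decompress on the fly, drop headers, then sort.
gzip -c "$dir/in.bed" > "$dir/in.bed.gz"
gzip -dc "$dir/in.bed.gz" | grep -vE "$header" | LC_ALL=C sort -k1,1 -k2,2n > "$dir/from_gz.bed"

first_line=$(head -n 1 "$dir/sorted.bed")
starts=$(grep -vE "$header" "$dir/sorted.bed" | cut -f2 | paste -s -d ' ' -)
gz_starts=$(cut -f2 "$dir/from_gz.bed" | paste -s -d ' ' -)
cat "$dir/sorted.bed"
rm -rf "$dir"
```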

7.6 years ago

Just guessing... sortBed may not break ties once it has sorted by chrom, start, and end. That is, duplicate lines with the same coordinates are not necessarily placed next to each other, so uniq counts them more than once. Unix sort, by default, breaks ties by falling back to a comparison of the whole line. For example, given this file:

a   1
a   2
a   1
b   1
b   2
b   1

Unix sort without breaking ties (what sortBed might do):

sort -k1,1 -s test.txt | uniq | wc -l
6

Now with the default behavior, which breaks ties:

sort -k1,1 test.txt | uniq | wc -l
4
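The same effect with BED-like lines, as a sketch: three made-up records share the same coordinates and two are exact duplicates, but the duplicates are not adjacent. sort -s (stable) stands in for a sort that does not break ties; that this is how sortBed behaves is the guess above, not something this sketch verifies. sort -u counts distinct lines without any BED-specific ordering.

```shell
#!/bin/sh
bed=$(mktemp)
# Made-up records: identical coordinates, the two "A" lines are duplicates.
printf 'chr1\t10\t20\tA\nchr1\t10\t20\tB\nchr1\t10\t20\tA\n' > "$bed"

# -s keeps input order for lines whose keys tie, so the "A" lines stay apart
# and uniq (which only collapses adjacent lines) misses the duplicate.
stable=$(LC_ALL=C sort -s -k1,1 -k2,2n -k3,3n "$bed" | uniq | wc -l)

# Without -s, sort breaks the tie by comparing whole lines: A, A, B.
default=$(LC_ALL=C sort -k1,1 -k2,2n -k3,3n "$bed" | uniq | wc -l)

# Counting distinct lines needs no BED-aware order at all.
distinct=$(LC_ALL=C sort -u "$bed" | wc -l)

echo "stable: $stable, default: $default, distinct: $distinct"
rm -f "$bed"
```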
7.6 years ago

Use sort-bed (from BEDOPS) to sort BED files on all three relevant fields.

It runs on arbitrary BED input (it handles input with headers, for instance, or input with more than six columns), and it runs faster than Unix sort.

Then you don't have to worry about these issues!
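For comparison, a rough coreutils approximation of ordering on all three fields: chromosome in byte order, then start, then end numerically. This is not sort-bed and does not reproduce its header handling or its speed; the byte-order chromosome sort is an assumption about the order you want to match. Two of the made-up records share a start, so the end column decides between them.

```shell
#!/bin/sh
bed=$(mktemp)
# Made-up records; the first two share start 10.
printf 'chr1\t10\t50\nchr1\t10\t20\nchr1\t5\t8\n' > "$bed"

# Chromosome as text, then start numerically, then end numerically.
order=$(LC_ALL=C sort -k1,1 -k2,2n -k3,3n "$bed" | cut -f2,3 | tr '\t' ':' | paste -s -d ' ' -)

echo "$order"
rm -f "$bed"
```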

7.6 years ago

Linux (GNU) sort now supports alphanumeric (natural, or "version") sorting as well, via the -V option. If you want your BED files sorted by chromosome and then by start position, use sort -k1,1V -k2,2n in.bed > out.sorted.bed. Without -V, sort compares chromosome names character by character (so chr10 sorts before chr2) rather than performing an alphanumeric sort.
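A minimal sketch of the difference -V makes, assuming GNU coreutils sort (or another sort that supports -V) and made-up chromosome names:

```shell
#!/bin/sh
bed=$(mktemp)
printf 'chr10\t1\t2\nchr2\t1\t2\nchr1\t1\t2\n' > "$bed"

# Character-by-character comparison of the chromosome column.
lexical=$(LC_ALL=C sort -k1,1 -k2,2n "$bed" | cut -f1 | paste -s -d ' ' -)

# Version (natural) comparison: embedded numbers compared as numbers.
natural=$(LC_ALL=C sort -k1,1V -k2,2n "$bed" | cut -f1 | paste -s -d ' ' -)

echo "without -V: $lexical"
echo "with -V:    $natural"
rm -f "$bed"
```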
