bedtools sort doesn't properly sort
3.6 years ago
vctrm67 ▴ 50

When I run bedtools sort, I get the following:

1   3587786 3593830 DUP
1   8750767 8828371 DEL
1   36370717    36398152    DUP
1   37116380    37129915    DUP
1   162941074   163012466   DUP
1   221854550   221874415   DUP
1   225906937   225913690   DUP
1   242796286   242812227   DUP
1   247955111   248002324   DUP
10  29644179    29717839    DUP
10  30360966    30555657    DUP
10  104505978   104644200   DUP
11  36949051    37006537    DUP
11  36949229    36972786    DEL
11  55978335    57418552    DEL
11  57418216    57595965    DUP
11  58420721    64632422    DEL
11  64663277    68016329    DEL
11  84059308    84186365    DEL
11  84875159    84878687    DEL
11  89036814    89053054    DUP
11  111326059   111413764   DUP
11  126274726   126343631   DUP
12  1352484 1354383 DUP
12  3530784 3532133 DEL
12  17372858    17391056    DUP
12  51940963    51982992    DUP
12  56818358    56941575    DUP
12  104329540   104421351   DEL
12  118643815   118666853   DUP
13  28134112    28157354    DUP
13  50721753    50771387    DUP
13  56083071    56083952    DEL
13  61947249    61951111    DUP
14  60403716    60427256    DUP
14  68405875    68502245    DUP
14  103603568   103656370   DUP
15  23669286    26039974    DUP
15  25548412    87348183    DEL
15  26123152    56856793    DUP
15  26752190    87856979    DUP
15  41785876    41853411    DUP
15  83977043    84000961    DUP
16  6304191 6590715 DEL
16  62132463    62134963    DEL
16  70729282    70731346    DEL
16  73689518    73691813    DEL
16  84704175    84714354    DEL
16  84722858    84751052    DEL
16  89848967    89987815    DUP
17  1995290 2390445 DEL
17  38204642    38310109    DUP
17  57799417    57910141    DEL
17  57920249    58047241    DUP
17  68592593    68622178    DUP
18  3505741 3547913 DUP
18  20342424    20432565    DUP
18  20393752    20448551    DUP
18  21652203    21671451    DUP
18  24486507    24489417    DUP
18  36792275    37105527    DEL
18  37087067    37227666    DEL
18  69463251    69522471    DUP
19  1092185 1093193 DEL
2   22469642    22474593    DUP
2   25727869    25798458    DUP
2   27883116    27965259    DUP
2   68139988    68166896    DUP
2   70193567    70214706    DUP
2   74755954    74877395    DEL
2   77127844    77128615    DEL
2   78527881    78592889    DEL
2   80253547    80255520    DUP
2   105971475   106035686   DUP
2   134680094   134684447   DUP
2   141662856   142201021   DEL
2   142029938   142032894   DUP
2   194457827   194631733   DEL
2   228548592   228549878   DEL
2   239350439   239902324   DEL
20  7913292 8244617 DEL
20  19392651    19568791    DEL
20  19770138    19931400    DEL
20  20790991    54856844    DEL
20  23447005    38309354    DEL
20  32997366    33005272    DEL
21  18871398    18874952    DEL
21  31582213    31686687    DEL
22  36244665    41178379    DEL
22  36709460    36755109    DEL
22  41935726    41937614    DUP
3   21737359    21742409    DUP
3   32346989    32536254    DUP
3   40154386    40155854    DEL
3   143067607   144816207   DEL
3   149288829   149328893   DEL
3   163152744   175520921   DUP
3   168947724   168990585   DUP
3   182846242   182866635   DUP
4   85889344    85954331    DUP
4   90668271    90695386    DUP
4   189696034   189741845   DUP
5   2075360 2181835 DUP
5   4441342 4464641 DUP
5   8472307 8494822 DUP
5   12346943    12451398    DEL
5   12842023    12884886    DEL
5   15001837    15032170    DUP
5   18812137    20053513    DUP
5   19869425    20072753    DEL
5   25161140    25213112    DUP
5   25185863    25212811    DEL
5   25352641    25362036    DUP
5   25937752    25938639    DEL
5   26596665    27769459    DEL
5   28345390    28347972    DUP
5   36566708    36675585    DEL
5   36732875    36752311    DEL
5   42361045    43612550    DUP
5   82085386    82107332    DUP
5   112110212   112280266   DUP
5   119565287   119632138   DUP
5   147688213   147689425   DEL
6   13435175    13531599    DUP
6   27825467    27842457    DUP
6   63339829    67265118    DEL
6   67518457    68140225    DEL
6   72953466    73014369    DUP
6   149717972   149769303   DUP
7   2312288 2402528 DEL
7   69124809    69156265    DUP
7   111954642   112056372   DUP
7   127715199   127735338   DUP
7   155006218   155006807   DEL
8   3391032 3404546 DEL
8   3688833 3759324 DEL
8   3760301 4057787 DEL
8   4115197 4204462 DEL
8   4394438 4399086 DUP
8   11087434    11106117    DUP
8   39249879    69947393    DEL
8   64289532    64291513    DEL
8   71531652    71818924    DEL
8   72528893    72532864    DEL
8   75682955    75689847    DUP
8   88431265    89364629    DUP
8   88646662    90564843    DUP
8   90046592    90068833    DEL
8   98772539    98785858    DUP
8   102321114   103153954   DEL
8   123556211   123606461   DUP
9   5819608 8043573 DEL
9   8300048 20376616    DUP
9   8879711 8884884 DEL
9   20102527    20546762    DUP
X   6615453 7233043 DEL
X   7405402 7573578 DEL
X   7854262 7880293 DEL
X   17609655    17654549    DUP
X   17639706    17647675    DEL
X   48302016    48302850    DEL
X   48305670    48335871    DEL
X   96228126    96299706    DEL
X   96247808    96278461    DEL
X   96338461    96405920    DEL
X   104943565   104977905   DUP
X   112527968   112535614   DEL
X   113052785   113065725   DUP
X   123499409   133296545   DEL
X   123936182   123937024   DEL

As you can see, the chromosomes aren't in numeric order (10 through 19 come before 2). Is there some other tool I should be using? I also tried sortBed and sort -k1,1 -k2,2n, but they give the same result.

bedtools • 3.1k views
3.6 years ago

By default, bedtools sort orders chromosome names alphabetically (character by character), which is why 10 comes before 2.

If you have e.g. a .fai file that lists the chromosomes in the natural order that you would prefer, you should use it with the bedtools sort -g option to specify your preferred chromosome ordering:

bedtools sort -g genome.fa.fai -i foo.bed
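The idea behind -g (sorting by an explicit list of chromosome names) can also be imitated with standard Unix tools. The following is only a sketch of that idea under stated assumptions, not how bedtools implements it: chrom_order.txt, sample.bed and sorted_by_order.bed are hypothetical file names, and the four-line sample is made up for illustration.

```shell
# Hypothetical order file: one chromosome name per line, in the desired order
# (the first column of a .fai index would serve the same purpose).
printf '%s\n' 1 2 10 X > chrom_order.txt

# Made-up, tab-separated BED sample in alphabetical chromosome order.
printf '1\t100\t200\tDUP\n10\t50\t60\tDEL\n2\t30\t40\tDUP\nX\t5\t9\tDEL\n' > sample.bed

# Prepend the rank of each chromosome, sort by rank and then by start,
# and finally drop the rank column again.
awk 'BEGIN { FS = OFS = "\t" }
     NR == FNR { rank[$1] = FNR; next }   # first file: record the position of each name
     { print rank[$1], $0 }               # second file: prepend that position
    ' chrom_order.txt sample.bed \
  | sort -t "$(printf '\t')" -k1,1n -k3,3n \
  | cut -f2- > sorted_by_order.bed

cat sorted_by_order.bed
```

In this sketch, a chromosome that is missing from the order file gets an empty rank and therefore sorts first, so the order file should list every chromosome present in the BED file.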
3.6 years ago
ATpoint 82k

You can force the UNIX sort utility (GNU sort) to do a natural sort with the -V (version sort) option.
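A minimal sketch of what that looks like on a BED file, assuming GNU sort (-V is a GNU extension) and using a made-up five-line sample; unsorted.bed and natural.bed are hypothetical file names:

```shell
# Made-up, tab-separated sample whose chromosome names distinguish
# alphabetical order from natural order.
printf '10\t50\t60\tDEL\n2\t300\t400\tDUP\n1\t100\t200\tDUP\n2\t30\t40\tDUP\nX\t5\t9\tDEL\n' > unsorted.bed

# -k1,1V: version (natural) sort on the chromosome column only
# -k2,2n: numeric sort on the start coordinate within each chromosome
sort -k1,1V -k2,2n unsorted.bed > natural.bed

cat natural.bed
```

Restricting the V modifier to the first key keeps the coordinate columns on a plain numeric comparison.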

3.6 years ago

As stated in the disclaimer of the bedtools sort documentation:

bedtools sort is merely a convenience utility, as the UNIX sort utility will sort BED files more quickly while using less memory

And in fact, bedtools sort, sortBed and sort -k1,1 -k2,2n do provide the same output.

The default sorting works alphabetically, therefore chromosomes are sorted as 1 10..19 2 20..22 3..9 MT X Y

As already suggested, it looks like the natural sorting is what you were expecting, having chromosomes sorted as 1..22 MT X Y
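The two orderings can be compared on chromosome names alone. This is a small sketch assuming GNU sort and paste; names.txt, alphabetical.txt and natural.txt are hypothetical file names:

```shell
# Chromosome names only, one per line, in arbitrary order.
printf '%s\n' X 10 2 MT 1 20 Y 3 > names.txt

# Plain sort compares character by character, so "10" sorts before "2".
# LC_ALL=C pins byte-wise collation so the result does not depend on the locale.
LC_ALL=C sort names.txt | paste -sd' ' - > alphabetical.txt

# GNU sort -V compares runs of digits as numbers.
sort -V names.txt | paste -sd' ' - > natural.txt

# First line: alphabetical order; second line: natural order.
cat alphabetical.txt natural.txt
```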

If you want to take it further, you can sort naturally plus leave MT lines at the end:

sort -k1,1V -k2,2n -k3,3n file.bed \
| perl -ne 'if (/^\S*M/) { $mt .= $_ } else { print } END { print $mt }'