Question: (Closed) Sort a bed file
1
gravatar for kmkdesilva
4 weeks ago by
kmkdesilva90
United States
kmkdesilva90 wrote:

Hi

I tried to sort a BED file using bedtools. It sorted the chromosomes in this order:

chr1
chr10
chr11
chr12
.
.
chr19
chr2
chr20

Please tell me how I can sort the BED file in chr1, chr2, chr3, ..., chrX order.

ADD COMMENTlink modified 4 weeks ago by zx87549.7k • written 4 weeks ago by kmkdesilva90

Hello kmkdesilva!

Questions similar to yours can already be found at:

We have closed your question to allow us to keep similar content in the same thread.

If you disagree with this please tell us why in a reply below. We'll be happy to talk about it.

Cheers!

ADD REPLYlink written 4 weeks ago by John Marshall2.1k

Thanks for linking up the threads John. Go raibh maith agat.

ADD REPLYlink written 4 weeks ago by Kevin Blighe66k
Kevin Blighe wrote:

There are different combinations of commands to do this.

1, sort -V

The easiest is the 'natural' alphanumeric sort invoked with the shell command sort -V:

cat test.bed
chrX    4567    4569
chr1    5555    5556
chr1    6666    6667
chr10   1234    5678
chrX    1234    1235
chr2    9876    9877
chrY    1111    5555
chr15   4444    5555
chr22   3214    3245
chrMT   4444    4445

sort -k1,1 -k2,2n -V test.bed
chr1    5555    5556
chr1    6666    6667
chr2    9876    9877
chr10   1234    5678
chr15   4444    5555
chr22   3214    3245
chrMT   4444    4445
chrX    1234    1235
chrX    4567    4569
chrY    1111    5555

If your sort does not support -V (for example, on some macOS versions), you may need to install GNU sort.

2, custom for loop

As you can see from the output above, sort -V still places chrMT before chrX and chrY, whereas we may want it to appear last.

So, we can write our own very simple loop to specifically control the order:

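FS="\t"          # grep -P (GNU grep) reads \t as a tab
bed="test.bed"
# print the chromosomes in the listed order, each sorted by start position
for chr in {1..22} X Y MT; do
  grep -P "^chr${chr}${FS}" "${bed}" | sort -k2,2n
done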

chr1    5555    5556
chr1    6666    6667
chr2    9876    9877
chr10   1234    5678
chr15   4444    5555
chr22   3214    3245
chrX    1234    1235
chrX    4567    4569
chrY    1111    5555
chrMT   4444    4445

We can also pick any arbitrary order. Note that chromosomes left out of the list are dropped from the output:

FS="\t"
bed="test.bed"
for chr in 1 Y 10 X MT; do
  grep -P "^chr${chr}${FS}" "${bed}" | sort -k2,2n
done

chr1    5555    5556
chr1    6666    6667
chrY    1111    5555
chr10   1234    5678
chrX    1234    1235
chrX    4567    4569
chrMT   4444    4445

Kevin

ADD COMMENTlink modified 4 weeks ago • written 4 weeks ago by Kevin Blighe66k
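
The loop in the answer above reads the file once per chromosome and leaves out any chromosome that is not in the list. A single-pass variant is sketched below under some assumptions: the input is tab-separated, and `sort_bed_by_order` is a hypothetical helper name invented for this illustration (it is not part of bedtools or coreutils). It prefixes each record with the rank of its chromosome, sorts on that rank, and then removes the prefix; unlisted chromosomes are kept and placed at the end.

```shell
# Sketch only: sort_bed_by_order is a hypothetical helper, not a bedtools command.
# Usage: sort_bed_by_order FILE CHROM [CHROM ...]
# Steps: awk prefixes each record with the rank of its chromosome,
#        sort orders by rank, then name, then start,
#        cut drops the helper rank column again.
sort_bed_by_order() {
  bed_file="$1"
  shift
  tab=$(printf '\t')
  awk -v order="$*" '
    BEGIN {
      FS = OFS = "\t"
      n = split(order, names, " ")
      for (i = 1; i <= n; i++) rank[names[i]] = i
    }
    {
      # unlisted chromosomes get rank n + 1, so they end up last
      r = ($1 in rank) ? rank[$1] : n + 1
      print r, $0
    }' "$bed_file" |
    sort -t "$tab" -k1,1n -k2,2 -k3,3n |
    cut -f2-
}

# Small demonstration on a temporary file (made-up records).
demo=$(mktemp)
printf 'chrX\t4567\t4569\nchr1\t6666\t6667\nchrMT\t4444\t4445\n' > "$demo"
printf 'chr10\t1234\t5678\nchr1\t5555\t5556\nchrUn\t1\t2\n' >> "$demo"
sort_bed_by_order "$demo" chr1 chr10 chrX chrY chrMT
rm -f "$demo"
```

In bash, the full human order could be passed as `chr{1..22} chrX chrY chrMT`. Whether unlisted chromosomes should be kept at the end or dropped is a design choice; this sketch keeps them so that no records are lost silently.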

Thank you very much Kevin. If I add the -u option to this command, 'sort -k1,1 -k2,2n -V test.bed', will I get records that are both sorted and unique?

ADD REPLYlink written 4 weeks ago by kmkdesilva90

I have not used sort -u; however, you can of course try it with a few test examples.

If I want to remove duplicate lines, I use awk:

cat test.bed 
a   b   c
d   e   f
a   b   c
d   e   f
g   h   i
a   b   c
d   e   f

awk -F "\t" '!line[$0]++' test.bed
a   b   c
d   e   f
g   h   i
ADD REPLYlink written 4 weeks ago by Kevin Blighe66k

Thank you Kevin. -u also did the job.

ADD REPLYlink written 4 weeks ago by kmkdesilva90
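
One caveat about combining -u with -k is worth checking on your own data: when keys are given, sort -u treats lines with equal keys as duplicates, even if the rest of the line differs. The sketch below uses made-up records in which two intervals share chromosome and start but have different ends. It uses plain -k1,1 for portability; the way -u compares keys is the same when the first key is a version sort.

```shell
# Made-up records: the first and third lines are identical, while the
# second shares chromosome and start with them but has a different end.
tmp=$(mktemp)
printf 'chr1\t100\t200\nchr1\t100\t300\nchr1\t100\t200\n' > "$tmp"

# With keys on columns 1-2 only, -u keeps one line per (chromosome, start).
keyed=$(sort -u -k1,1 -k2,2n "$tmp" | wc -l)

# Adding the end column as a third key keeps both distinct intervals.
full=$(sort -u -k1,1 -k2,2n -k3,3n "$tmp" | wc -l)

echo "keys 1-2: $keyed line(s); keys 1-3: $full line(s)"
rm -f "$tmp"
```

If the file has further columns (name, score, strand), lines that differ only there would still be merged by the three-key version. The awk '!line[$0]++' approach shown above compares whole lines and so avoids this.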
Jorge Amigo wrote:

These are the 2 sorting commands I usually consider for a bed file:

1. If I care about the order (resulting 1..22 MT X Y):

sort -k1,1V -k2,2n file.bed

2. If I don't care about the order (resulting 1 10..19 2 20..22 3..9 MT X Y):

sort -k1,1 -k2,2n file.bed

Option 1 is the one I like the most, but option 2 is the one I choose when dealing with bedtools.

I must admit that when my perfectionist self wants to have all positions (not only starts, but also ends) perfectly sorted and the MT lines at the end, I have sometimes even gone for this:

sort -k1,1V -k2,2n -k3,3n file.bed \
| perl -ne 'if (/^\S*M/) { $mt .= $_ } else { print } END { print $mt }'
ADD COMMENTlink modified 4 weeks ago • written 4 weeks ago by Jorge Amigo12k
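
Since the right command depends on which order a downstream tool expects, it can help to test whether a file is already in a given order before re-sorting it. A minimal sketch, assuming GNU sort (for the V key modifier) and a small made-up file: sort -c only checks the order and reports the result through its exit status, without writing sorted output.

```shell
# Made-up file in natural chromosome order: chr1, chr2, chr10.
tmp=$(mktemp)
printf 'chr1\t5555\t5556\nchr2\t9876\t9877\nchr10\t1234\t5678\n' > "$tmp"

# Check against the natural order (option 1 above); -V needs GNU sort.
if sort -c -k1,1V -k2,2n "$tmp" 2>/dev/null; then natural=yes; else natural=no; fi

# Check against the lexicographic order (option 2 above). This fails here,
# because chr10 sorts before chr2 when names are compared as plain strings.
if LC_ALL=C sort -c -k1,1 -k2,2n "$tmp" 2>/dev/null; then lexical=yes; else lexical=no; fi

echo "natural: $natural, lexical: $lexical"
rm -f "$tmp"
```

LC_ALL=C is set for the lexicographic check so that the comparison does not depend on the current locale.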

Perfect. Thank you very much for sharing Jorge.

ADD REPLYlink written 4 weeks ago by kmkdesilva90