Tool: Sorting bed files with bash sort
QVINTVS_FABIVS_MAXIMVS • 2.2k wrote:
I've been using this alias for the past few days and wish I had set it up earlier.
It works much like sortBed in bedtools, but it uses the standard sort command, so it can be used in pipes.
-
Find your .bashrc file in your home directory
$ cd $HOME
$ vi .bashrc # vi ~/.bashrc should work fine from any directory
-
Add this to .bashrc
alias sortbed="sort -k1,1 -k2,2g "
-
Save and source your .bashrc to get it to work
$ source ~/.bashrc
Example of use:
$ intersectBed -a in.bed -b /segDup_unmappable.bed -wao | sortbed | uniq | cut -f 1,2,3,8 > in_segDup_unmappable_overlap.txt
This intersects a BED file of chr, start, end with a list of segmental duplications and unmappable regions in hg19. The output is then piped through standard commands to keep only the input positions and the number of base pairs overlapping each one.
sortbed is used to sort the output and uniq is applied to return only unique lines. You can treat sortbed like sort. Just a nice shortcut I thought others might like.
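One thing to watch: bash does not expand aliases in non-interactive scripts by default, so the alias works at the prompt but not inside a script. A shell function with the same sort keys is one way around that. This is only a sketch, and the three input lines are made-up test data:

```shell
# Function equivalent of the sortbed alias; "$@" passes through any
# extra sort options or file names.
sortbed() {
    sort -k1,1 -k2,2g "$@"
}

# Made-up, tab-separated BED lines: chrom, start, end
printf 'chr2\t500\t600\nchr1\t1000\t1100\nchr1\t200\t300\n' | sortbed
```

Put the function in .bashrc in place of the alias and it can be used the same way in pipes.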
modified 3.5 years ago • written 3.5 years ago by QVINTVS_FABIVS_MAXIMVS • 2.2k
Like I said, bedtools readthedocs gives the UNIX command to sort bed files; I don't see any advantage in your approach.
Also set LC_ALL=C to make sorting faster.
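A minimal sketch of the LC_ALL=C suggestion, assuming the usual chromosome/start keys and a few made-up BED lines. Setting the variable as a prefix limits it to that one sort call, and the C locale compares raw bytes instead of applying locale collation rules:

```shell
# LC_ALL=C applies only to this sort invocation, not to the whole shell.
# Input lines are made-up, tab-separated BED records.
printf 'chr2\t5\t10\nchr10\t1\t4\nchr1\t20\t30\nchr1\t3\t8\n' |
    LC_ALL=C sort -k1,1 -k2,2n
```

Byte order puts chr10 before chr2, which is the lexicographic chromosome order that many BED tools expect from sorted input.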
BEDOPS sort-bed works faster at sorting BED files than GNU sort, and you can pipe data in and out via standard UNIX streams.
Unlike other tools, it also handles arbitrary numbers of columns and can be assigned a chunk of memory, to sort very large BED files that will not otherwise fit into system memory.
Add version sort to the first key (the V modifier in GNU sort, as in -k1,1V) so that the chromosomes sort in natural order, with chr2 before chr10.
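A small sketch of what that looks like, assuming a sort that supports the V key modifier (GNU coreutils does) and using made-up BED lines:

```shell
# V on the first key compares embedded numbers numerically (chr2 < chr10);
# n on the second key sorts start positions numerically.
printf 'chr10\t100\t200\nchr2\t50\t80\nchr1\t200\t300\nchr1\t30\t60\n' |
    sort -k1,1V -k2,2n
```

Note that this natural order differs from the lexicographic order that some tools require for pre-sorted input, so check what the downstream tool expects.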