What Are The Bioinformatics-Related Aliases Or Functions In Your Bashrc
11.9 years ago
brentp 24k

I've added a couple of things to my bashrc that I use pretty often. Some of the simpler ones:

Setting the locale to C greatly reduces the time for a lot of operations:

 export LC_ALL=C

e.g., sorting a BED file with 1.8 million lines goes from 43 seconds with LC_ALL="" to 3.2 seconds with LC_ALL=C.
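
If exporting LC_ALL globally feels too invasive, the setting can also be scoped to a single command. A minimal sketch (the sample input is made up) that also shows a side effect worth knowing: in the C locale, sort compares raw bytes, so uppercase ASCII letters come before lowercase ones.

```shell
# Apply the C locale to this one sort only; the rest of the shell is unaffected.
# Byte-wise comparison puts 'A' and 'B' (0x41, 0x42) before 'a' and 'b' (0x61, 0x62).
sorted=$(printf 'b\nA\na\nB\n' | LC_ALL=C sort)
printf '%s\n' "$sorted"
```

Tools that expect sorted input generally need it sorted under the same locale they run in, so it is worth being consistent.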

A quick check to make sure all lines have the same number of columns:

function check-columns(){
    # prints each distinct field count; a single output line means every row agrees
    awk 'BEGIN {FS="\t" }{ print NF }' "$1" | sort -u
}
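
When the check reports more than one field count, it helps to know how many lines have each. A sketch of such a variant (column_counts is a hypothetical name, and the demo file is made up):

```shell
# Print "field_count<TAB>number_of_lines" for every distinct field count.
column_counts() {
    awk -F'\t' '{ n[NF]++ } END { for (k in n) print k "\t" n[k] }' "$1" | sort -n
}

# Demo on a throwaway file with one ragged row.
demo=$(mktemp)
printf 'chr1\t10\t20\nchr1\t30\t40\nchr2\t5\n' > "$demo"
column_counts "$demo"
rm -f "$demo"
```

A rare field count in that output usually points at the handful of malformed lines.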

Output tab-delimited output so that the columns are aligned:

alias cols="column -s$'\t' -t"

use like:

head some.bed | cols
bash • 5.0k views

Nice question. Looking forward to the answers. I keep bumping into this LC_ALL thing. Do you have a nice link that explains what a locale is and why it matters?


IIUC LC_ALL=C tells whoever will listen that strings are not multi-byte (e.g. unicode), so no conversion is needed.

11.9 years ago

A few things from mine:

# easy way to do things per-chromosome:
# for i in $CHROM;do *your code*;done
export CHROM="$(seq 1 22) X Y MT"

alias sv='samtools view'
alias svh='samtools view -h'

#grab the header of a VCF
function vcfhead {
    head -n 1000 "$1" | grep "^#"
}

#unwad tarballs
alias unwad='tar -xzvf'

#find in the current directory
#(aliases cannot take arguments, so the pattern is simply appended after the alias)
alias ff='find . -name'

#sum a column of integers
alias sumcol='awk '\''{ SUM += $1} END { print SUM}'\'

#sum a column of floats
alias sumcolfloat='awk '\''{ SUM += $1} END { OFMT="%4.2f"; print SUM}'\'

#convert csv to tab-delimited
alias csv2tab='sed '\''s/\,/\t/g'\'

# convert tabs to new lines
alias tab2nl='perl -pe "s/\t/\n/g"'

#git stuff
#shows a preview of what's outgoing if you do a git push
function grout {
 git fetch origin master
 gd2 $(parse_git_branch) FETCH_HEAD
}
#shows a preview of what's incoming if you do a git pull
function grin {
 git fetch origin master
 gd2 FETCH_HEAD $(parse_git_branch)
}

#show the column headers with corresponding field numbers
#I use this constantly
alias header='head -n 1 | tab2nl | cat -n'
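
Aliases are plain text substitution and cannot place an argument in the middle of a command, so anything that needs one is easier as a function. A sketch of function counterparts for the find and sum helpers (ffind and sumcoln are hypothetical names, chosen so they do not clash with the aliases above; the column argument is an addition, not part of the original):

```shell
# ffind PATTERN: find by name below the current directory.
# Quote the pattern at the call site so the shell does not expand the glob first.
ffind() {
    find . -name "$1"
}

# sumcoln [COLUMN] [FILE...]: sum one whitespace-separated column (default: column 1).
sumcoln() {
    local col="${1:-1}"
    if [ "$#" -gt 0 ]; then shift; fi
    # "sum + 0" forces numeric output, so empty input prints 0 instead of a blank line
    awk -v c="$col" '{ sum += $c } END { print sum + 0 }' "$@"
}

printf '1 10\n2 20\n3 30\n' | sumcoln 2
```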

Just for fun, here is an Awk replacement for your "header" command:

awk 'BEGIN {FS="\t"; OFS="\t"} NR==1 {for (i=1; i <= NF; i++) {print i, $i}; exit}'


This was useful since the header alias didn't work in my .bashrc alias set (even with the tab2nl alias added). I made this a function. Agree with Madeline, though. Really useful.
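
A function version along those lines might look like the sketch below. It uses tr instead of the tab2nl alias so it does not depend on alias expansion; that substitution and the name colheader are assumptions, not the commenter's actual code.

```shell
# colheader [FILE]: number the tab-separated column names on the first line.
# With no FILE, head reads standard input, so it also works at the end of a pipe.
colheader() {
    head -n 1 "$@" | tr '\t' '\n' | cat -n
}

printf 'chrom\tstart\tend\nchr1\t10\t20\n' | colheader
```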


Oh, I like your header / tab2nl combo.

11.9 years ago

Not to beat this to death but I find the directory-based bash history scheme indispensable. I honestly don't know how people function with global histories.

function mycd()
{
    tmpDir="$PWD"
    echo "#$(date +%s) $USER -> $@" >> "$HISTFILE"

    builtin cd "$@" # do actual cd

    #if this directory is writable then write to directory-based history file
    #otherwise write history in the usual home-based history file
    touch "$PWD/.dir_bash_history" 2>/dev/null && export HISTFILE="$PWD/.dir_bash_history" || export HISTFILE="$HOME/.bash_history";
    echo "#$(date +%s) $USER <- $OLDPWD" >> "$HISTFILE"
}
alias cd="mycd"
#initial shell opened
export HISTFILE="$PWD/.dir_bash_history"
#timestamp all history entries
export HISTTIMEFORMAT="%h/%d - %H:%M:%S "
export HISTCONTROL=ignoredups:erasedups
export HISTSIZE=1000000
export HISTFILESIZE=1000000
shopt -s histappend ## append, no clearouts
shopt -s histverify ## edit a recalled history line before executing
shopt -s histreedit ## reedit a history substitution line if it failed

## Save the history after each command finishes
## (and keep any existing PROMPT_COMMAND settings)
export PROMPT_COMMAND="history -a; history -c; history -r; $PROMPT_COMMAND"
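
The `touch ... && export ... || export ...` line in mycd is a compact if/else. A sketch of the same fallback written out as a function (pick_histfile is a hypothetical name, not part of the original answer):

```shell
# Print the history file to use for directory $1:
# the per-directory file if it can be created or updated, else the usual one in $HOME.
pick_histfile() {
    if touch "$1/.dir_bash_history" 2>/dev/null; then
        echo "$1/.dir_bash_history"
    else
        echo "$HOME/.bash_history"
    fi
}

# Demo: a writable temp directory, then a path that does not exist.
demo_dir=$(mktemp -d)
pick_histfile "$demo_dir"
pick_histfile "$demo_dir/missing"
rm -rf "$demo_dir"
```

Writing it as if/else also avoids a subtlety of `a && b || c`: c runs whenever b fails, not only when a fails.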

Okay, okay, I'll do it. Thanks.


my thoughts exactly ;-)


accepting this answer because of the directory-specific history. really useful.

11.9 years ago

Thanks for the tips. My suggestion might be obvious, but I generally want to find something in the same set of directories, so it's a speedup for me:

Search R scripts in a bunch of directories:

alias searchR='find /place1 /place2 -name "*.R" -print0 | xargs -0 grep'
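
If the set of directories changes from time to time, a function that takes them as arguments is an option. A sketch, assuming a hypothetical name search_r; it swaps xargs for find's own `-exec ... {} +`, which handles file names containing spaces and does not run grep at all when nothing matches:

```shell
# search_r PATTERN DIR...: grep for PATTERN in every *.R file below the given directories.
search_r() {
    local pattern="$1"
    shift
    # -H always prints the file name, even if only one file is found;
    # "--" keeps a pattern that starts with "-" from being read as an option.
    find "$@" -name '*.R' -exec grep -H -- "$pattern" {} +
}

# Usage (placeholder directories, as in the alias above):
# search_r 'ggplot2' /place1 /place2
```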

A few of the answers from here apply: http://www.biostars.org/post/show/6660/bioinformatics-cheat-sheet/

I have a lot of little one-liners and snippets that I don't use often enough to have bothered aliasing, though. I tend to save these in text files in a common directory (awktips.txt, sedtips.txt, and so on) and just grab the line of interest as needed. Perhaps I should go to the trouble of aliasing some of them, but I'd probably forget the alias names anyway and have to look them up.


Wouldn't ack --R pattern do the same as your searchR alias?


Oh... I hadn't heard of ack. Looks like it does cwd on down, though, not multiple directories?

11.9 years ago

Heng Li's bioawk linked as hawk ;-) and virtualenv activation.

I do add lots of short term aliases when working with a specific tool as well.

11.9 years ago
Andreas ★ 2.5k

Just to add two more:

Get number of sequences in a (gzipped) fasta file:

alias fasta_num_seq='zgrep -c "^>"'

Get the number of reads from a (gzipped) fastq file (note: in theory '+' could also occur as the first character of a quality line, which would inflate the count):

alias fastq_num_seq='zgrep -c "^+"'
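
One way around the '+' caveat is to count lines instead of matching a character. A sketch (fastq_num_reads is a hypothetical name), assuming every record is exactly four lines, i.e. sequences and qualities are not wrapped; it relies on `gzip -cdf` passing uncompressed input through unchanged, which is the documented GNU gzip behaviour:

```shell
# fastq_num_reads FILE: number of records in a gzipped or plain 4-line FASTQ file.
fastq_num_reads() {
    # A non-integer result means the line count is not a multiple of 4,
    # which hints at a truncated or wrapped file.
    gzip -cdf "$1" | awk 'END { print NR / 4 }'
}
```

Use like `fastq_num_reads reads.fastq.gz` (the file name is a placeholder).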