Question: What Are The Bioinformatics-Related Aliases Or Functions In Your Bashrc
20
5.5 years ago by
brentp22k
Salt Lake City, UT
brentp22k wrote:

I've added a couple of things to my bashrc that I use pretty often. Some of the simpler ones:

Setting the locale to C greatly reduces the time for a lot of operations:

 export LC_ALL=C

e.g., sorting a 1.8M-line BED file goes from 43 seconds with LC_ALL="" to 3.2 seconds with LC_ALL=C.
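
If exporting LC_ALL=C for the whole session feels too broad, the same setting can be applied to a single command. A minimal sketch, assuming a wrapper named sort_c (the name is made up for illustration); note that the C locale also changes the ordering to plain byte order, so uppercase letters sort before lowercase ones:

```shell
# Hypothetical wrapper: run sort under the C locale for this command only,
# leaving the locale of the rest of the session untouched.
sort_c() {
    LC_ALL=C sort "$@"
}

# Byte-order collation: A and B (bytes 65, 66) come before a and b (97, 98).
printf 'b\nA\na\nB\n' | sort_c
```

Files sorted this way should also be joined or merged under the same locale, since tools like join expect their input in the collation order they are running with.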

A quick check to make sure all lines have the same number of columns:

function check-columns(){
    awk 'BEGIN {FS="\t" }{ print NF }' "$1" | sort -u
}
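
When the check does report more than one column count, it helps to know how many lines have each count. A possible variant, assuming a function named column_counts (not part of the original bashrc), that prints each distinct column count followed by the number of lines that have it:

```shell
# Hypothetical variant of check-columns: tally lines per column count.
# Reads the named files, or stdin when no file is given.
column_counts() {
    awk -F'\t' '{ n[NF]++ } END { for (k in n) print k "\t" n[k] }' "$@" | sort -n
}
```

Use it like check-columns, e.g. column_counts some.bed; a clean file produces a single line.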

Print tab-delimited input so that the columns are aligned:

alias cols="column -s$'\t' -t"

Use it like this:

head some.bed | cols
bash • 2.5k views
ADD COMMENTlink modified 5.5 years ago by Andreas2.3k • written 5.5 years ago by brentp22k
4

Nice question. Looking forward to the answers. I keep bumping into this LC_ALL thing. Do you have a nice link that explains what a locale is and why it matters?

ADD REPLYlink written 5.5 years ago by Stefano Berri3.9k

IIUC LC_ALL=C tells whoever will listen that strings are not multi-byte (e.g. unicode), so no conversion is needed.

ADD REPLYlink written 5.5 years ago by brentp22k
12
5.5 years ago by
Chris Miller19k
Washington University in St. Louis, MO
Chris Miller19k wrote:

A few things from mine:

# easy way to do things per-chromosome:
# for i in $CHROM;do *your code*;done
export CHROM="$(seq 1 22) X Y MT"

alias sv='samtools view'
alias svh='samtools view -h'

#grab the header of a VCF
function vcfhead {
    head -n 1000 "$1" | grep "^#"
}

#unwad tarballs
alias unwad='tar -xzvf'

#find in the current directory
alias ff='find . -name'

#sum a column of integers
alias sumcol='awk '\''{ SUM += $1} END { print SUM}'\'

#sum a column of floats
alias sumcolfloat='awk '\''{ SUM += $1} END { OFMT="%4.2f"; print SUM}'\'

#convert csv to tab-delimited
alias csv2tab='sed '\''s/\,/\t/g'\'

# convert tabs to new lines
alias tab2nl='perl -pe "s/\t/\n/g"'

#git stuff (gd2 and parse_git_branch are helpers defined elsewhere; they are not shown here)
#shows a preview of what's outgoing if you do a git push
function grout {
 git fetch origin master
 gd2 $(parse_git_branch) FETCH_HEAD
}
#shows a preview of what's incoming if you do a git pull
function grin {
 git fetch origin master
 gd2 FETCH_HEAD $(parse_git_branch)
}

#show the column headers with corresponding field numbers
#I use this constantly
alias header='head -n 1 | tab2nl | cat -n'
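
A side note on aliases versus functions: an alias does not receive positional parameters, so anything that needs an argument somewhere other than the very end of the command is easier to write as a function. A rough sketch of function forms for two of the entries above; sumcol_n and its column argument are made up for illustration and are not from the original bashrc:

```shell
# Function form of ff: the name pattern arrives as "$1" and stays quoted,
# so patterns such as '*.bam' are not expanded by the shell first.
ff() {
    find . -name "$1"
}

# Hypothetical generalisation of sumcol: sum column N (default 1) of
# whitespace-delimited input read from stdin.
sumcol_n() {
    awk -v col="${1:-1}" '{ sum += $col } END { print sum + 0 }'
}
```

For example, cut -f2,3 some.bed | sumcol_n 2 would sum the end coordinates.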
ADD COMMENTlink modified 5.5 years ago • written 5.5 years ago by Chris Miller19k
3

Just for fun, here is an Awk replacement for your "header" command:

awk 'BEGIN {FS="\t"; OFS="\t"} NR==1 {for (i=1; i <= NF; i++) {print i, $i}}'

ADD REPLYlink modified 5.5 years ago • written 5.5 years ago by Frédéric Mahé2.7k

This was useful since the header alias didn't work in my .bashrc alias set (even with the tab2nl alias added). I made this a function. Agree with Madelaine, though. Really useful.
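
For anyone else hitting the same problem, a function version might look roughly like this. It is only a sketch, not the exact function from this comment, and it uses tr in place of the tab2nl alias so that it does not depend on any other alias being defined:

```shell
# Show the column headers of a tab-delimited file with their field numbers.
# Reads the named file, or stdin when no file is given.
header() {
    head -n 1 "$@" | tr '\t' '\n' | cat -n
}
```

It works both as header some.tsv and at the end of a pipe, e.g. zcat some.tsv.gz | header.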

ADD REPLYlink written 5.5 years ago by Ryan D3.2k

Oh, I like your header / tab2nl combo.

ADD REPLYlink written 5.5 years ago by Madelaine Gogol4.8k
10
5.5 years ago by
Philadelphia, PA
Jeremy Leipzig17k wrote:

Not to beat this to death but I find the directory-based bash history scheme indispensable. I honestly don't know how people function with global histories.

function mycd()
{
    tmpDir="$PWD"
    echo "#"`date +%s`" $USER -> $@"  >> "$HISTFILE"

    builtin cd "$@" # do actual cd                                                                        

    #if this directory is writable then write to directory-based history file
    #otherwise write history in the usual home-based history file                    
    touch "$PWD/.dir_bash_history" 2>/dev/null && export HISTFILE="$PWD/.dir_bash_history" || export HISTFILE="$HOME/.bash_history";
    echo "#"`date +%s`" $USER <- $OLDPWD" >> "$HISTFILE"
}
alias cd="mycd"
#initial shell opened                                                                                     
export HISTFILE="$PWD/.dir_bash_history"
#timestamp all history entries                                                                            
export HISTTIMEFORMAT="%h/%d - %H:%M:%S "
export HISTCONTROL=ignoredups:erasedups
export HISTSIZE=1000000
export HISTFILESIZE=1000000
shopt -s histappend ## append, no clearouts                                                               
shopt -s histverify ## edit a recalled history line before executing                                      
shopt -s histreedit ## reedit a history substitution line if it failed                                    

## Save the history after each command finishes                                                           
## (and keep any existing PROMPT_COMMAND settings)                                                        
export PROMPT_COMMAND="history -a; history -c; history -r; $PROMPT_COMMAND"
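
One consequence of this scheme is that history ends up scattered across many .dir_bash_history files. A small helper for searching all of them at once could look like the following; histgrep is a hypothetical name, and searching from $HOME by default is an assumption rather than part of the setup above:

```shell
# Hypothetical helper: grep every per-directory history file under a root
# directory (default: $HOME) and prefix each hit with the file it came from.
histgrep() {
    local pattern="$1" root="${2:-$HOME}"
    find "$root" -type f -name .dir_bash_history \
        -exec grep -H -- "$pattern" {} + 2>/dev/null
}
```

For example, histgrep 'samtools mpileup' ~/projects lists the directories where that command was run along with the exact command lines.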
ADD COMMENTlink written 5.5 years ago by Jeremy Leipzig17k
2

Okay, okay, I'll do it. Thanks.

ADD REPLYlink written 5.5 years ago by Madelaine Gogol4.8k

my thoughts exactly ;-)

ADD REPLYlink written 5.5 years ago by Istvan Albert ♦♦ 74k

accepting this answer because of the directory-specific history. really useful.

ADD REPLYlink written 5.5 years ago by brentp22k
5
5.5 years ago by
Madelaine Gogol4.8k
Kansas City
Madelaine Gogol4.8k wrote:

Thanks for the tips. My suggestion might be obvious, but I generally want to find something in the same set of directories, so it's a speedup for me:

Search R scripts in a bunch of directories:

alias searchR='find /place1 /place2 -name "*.R" | xargs grep '
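
The find | xargs form splits file names on whitespace, so it can trip over paths that contain spaces. A function variant that avoids this by letting find call grep directly is sketched below; search_r is a made-up name, and the directories are passed as arguments instead of being hard-coded:

```shell
# Hypothetical variant of searchR: first argument is the grep pattern,
# the remaining arguments are the directories to search.
search_r() {
    local pattern="$1"
    shift
    find "$@" -type f -name '*.R' -exec grep -H -- "$pattern" {} +
}
```

Use it like search_r 'library(limma)' /place1 /place2. At least one directory should be given, since not every find implementation defaults to the current directory.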

A few of the answers from here apply: http://www.biostars.org/post/show/6660/bioinformatics-cheat-sheet/

I have a lot of little one-liners and snippets that I don't use often enough to have bothered aliasing, though. I tend to save these in text files in a common directory (awktips.txt, sedtips.txt, and so on) and just grab the line of interest as needed. Perhaps I should go to the trouble of aliasing some of them, but I'd probably forget the alias names anyway and have to look them up.

ADD COMMENTlink written 5.5 years ago by Madelaine Gogol4.8k

Wouldn't ack --R pattern do the same as your searchR alias?

ADD REPLYlink written 5.5 years ago by jake.biesinger0

Oh... I hadn't heard of ack. Looks like it does cwd on down, though, not multiple directories?

ADD REPLYlink modified 5.5 years ago • written 5.5 years ago by Madelaine Gogol4.8k
4
5.5 years ago by
Istvan Albert ♦♦ 74k
University Park, USA
Istvan Albert ♦♦ 74k wrote:

Heng Li's bioawk linked as hawk ;-) and virtualenv activation.

I do add lots of short term aliases when working with a specific tool as well.

ADD COMMENTlink modified 5.5 years ago • written 5.5 years ago by Istvan Albert ♦♦ 74k
2
5.5 years ago by
Andreas2.3k
Singapore
Andreas2.3k wrote:

Just to add two more:

Get number of sequences in a (gzipped) fasta file:

alias fasta_num_seq='zgrep -c "^>"'

Get the number of reads from a (gzipped) fastq file (note: in theory '+' could also appear as the first character of a quality line, which would inflate the count):

alias fastq_num_seq='zgrep -c "^+"'
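
An alternative that sidesteps the '+' ambiguity is to count lines and divide by four. This is only a sketch: fastq_num_reads is a hypothetical name, and it assumes a gzipped file with strict four-line records (no wrapped sequence or quality lines):

```shell
# Hypothetical alternative: number of reads = number of lines / 4.
# Assumes gzipped input and exactly four lines per record.
fastq_num_reads() {
    gzip -cd -- "$1" | awk 'END { print NR / 4 }'
}
```

A non-integer result is a quick hint that the file is truncated or does not follow the four-line layout.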
ADD COMMENTlink written 5.5 years ago by Andreas2.3k