Forum: Awk in Bioinformatics
5.8 years ago
Shicheng Guo ★ 9.5k

Here are some examples of using awk in powerful combinations with other tools. I will add more examples over time.

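  1. Merge columns 4 and 5 into a single column (one value per line) and write the output to a file named after columns 1 and 3. The ./snpset directory must already exist.

    awk '{print $4"\n"$5 > ("./snpset/" $1 "." $3 ".txt")}' GRCH37.SNP150.bed

  2. Split column 2 on "_" and append each record to a file named after the first part of the split.

    awk '{ split($2, a, "_"); print $1"\t"a[2]"\t"$3 >> (a[1] ".txt"); }' GRCH37.SNP150.bed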
  3. NF gives the total number of fields in the current record, while NR gives the number of the line currently being processed.

    awk '{print NR,"->",NF}' GRCH37.SNP150.bed

  4. FNR gives the line number within the current file, NR gives the line number across all input files, and FILENAME gives the name of the current file.

    awk '{print FILENAME, FNR, NR;}' hg19.snp150.bed hg38.snp150.bed

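  5. Use columns 1, 4 and 8 as parameters for plink and submit each command as a PBS job. The plink command is echoed into qsub, which reads the job script from standard input.

    awk '$8!="." {cmd="echo \"plink --bfile ~/1000Genome/"$1" --ld "$4" "$8" --out ./LD/"$4"."$8".r2\" | qsub -N "$4"."$8" -e ./temp/ -o ./temp/"; system(cmd)}' hg19.DMR.bed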

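  6. Use join, sort, uniq and awk together: join two tab-separated files on a shared key (column 1 of input.txt, column 2 of ref2.txt), then, for each distinct combination of columns 1-5 of the joined output, sum column 6 over the rows whose column 7 falls between column 2 and column 3.

    join -t $'\t' -1 1 -2 2 <(sort -t $'\t' -k1,1 input.txt) <(sort -t $'\t' -k2,2 ref2.txt) | uniq | awk -F '\t' '{line=sprintf("%s\t%s\t%s\t%s\t%s",$1,$2,$3,$4,$5);if($7>=$2 && $7<=$3) {a[line]+=int($6);} else {a[line]+=0;}} END {for(line in a) printf("%s\t%d\n",line,a[line]);}'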

  7. Split fields on any of three separators at once ("=", space and "D") by passing a bracket expression as the field separator.

    grep R-sq *log | awk -F'[= D]' '$5>0.1{print}'
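
To see how NF, NR, FNR and FILENAME from items 3 and 4 behave, here is a minimal sketch on toy data. The file names a.bed and b.bed and their contents are made up for illustration and are not the real SNP files.

```shell
# Toy inputs (hypothetical file names), created in a temporary directory.
tmpdir=$(mktemp -d)
printf 'chr1\t100\t101\trs1\nchr1\t200\t201\trs2\n' > "$tmpdir/a.bed"
printf 'chr2\t300\t301\n' > "$tmpdir/b.bed"

# FNR restarts at 1 for every input file; NR keeps counting across files.
# NF is the number of fields on the current line.
counts=$(cd "$tmpdir" && awk '{print FILENAME, FNR, NR, NF}' a.bed b.bed)
printf '%s\n' "$counts"
# a.bed 1 1 4
# a.bed 2 2 4
# b.bed 1 3 3
```

The common idiom `NR==FNR`, which is true only while the first file is being read, relies on exactly this difference.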
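
Item 2 writes to one output file per distinct prefix. The sketch below runs the same idea on toy data; the input file toy.bed and its contents are hypothetical. The `close()` call is an addition that is not in the original command: it is one way to avoid running out of open file handles when there are many distinct prefixes, and it is safe here because `>>` reopens the file in append mode.

```shell
# Toy input (hypothetical): column 2 has the form prefix_suffix.
tmpdir2=$(mktemp -d)
printf 'chr1\tgeneA_exon1\t10\nchr1\tgeneB_exon1\t20\nchr2\tgeneA_exon2\t30\n' > "$tmpdir2/toy.bed"

(cd "$tmpdir2" && awk '{
    split($2, a, "_")                 # a[1] = prefix, a[2] = suffix
    out = a[1] ".txt"                 # build the output file name in a variable
    print $1 "\t" a[2] "\t" $3 >> out # append the record to that file
    close(out)                        # release the handle (assumption: many prefixes)
}' toy.bed)

geneA=$(cat "$tmpdir2/geneA.txt")
geneB=$(cat "$tmpdir2/geneB.txt")
printf '%s\n' "$geneA"
```

With `>` instead of `>>`, calling `close()` after each line would truncate the file on every reopen, so the two should be changed together.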
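
The awk stage of item 6 is easier to follow in isolation. This is a sketch on made-up rows that stand in for the output of the join step (seven tab-separated columns); the values are invented for illustration. The array is keyed on columns 1-5, and column 6 is added only when column 7 lies within the interval given by columns 2 and 3.

```shell
# Hypothetical joined rows: key columns 1-5, a count in column 6, a position in column 7.
joined=$(printf 'k1\t10\t20\tx\ty\t5\t15\nk1\t10\t20\tx\ty\t7\t25\nk1\t10\t20\tx\ty\t2\t20\nk2\t1\t5\tx\ty\t9\t8\n')

result=$(printf '%s\n' "$joined" | awk -F '\t' '
{
    key = $1 FS $2 FS $3 FS $4 FS $5          # columns 1-5 identify a record
    if ($7 >= $2 && $7 <= $3) sum[key] += int($6)
    else                      sum[key] += 0   # keep keys with no hits, with a sum of 0
}
END { for (key in sum) printf "%s\t%d\n", key, sum[key] }' | sort)

printf '%s\n' "$result"
```

The trailing `sort` is there because the order of `for (key in sum)` is unspecified in awk; the `+= 0` branch ensures that records with no position inside the interval still appear in the output.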

Tags: perl, awk, shell

Some of these commands are very specific, and the descriptions don’t really explain what they do or why they are useful.

If you wish to post useful commands, I would suggest contributing to one of the existing threads, for example here. These kinds of compendiums work best when they are not spread out all over the place.

Also, why have you tagged Perl?


Hi Shicheng Guo,

These commands are potentially useful, but as they lack explanation (and are for very specific use cases), users will not find your post when they need it. Perhaps you should consider getting your own blog (e.g. WordPress) and explaining these commands in more detail in a series of posts. "Awk in bioinformatics" already sounds like a good title; perhaps "Unix commands in bioinformatics" would broaden your scope.

You labelled this as a "Forum" - which is generally a post type for "a topic for discussion for which no definite answers exist". It's not exactly a "Tutorial" either, since you are not really teaching anything, just showing a couple of commands. It is for example entirely unclear what your sixth command does and why anyone would use it.

We value all contributions to biostars, but right now, this mostly looks like you are trying to get some upvotes.



Yes. I just record it for myself, and I am happy if it helps others. If not, I am sure it does no harm to others, right?

5.7 years ago
Batu ▴ 260

I can point to this link, which includes more examples and other cases: Useful bash one-liners for bioinformatics.

