Forum: Awk in Bioinformatics
11 months ago, Shicheng Guo (7.7k) wrote:

Here are some examples of using awk in powerful combinations with other tools. I will add more examples over time.

  1. Print columns 4 and 5 on separate lines into a file named after columns 1 and 3 (the ./snpset directory must already exist): awk '{print $4"\n"$5 > ("./snpset/" $1 "." $3 ".txt")}' GRCH37.SNP150.bed

  2. Split column 2 on "_" and append column 1, the second piece, and column 3 to a file named after the first piece: awk '{ split($2, a, "_"); print $1"\t"a[2]"\t"$3 >> (a[1] ".txt"); }' GRCH37.SNP150.bed

  3. NF gives the number of fields in the current record, while NR gives the number of the record (line) currently being processed: awk '{print NR,"->",NF}' GRCH37.SNP150.bed

  4. FNR gives the line number within the current file, while NR keeps counting across all input files. FILENAME gives the name of the current file: awk '{print FILENAME, FNR, NR;}' hg19.snp150.bed hg38.snp150.bed

  5. Use columns 1, 4, and 8 as parameters for plink and submit each command as a PBS job (rows where column 8 is "." are skipped): awk '$8!="." {cmd="echo \"plink --bfile ~/1000Genome/" $1 " --ld " $4 " " $8 " --out ./LD/" $4 "." $8 ".r2\" | qsub -N " $4 "." $8 " -e ./temp/ -o ./temp/"; system(cmd)}' hg19.DMR.bed

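  6. Combine join, sort, uniq, and awk (needs bash for <(...) and $'\t'). The two files are joined on column 1 of input.txt and column 2 of ref2.txt; then, for each distinct combination of columns 1 to 5, awk sums column 6 over the rows where column 7 lies between columns 2 and 3: join -t $'\t' -1 1 -2 2 <(sort -t $'\t' -k1,1 input.txt) <(sort -t $'\t' -k2,2 ref2.txt) | uniq | awk -F '\t' '{line=sprintf("%s\t%s\t%s\t%s\t%s",$1,$2,$3,$4,$5);if($7>=$2 && $7<=$3) {a[line]+=int($6);} else {a[line]+=0;}} END {for(line in a) printf("%s\t%d\n",line,a[line]);}'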

  7. Split fields on any of three delimiters ("=", space, and "D") by giving -F a bracket expression: grep R-sq *log | awk -F'[= D]' '$5>0.1{print}'
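
To make a few of the items above easier to try, here is a minimal sketch that runs items 2, 3, 4, 7, and the awk stage of item 6 on toy data. The file names (a.bed, b.bed) and their contents are invented for illustration and are not the real GRCH37/hg19 files; the script works in a temporary directory so nothing in your own folders is touched.

```shell
#!/bin/sh
# Toy data only: every file name and value below is made up for this demo.
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# Two small tab-separated files standing in for real BED files.
printf 'chr1\tgeneA_x\t100\nchr1\tgeneB_y\t200\n' > a.bed
printf 'chr2\tgeneA_z\t300\n' > b.bed

# Items 3 and 4: FNR restarts at 1 for each file, NR keeps counting,
# and NF is the number of fields on the current line.
counts=$(awk '{print FILENAME, FNR, NR, NF}' a.bed b.bed)
echo "$counts"

# Item 2: split column 2 on "_" and append each row to <first piece>.txt.
awk '{ split($2, a, "_"); print $1"\t"a[2]"\t"$3 >> (a[1] ".txt") }' a.bed b.bed
genea=$(cat geneA.txt)
echo "$genea"

# Item 7: split fields on "=", space, or "D" with a bracket expression.
fifth=$(echo 'x=1 yD0.25' | awk -F'[= D]' '{print $4}')
echo "$fifth"

# Item 6, awk stage only (input is already "joined"): for each distinct
# combination of columns 1-5, add column 6 when column 7 lies in
# [column 2, column 3]; otherwise add 0.
sums=$(printf 'k1\t10\t20\tp\tq\t5\t15\nk1\t10\t20\tp\tq\t7\t30\n' |
  awk -F '\t' '{ line = sprintf("%s\t%s\t%s\t%s\t%s", $1, $2, $3, $4, $5)
                 if ($7 >= $2 && $7 <= $3) a[line] += int($6); else a[line] += 0 }
       END { for (line in a) printf("%s\t%d\n", line, a[line]) }')
echo "$sums"
```

The file name after >> is wrapped in parentheses because an unparenthesized concatenation after a redirection is parsed differently by different awk implementations.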

awk shell forum perl • 1.1k views
modified 9 months ago • written 11 months ago by Shicheng Guo (7.7k)

Some of these commands are very specific, and the descriptions don't really explain what they do or why they are useful.

If you wish to post useful commands, I would suggest contributing to one of the existing threads, for example here. These kinds of compendiums work best when they are not spread out all over the place.

Also, why have you tagged Perl?

modified 11 months ago • written 11 months ago by jrj.healey (13k)

Hi Shicheng Guo,

These commands are potentially useful, but as they lack information (and are for very specific use cases), users will not find your post when they need it. Perhaps you should consider getting your own blog (e.g. Wordpress) and explaining these commands in more detail in a series of posts. "Awk in bioinformatics" already sounds like a good title; perhaps "unix commands in bioinformatics" would broaden your scope.

You labelled this as a "Forum" - which is generally a post type for "a topic for discussion for which no definite answers exist". It's not exactly a "Tutorial" either, since you are not really teaching anything, just showing a couple of commands. It is for example entirely unclear what your sixth command does and why anyone would use it.

We value all contributions to biostars, but right now, this mostly looks like you are trying to get some upvotes.


written 11 months ago by WouterDeCoster (40k)

Yes. I just recorded it for myself, and I am happy if it helps others. If not, I am sure it does no harm to others, right?

modified 11 months ago • written 11 months ago by Shicheng Guo (7.7k)

10 months ago, Batu (160) wrote:

I can recommend this link, which includes more examples and other cases: Useful bash one-liners for bioinformatics.
