Question: Collecting the 50 most frequent proteins from a tabular BLASTX result
4.0 years ago, Farbod 3.3k (Toronto) wrote:

Dear friends, hi.

I have run a BLASTX search against the NCBI nr database (using DIAMOND, keeping -max_target_seqs = 1) with outfmt 6.

I want to collect the 50 proteins that occur most frequently in my results.

Is there a command-line script or program for this task?

(I have tried cutting out the ID column, opening it in Microsoft Excel and counting the duplicates, but my Windows computer is not very powerful, so opening such a file and running the duplicate count is very difficult.)

Thank you in advance

blast • 983 views
modified 4.0 years ago • written 4.0 years ago by Farbod 3.3k

Perhaps this would help (see @Pierre's answer, or the Python scripts if that does not help): Blastp how to find and count duplicates?

modified 4.0 years ago • written 4.0 years ago by GenoMax 92k

Dear genomax2, hi and thank you.

But I could not work out which one is the final, correct Python script.

modified 4.0 years ago • written 4.0 years ago by Farbod 3.3k

Simple: cut -f 1 blast_out.tbl | sort | uniq -c | sort -k1gr | head -50
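One caveat worth checking before running this: in the default outfmt 6 column order (qseqid, sseqid, pident, ...), the first column is the query ID and the second is the subject (protein) ID. If blast_out.tbl is the full table rather than a file that holds only the already-extracted ID column, counting proteins probably means cutting column 2. The sketch below assumes the default layout and uses made-up IDs; adjust `-f` to wherever the protein ID sits in your file.

```shell
# Build a tiny tab-separated stand-in for an outfmt 6 table
# (hypothetical IDs; only the first two columns are included).
tmp=$(mktemp -d)
printf 'q1\tprotA\nq2\tprotB\nq3\tprotA\nq4\tprotC\nq5\tprotA\nq6\tprotB\n' > "$tmp/blast_out.tbl"

# Assumption: column 2 holds the subject (protein) ID, as in default outfmt 6.
# sort groups identical IDs, uniq -c counts each group,
# sort -k1,1nr ranks by count (descending), head keeps the top 50.
top=$(cut -f 2 "$tmp/blast_out.tbl" | sort | uniq -c | sort -k1,1nr | head -n 50)
printf '%s\n' "$top"
```

Each output line is a count followed by the ID, with the most frequent protein first.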

modified 4.0 years ago • written 4.0 years ago by Asaf 8.5k

Dear Asaf, hi.

It seems to be working like magic!

Thank you

written 4.0 years ago by Farbod 3.3k

No magic, just simple Unix one-liners.

written 4.0 years ago by Asaf 8.5k

Dear Asaf,

It seems that your script has two sort commands in it. Can we reduce it to just one?

~ Best

written 4.0 years ago by Farbod 3.3k

Probably not. You can start at the left and keep running the commands, each time adding one more stage of the pipeline, to see why not.

 cut -f 1 blast_out.tbl | less
 cut -f 1 blast_out.tbl | sort | less
 cut -f 1 blast_out.tbl | sort | uniq -c | less

You get the idea.
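To spell out what those stepwise runs show: uniq -c only merges identical lines that are adjacent, so the first sort is needed to group the IDs, and the second sort orders the groups by their counts. If a single sort is still wanted, one possible workaround (a sketch that was not proposed in this thread) is to do the counting in an awk associative array, which leaves only the sort by count. The file name ids.txt and the IDs are made up for illustration.

```shell
tmp=$(mktemp -d)
printf 'protA\nprotB\nprotA\nprotC\nprotA\nprotB\n' > "$tmp/ids.txt"   # hypothetical ID list

# Without the first sort, uniq -c counts only adjacent repeats,
# so the same ID is reported on several separate lines.
unsorted=$(uniq -c "$tmp/ids.txt" | wc -l | tr -d ' ')
printf 'lines from unsorted uniq -c: %s\n' "$unsorted"

# awk tallies every ID in an associative array regardless of order;
# a single sort then ranks the tallies by count, descending.
ranked=$(awk '{ n[$1]++ } END { for (id in n) print n[id] "\t" id }' "$tmp/ids.txt" | sort -k1,1nr | head -n 50)
printf '%s\n' "$ranked"
```

On a real table the same awk stage could be fed from cut, for example `cut -f 1 blast_out.tbl | awk ...`. The trade-off is that awk holds one counter per distinct ID in memory, whereas sort can spill to disk on very large inputs.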

modified 4.0 years ago • written 4.0 years ago by GenoMax 92k