Question: To get number of hits from blastp output file
fec2 wrote (18 months ago):

Hi all,

I have multiple blastp output files (format 6) in a directory, and I want to count the hits with sequence identity of more than 40% in each file. I tried:

for i in *.tsv; do awk '$3>=40' $i | wc -l; done

However, this command only prints a list of numbers in the terminal, without showing which blastp output file each number belongs to. How can I modify it so that each count is matched to its file? Thank you.

AK wrote (18 months ago):

Hi fec2,

Try:

for i in *.tsv; do echo -ne "${i}\t" && awk '$3>=40{print $2}' "${i}" | sort -u | wc -l; done

It should return something like:

blast_out1.tsv  217
blast_out2.tsv  172
blast_out3.tsv  215
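
One thing worth noting about the command above: because it prints column 2 and pipes it through sort -u, it counts distinct subject IDs at or above the threshold rather than alignment rows. The two numbers differ whenever the same subject appears in more than one passing row. A minimal sketch of the difference, using a made-up three-row file (the data are invented for illustration and only the first three format 6 columns are included):

```shell
# Hypothetical demo data, not real BLAST output.
# Column 2 is the subject ID, column 3 is percent identity.
tmpdir=$(mktemp -d)
printf 'q1\tsubjA\t95.0\nq1\tsubjA\t60.0\nq1\tsubjB\t30.0\n' > "$tmpdir/demo.tsv"

# Count every row with identity >= 40 (the question's original filter).
rows=$(awk -F'\t' '$3 >= 40' "$tmpdir/demo.tsv" | wc -l)

# Count distinct subject IDs with identity >= 40 (the answer's approach).
subjects=$(awk -F'\t' '$3 >= 40 {print $2}' "$tmpdir/demo.tsv" | sort -u | wc -l)

printf 'rows=%d subjects=%d\n' "$rows" "$subjects"
rm -r "$tmpdir"
```

Here two rows pass the filter but both point at subjA, so the row count is 2 and the distinct-subject count is 1. Which one is wanted depends on what "number of hits" should mean for the analysis.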

Thank you very much! Is it possible to get the result in an output file?

written 18 months ago by fec2

You're welcome. Try this:

(for i in *.tsv; do echo -ne "${i}\t" && awk '$3>=40{print $2}' "${i}" | sort -u | wc -l; done) > output.txt
written 18 months ago by AK
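
If the same count is needed repeatedly or with different thresholds, the loop can be wrapped in a small shell function. This is only a sketch: count_hits is a hypothetical name introduced here, and unlike the one-liner above it counts alignment rows rather than distinct subject IDs.

```shell
# count_hits is a hypothetical helper, not part of BLAST or of this thread.
# Usage: count_hits THRESHOLD FILE...
# Prints "file<TAB>count" per file, counting rows whose column 3 >= THRESHOLD.
count_hits() {
    threshold=$1
    shift
    for f in "$@"; do
        # n+0 makes awk print 0 when no row passes the filter
        n=$(awk -F'\t' -v t="$threshold" '$3 >= t {n++} END {print n+0}' "$f")
        printf '%s\t%s\n' "$f" "$n"
    done
}
```

It would be called as count_hits 40 *.tsv > output.txt, which writes one tab-separated line per input file.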

The command works well, thank you again!

written 18 months ago by fec2

Hi fec2,

If an answer was helpful, you should upvote it; if it resolved your question, you should mark it as accepted.

modified 18 months ago • written 18 months ago by lieven.sterck

Hi, thanks for your comment. I have accepted the answer. Thank you.

modified 18 months ago • written 18 months ago by fec2

Hi,

May I know where I can find a manual that explains what all these commands mean? I am new to this field and have no clue where to find this information. I really appreciate your help.

written 18 months ago by fec2

Hello fec2,

You can use man echo, man awk, man sort, and man wc. I'd recommend "Ch3. Remedial Unix Shell" and "Ch7. Unix Data Tools" in the book: Bioinformatics Data Skills by Vince Buffalo. :-)

modified 18 months ago • written 18 months ago by AK
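
To connect those manual pages to the command used in this thread, here is the same pipeline split into separate stages with the intermediate results kept. The sample file is invented for illustration and carries only the first three format 6 columns.

```shell
# Hypothetical sample file standing in for one blastp tabular output.
sample=$(mktemp)
printf 'q1\ts1\t80.0\nq1\ts2\t45.5\nq1\ts1\t41.0\nq1\ts3\t12.0\n' > "$sample"

# Stage 1 (awk): keep rows whose 3rd field is >= 40 and print the 2nd field.
stage1=$(awk '$3 >= 40 {print $2}' "$sample")

# Stage 2 (sort -u): sort the IDs and drop duplicate lines.
stage2=$(printf '%s\n' "$stage1" | sort -u)

# Stage 3 (wc -l): count the lines that remain.
stage3=$(printf '%s\n' "$stage2" | wc -l)

printf 'after awk:\n%s\nafter sort -u:\n%s\ncount: %d\n' "$stage1" "$stage2" "$stage3"
rm "$sample"
```

In the original one-liner, echo -ne "${i}\t" prints the file name followed by a tab and no newline, so the count from wc -l lands on the same line.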