Perl script required
6.0 years ago

Hi everyone,

I need a script to select specific sequences from a local BLAST output file. Briefly, the file contains many alignments in different reading frames, each with an E-value and a bit score, and each sequence has results in several frames. For each alignment I want to keep only the frame with the highest score, and then keep the best alignments based on bit score or E-value. There is no restriction on the file format.
Can anyone point me to a way to learn how to write such scripts, as I am new to Perl scripting, or to an existing script that I could edit for this purpose?
Thank you in advance.

alignment sequence • 1.1k views

And why does it have to be Perl?


Why not load the whole result into a pandas DataFrame in Python and then filter it however you want? It is not easy to create a dataframe in Perl, by the way.
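A minimal sketch of that idea, using only the Python standard library rather than pandas so it runs without extra installs. It assumes the default tabular layout (-outfmt 6), where the query ID is the 1st column and the bit score the 12th; the function name and the sample rows are made up for illustration.

```python
import csv
import io


def best_hit_per_query(handle, query_col=0, bitscore_col=11):
    """Return one row per query: the row with the highest bit score.

    Column positions are 0-based and assume the default outfmt 6 layout;
    change them if your file uses a custom column order.
    """
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip empty lines and comment lines (e.g. from outfmt 7).
        if not row or row[0].startswith("#"):
            continue
        query = row[query_col]
        score = float(row[bitscore_col])
        if query not in best or score > float(best[query][bitscore_col]):
            best[query] = row
    return list(best.values())


# Made-up example data in default outfmt 6 column order.
sample = io.StringIO(
    "q1\ts1\t90.0\t100\t5\t0\t1\t100\t1\t100\t1e-30\t150\n"
    "q1\ts2\t98.0\t100\t2\t0\t1\t100\t1\t100\t1e-50\t210\n"
    "q2\ts3\t80.0\t60\t12\t0\t1\t60\t1\t60\t1e-10\t80\n"
)
best_rows = best_hit_per_query(sample)
for row in best_rows:
    print("\t".join(row))
```

With a real file you would pass `open("results.tsv", newline="")` instead of the in-memory sample (the file name is hypothetical). In pandas the same selection is usually written as a group-by on the query column followed by picking the row with the maximum bit score.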

5.9 years ago
Shred ★ 1.4k

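It doesn't need to be in Perl. If you're working with tab-separated values (i.e. outfmt 6), you can easily filter with awk. Assuming the E-value is in the 4th column, this command prints only the lines where the E-value is lower than 0.02 (in the default outfmt 6 layout the E-value is column 11 and the bit score column 12, so adjust the field number to match your file):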

awk -F'\t' '$4 < 0.02 {print ;}'

The same can be done with any other parameter. Always remember to declare the field separator, which is given after the "-F" option. Fields are counted from $1, because $0 is the whole line.
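For comparison, here is a rough Python equivalent of the awk filter above. It is only a sketch: the function name, the 1-based `evalue_col` parameter (chosen to mirror awk's field numbering) and the sample lines are assumptions for illustration.

```python
def filter_by_evalue(lines, evalue_col=4, max_evalue=0.02, sep="\t"):
    """Keep lines whose E-value field is below max_evalue.

    evalue_col is 1-based, like awk's $1, $2, ...
    """
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        # float() understands scientific notation such as "1e-30".
        if float(fields[evalue_col - 1]) < max_evalue:
            kept.append(line)
    return kept


# Made-up lines with the E-value in the 4th column.
example_lines = [
    "q1\ts1\t95.0\t1e-30\n",
    "q1\ts2\t40.0\t0.5\n",
    "q2\ts3\t88.0\t0.001\n",
]
filtered = filter_by_evalue(example_lines)
print("".join(filtered), end="")
```

Changing `evalue_col` plays the same role as changing `$4` in the awk command.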

To keep just the first result for each query, assuming BLAST reports the hits for each query from the most significant to the least significant, you can use another awk trick. $1 is used on the assumption that the query ID is in the 1st field.

awk '!seen[$1]++'
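If the awk idiom looks cryptic: it prints a line only the first time its first field is encountered. A small Python sketch of the same logic follows; the function name and sample lines are hypothetical, and unlike awk this version skips blank lines.

```python
def first_hit_per_query(lines, key_col=1):
    """Keep only the first line seen for each key (1-based column).

    Fields are split on any whitespace, like awk without -F.
    """
    seen = set()
    kept = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # blank line: nothing to key on
        key = fields[key_col - 1]
        if key not in seen:
            seen.add(key)
            kept.append(line)
    return kept


# Made-up hits, already ordered best-first within each query.
hits = [
    "q1\ts2\t210\n",
    "q1\ts1\t150\n",
    "q2\ts3\t80\n",
    "q2\ts4\t75\n",
]
top_hits = first_hit_per_query(hits)
print("".join(top_hits), end="")
```

As with the awk version, this only returns the best hit if the input is already sorted best-first within each query.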