Question: Perl script required
8 months ago, Rasoul Godini wrote:

Hi everyone,

I need a script to select specific sequences from the many results in a local BLAST output file. Briefly, I have a BLAST output file containing many alignments in different frames, each with an E-value and a bit score. Each sequence has results in several frames. I want to keep only the frame with the highest score for each alignment, and then the best alignments based on bit score or E-value. There is no restriction on the file type.
Can anyone suggest a way to learn how to write such scripts, as I am new to Perl scripting, or point me to an existing script that I could edit for this purpose?
Thank you in advance.

sequence alignment • 254 views

And why does it have to be Perl?

written 8 months ago by WouterDeCoster

Why not load the whole result into a pandas DataFrame in Python, then filter it as you want? It is not easy to create a dataframe in Perl, by the way.

written 8 months ago by Bastien Hervé

8 months ago, Shred wrote:

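It doesn't need to be in Perl. If you're working with tab-separated values (so outfmt 6), you can easily filter with awk. Assuming the E-value is in the 4th column (in the default outfmt 6 layout it is the 11th column and the bit score is the 12th, so adjust the field number to match your file), this command will print only the lines where the E-value is lower than 0.02: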

awk -F'\t' '$4 < 0.02 {print ;}'

The same can be done with any other field. Always remember to declare the field separator, which is given after the "-F" option. Fields are counted from $1, because $0 is the whole line.
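Since Python came up in the comments, here is a rough equivalent of the awk filter using only the standard library. It is a sketch under assumptions: the function name `filter_by_evalue` is made up for illustration, the input is assumed to be tab-separated outfmt 6 with the E-value in the 11th column (the default layout), and the two example rows are invented data.

```python
import csv
import io


def filter_by_evalue(handle, evalue_col=11, max_evalue=0.02):
    """Return the rows whose E-value is below max_evalue.

    evalue_col is 1-based, like awk's $11, so it maps directly
    onto the field numbers used in the awk command.
    """
    kept = []
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and comment lines (outfmt 7 writes '#' headers).
        if not row or row[0].startswith("#"):
            continue
        if float(row[evalue_col - 1]) < max_evalue:
            kept.append(row)
    return kept


# Invented example rows in the default outfmt 6 column order.
example = io.StringIO(
    "q1\ts1\t98.0\t100\t2\t0\t1\t100\t1\t100\t1e-50\t180\n"
    "q1\ts2\t60.0\t40\t16\t0\t1\t40\t1\t40\t0.5\t25.1\n"
)
for row in filter_by_evalue(example):
    print("\t".join(row))
```

For a real file, pass an open file object instead of the `io.StringIO` example, and set `evalue_col` to whatever column your output format actually uses.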

To keep only the first result for each query, assuming BLAST reports the hits for each query from the most to the least significant, you can use another awk trick. $1 is used on the assumption that the query ID is in the 1st field.

awk '!seen[$1]++'
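The awk one-liner above relies on the best hit coming first. If you would rather not depend on the order of the file, one option is to compare bit scores explicitly. The following is a minimal sketch, not a tested pipeline: `best_hits` is a hypothetical helper, the query ID, subject ID and bit score are assumed to sit in columns 1, 2 and 12 (default outfmt 6), and the example rows are invented.

```python
import csv
import io


def best_hits(handle, key_cols=(1,), bitscore_col=12):
    """Keep the highest-scoring row for each key.

    key_cols holds 1-based column numbers: (1,) keeps one row per query,
    (1, 2) keeps one row per query/subject pair (for example the best
    frame of each alignment). Ties keep the row seen first.
    """
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and '#' comment lines.
        if not row or row[0].startswith("#"):
            continue
        key = tuple(row[c - 1] for c in key_cols)
        score = float(row[bitscore_col - 1])
        if key not in best or score > best[key][0]:
            best[key] = (score, row)
    # Dicts preserve insertion order, so keys come out in first-seen order.
    return [row for _, row in best.values()]


# Invented example rows; the best hit for q1 is deliberately not first.
example = (
    "q1\ts1\t70.0\t50\t15\t0\t1\t50\t1\t50\t1e-05\t50\n"
    "q1\ts1\t98.0\t100\t2\t0\t1\t100\t1\t100\t1e-50\t180\n"
    "q1\ts2\t85.0\t80\t12\t0\t1\t80\t1\t80\t1e-20\t90\n"
    "q2\ts3\t65.0\t45\t16\t0\t1\t45\t1\t45\t1e-03\t40\n"
)
for row in best_hits(io.StringIO(example)):
    print("\t".join(row))
```

Calling `best_hits(handle, key_cols=(1, 2))` instead keeps the top-scoring row for every query/subject pair, which may be closer to "the best frame in each alignment" from the question.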