Hi there,
I was wondering if you could help me. I have run a multi-sequence blastp search, which generated an output.tsv file. I need to split that tsv file into separate files, each containing the hits for one query protein: all the hits for protein 1 in one file, all the hits for protein 2 in another, and so on. I tried limiting the target sequences to 10 and then splitting by line number, so that 10 hits go in each file, but some proteins only have 3 or 4 hits, which messes up the separation. And I have to do this in Python!
I am in dire need of some help!
I know you can parse a BLAST output file, but how would I then direct all the hits for each protein into a different file?
Any help would be really appreciated!! Thank you
Providing an outline for a non-python solution.
`cut` the first column out and then `uniq` that list to get sequence IDs that have hits. Then use `grep` in a loop with the `-w` option to extract lines that contain that ID.
Is this an assignment?
No it's not an assignment. But I'm working with someone who only uses python. I could do it if I didn't have to use python but using python is confusing me slightly... Thank you though for your input.
Is this standard `-outfmt 6` tabular output? Are you python savvy or no?
It may be something like this (something I found on web):
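As one possible shape for this in plain Python, here is a minimal sketch, assuming the file is standard `-outfmt 6` output with the query ID in the first column. The function name `split_blast_by_query` and the output directory are made up for illustration. It groups lines by query ID and writes each group to its own file, so it does not matter whether a protein has 3 hits or 10.

```python
from collections import defaultdict
from pathlib import Path


def split_blast_by_query(tsv_path, out_dir):
    """Write one TSV per query ID (first column of -outfmt 6 output).

    Hypothetical helper; returns a dict mapping query ID -> output path.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Collect the raw lines for each query ID, keeping their original order.
    hits = defaultdict(list)
    with open(tsv_path) as handle:
        for line in handle:
            # Skip blank lines and comment lines (e.g. from -outfmt 7).
            if not line.strip() or line.startswith("#"):
                continue
            query_id = line.split("\t", 1)[0]
            hits[query_id].append(line)

    written = {}
    for query_id, lines in hits.items():
        # Replace characters that are awkward in file names (e.g. "|").
        safe_name = "".join(
            c if c.isalnum() or c in "._-" else "_" for c in query_id
        )
        out_path = out_dir / f"{safe_name}.tsv"
        out_path.write_text("".join(lines))
        written[query_id] = out_path
    return written


# Usage (paths are examples only):
# split_blast_by_query("output.tsv", "hits_by_query")
```

Note that two different query IDs could map to the same sanitized file name; if your IDs contain unusual characters, check for that before relying on the output.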
OR
https://www.reddit.com/r/bioinformatics/comments/4ef5p8/how_to_filter_blast_results_using_biopython/
Yeah maybe. I will have a look into that. I usually use pandas to make a dataframe to make plots and things.
Or I was thinking something like this :
Output would look like:
But I don't know how to direct all of that into a file... I'll have a think...
Please use the formatting bar (especially the `code` option) to present your post better. I've done it for you this time.
Thank you!
And to add to this, as soon as you have it in a DataFrame you could use something like the following loop (untested):
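A sketch of what such a loop could look like, assuming the DataFrame was read from standard 12-column `-outfmt 6` output and the query ID column is named `qseqid`. The helper name `write_hits_per_query`, the `_hits.tsv` suffix and the output directory are invented for illustration. `groupby` gives one sub-table per query, however many hits it has.

```python
from pathlib import Path

import pandas as pd

# Default column order of BLAST -outfmt 6 (no header line in the file).
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def write_hits_per_query(df, out_dir):
    """Write the rows for each qseqid to <out_dir>/<qseqid>_hits.tsv.

    Hypothetical helper; returns the list of paths written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    paths = []
    # sort=False keeps the queries in the order they appear in the file.
    for query_id, group in df.groupby("qseqid", sort=False):
        out_path = out_dir / f"{query_id}_hits.tsv"
        group.to_csv(out_path, sep="\t", index=False, header=False)
        paths.append(out_path)
    return paths


# Usage (output directory name is an example only):
# df = pd.read_csv("output.tsv", sep="\t", names=BLAST_COLUMNS)
# write_hits_per_query(df, "hits_by_query")
```

If your query IDs contain characters such as `|` or `/`, you would probably want to clean them up before using them as file names.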