Here is part of my file, showing the HMMER output for the query KI0314_NODE_20043_length_7522_cov_1.691954_4:
Glyco_hydro_43 PF04616.18 KI0314_NODE_20043_length_7522_cov_1.691954_4 - 2.3e-43 148.8 4.0 3.5e-43 148.2 4.0 1.3 1 0 0 1 1 1 1 Glycosyl hydrolases family 43
GH43_C2 PF17851.5 KI0314_NODE_20043_length_7522_cov_1.691954_4 - 2.8e-31 109.0 0.5 4.5e-31 108.3 0.5 1.3 1 0 0 1 1 1 1 Beta xylosidase C-terminal Concanavalin A-like domain
Cellulase PF00150.22 KI0314_NODE_20043_length_7522_cov_1.691954_4 - 5.6e-16 59.0 4.1 1.1e-15 58.1 4.1 1.4 1 0 0 1 1 1 1 Cellulase (glycosyl hydrolase family 5)
In this file I need to remove every enzyme that is not a cellulase, so I need to delete this part of the file. Do you know of any tools I can use for this instead of writing a long script?
This sounds a little too niche for a dedicated tool to exist, but it is a pretty simple task in R that wouldn't require a long script. You could use the `subset` function with `grepl`, where the regex `pattern` is a string of the enzymes you want to remove. If the HMMER output file is particularly large, you could also use the `data.table` package to speed up reading the data in.
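For comparison, here is a minimal sketch of the same regex-filter idea in Python rather than R. It assumes the file is HMMER `--tblout` tabular output (18 whitespace-separated columns followed by a free-text description, as in the rows above) and that the word "cellulase" in the Pfam name or description is the criterion you care about. The function name `filter_tblout`, the pattern, and the `remove` flag are illustrative choices, not part of any existing tool.

```python
import re
import sys

# Assumption: rows are kept or dropped based on this regex, matched against the
# Pfam target name (column 1) and the free-text description (last column).
PATTERN = re.compile(r"cellulase", re.IGNORECASE)


def filter_tblout(lines, pattern=PATTERN, remove=False):
    """Yield comment/blank lines unchanged, plus the hit rows selected by `pattern`.

    remove=False keeps only matching rows (e.g. keep cellulases);
    remove=True drops matching rows instead, like the R subset/grepl approach
    where the pattern lists the enzymes you want to remove.
    """
    for line in lines:
        if line.startswith("#") or not line.strip():
            yield line
            continue
        # 18 fixed columns, then the description, which itself contains spaces;
        # maxsplit=18 keeps the description as one final field.
        fields = line.split(None, 18)
        target = fields[0]
        description = fields[18] if len(fields) > 18 else ""
        matched = bool(pattern.search(target) or pattern.search(description))
        if matched != remove:
            yield line


def main(path):
    with open(path) as handle:
        for line in filter_tblout(handle):
            sys.stdout.write(line)


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Saved under a hypothetical name such as `filter_cellulase.py`, it could be run as `python filter_cellulase.py hits.tbl > cellulase_hits.tbl`. Splitting with a fixed maximum number of fields is the main design choice: naively splitting on all whitespace would scatter the multi-word description across many columns.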