Question

How to edit domtblout?

0

2.2 years ago

Riku ▴ 80

Hi, all.

I'm a beginner in bioinformatics.

I have HMMER results in domtblout format, and I would like to edit and use them. But I don't know how to edit the file because it isn't tab-separated. Could you tell me how to work with a domtblout file?

For example, if you wanted to extract the gene IDs, how would you do it?

Best regards,

bash HMMER Pfam awk domtblout • 901 views

ADD COMMENT • link 2.2 years ago by Riku ▴ 80

score 2 · Accepted Answer · 2022-01-28

2

2.2 years ago

Shraddha ▴ 90

Do you want to edit the file itself? Or would you just like to extract the gene IDs?

For the first, just make a copy of your file and open it with a text editor. For the second, you can use grep.

E.g., you want to grab the accessions in each search:

grep 'Accession' Pfam.domtblout

Or you want the gene IDs that are in the alignments:

grep '>>' Pfam.domtblout

If you could be more specific with your question, then we can offer more detailed answers.

ADD COMMENT • link 2.2 years ago by Shraddha ▴ 90

0

I really appreciate your advice. My domtblout file looks like this, and I would like to extract only the gene ID column from this table. In this case, it looks difficult to solve this problem with grep.

#                                                                            --- full sequence --- -------------- this domain -------------   hmm coord   ali coord   env coord
# target name        accession   tlen query name           accession   qlen   E-value  score  bias   #  of  c-Evalue  i-Evalue  score  bias  from    to  from    to  from    to  acc description of target
#------------------- ---------- ----- -------------------- ---------- ----- --------- ------ ----- --- --- --------- --------- ------ ----- ----- ----- ----- ----- ----- ----- ---- ---------------------
DHO_dh               PF01180.24   295 At_DN10000_c0_g1_i1.p1 -            412   5.1e-90  301.8   0.0   1   1   3.2e-94   6.3e-90  301.5   0.0     2   295    79   377    78   377 0.95 Dihydroorotate dehydrogenase
DHO_dh               PF01180.24   295 At_DN10000_c0_g2_i1.p1 -            412   5.1e-90  301.8   0.0   1   1   3.2e-94   6.3e-90  301.5   0.0     2   295    79   377    78   377 0.95 Dihydroorotate dehydrogenase
Ion_trans_2          PF07885.19    79 At_DN10001_c0_g2_i1.p1 -            420   2.7e-19   69.0  12.5   1   2   1.5e-05      0.29   11.2   0.3    50    76     1    27     1    30 0.89 Ion channel
Ion_trans_2          PF07885.19    79 At_DN10001_c0_g2_i1.p1 -            420   2.7e-19   69.0  12.5   2   2   3.9e-21   7.6e-17   61.2   6.7     3    77   228   299   226   300 0.90 Ion channel
Glutaredoxin         PF00462.27    60 At_DN10003_c0_g1_i7.p3 -            135   5.9e-19   68.2   0.0   1   1   4.3e-23   8.4e-19   67.7   0.0     1    60    44   109    44   109 0.96 Glutaredoxin
zf-C2H2              PF00096.29    23 At_DN10003_c0_g1_i7.p1 -            392   1.6e-24   85.1  49.3   1   6      0.47   3.1e+03   -0.9   0.0    10    23   160   174   159   174 0.86 Zinc finger, C2H2 type
zf-C2H2              PF00096.29    23 At_DN10003_c0_g1_i7.p1 -            392   1.6e-24   85.1  49.3   2   6   2.6e-07    0.0017   18.8   5.8     1    23   249   271   249   271 0.98 Zinc finger, C2H2 type

ADD REPLY • link 2.2 years ago by Riku ▴ 80

1

My domtblout files have some tabular areas, but it's not always the case. Is your entire file tabular?

If so, awk is your friend. I don't see a gene ID column; I assume you mean the query name? You should be able to get that column with awk '{print $4}' Pfam.domtblout (in my case awk recognised the delimiter automatically; you might have to tweak the -F flag a bit to get it to work).
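
To make the awk suggestion concrete, here is a minimal sketch using three of the rows posted above. It assumes the whole file is tabular, that header lines start with '#', and that the only field containing spaces is the free-text description at the end (column 23 onwards). The file names sample.domtblout, query_ids.txt and sample.tsv are made up for illustration.

```shell
# Build a small sample file from rows shown in the thread (hypothetical file name).
cat > sample.domtblout <<'EOF'
# header lines in domtblout start with '#'
DHO_dh               PF01180.24   295 At_DN10000_c0_g1_i1.p1 -            412   5.1e-90  301.8   0.0   1   1   3.2e-94   6.3e-90  301.5   0.0     2   295    79   377    78   377 0.95 Dihydroorotate dehydrogenase
Ion_trans_2          PF07885.19    79 At_DN10001_c0_g2_i1.p1 -            420   2.7e-19   69.0  12.5   1   2   1.5e-05      0.29   11.2   0.3    50    76     1    27     1    30 0.89 Ion channel
Ion_trans_2          PF07885.19    79 At_DN10001_c0_g2_i1.p1 -            420   2.7e-19   69.0  12.5   2   2   3.9e-21   7.6e-17   61.2   6.7     3    77   228   299   226   300 0.90 Ion channel
EOF

# Query names (column 4): skip '#' lines, then keep one copy of each ID.
awk '!/^#/ {print $4}' sample.domtblout | sort -u > query_ids.txt

# Tab-separated copy: columns 1-22 as they are, and the description
# (column 23 to the end, which may contain spaces) joined into one field.
awk 'BEGIN {OFS = "\t"}
     !/^#/ {
         line = $1
         for (i = 2; i <= 22; i++) line = line OFS $i
         desc = $23
         for (i = 24; i <= NF; i++) desc = desc " " $i
         print line, desc
     }' sample.domtblout > sample.tsv
```

Without the !/^#/ filter, awk '{print $4}' would also print a word from each header line. The tab-separated copy should open cleanly in a spreadsheet or be readable with cut -f, since the description no longer spills into extra columns.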