I'm trying to get some information about the top domains my data matched to using hmmscan. I'm using the domain table output that hmmscan produces (I also have the Pfam file). I'd like a script that goes through the table, tells me how many matches there are to each unique domain, and then sorts that list so I can extract the top 15-20 domains my data matched to. Any thoughts?
Question: Count Hits to Each Unique Domain in Hmmscan Results w/ Python?
6.4 years ago by ethanabaker1
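The counting and sorting described in the question can be sketched with only the Python standard library. This is a minimal sketch, assuming HMMER3 tabular output where the second whitespace-separated field of each data line is the target (Pfam) accession and header lines start with `#`; the function name `top_domains` is hypothetical. With the per-domain table it counts every domain row, so a protein carrying several copies of a repeat domain contributes several hits. With `--tblout` input the same code counts sequences per domain instead, since that table has one row per sequence/profile pair.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_domains(lines: Iterable[str], n: int = 20) -> List[Tuple[str, int]]:
    """Count hits per target accession and return the n most common.

    Assumes HMMER3 tabular output (e.g. hmmscan --domtblout), where field 2
    of each data line is the target accession such as PF00515.26.
    """
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header/comment lines and blank lines
        fields = line.split()
        accession = fields[1]
        if accession == "-":
            # Profiles without an accession: fall back to the profile name.
            accession = fields[0]
        counts[accession] += 1
    # most_common() sorts by count, highest first; ties keep first-seen order.
    return counts.most_common(n)


# Example (file name is a placeholder):
# with open("results.domtblout") as handle:
#     for accession, hits in top_domains(handle, n=20):
#         print(f"{accession}\t{hits}")
```

`Counter.most_common(n)` does the sorting and truncation in one step, so changing `n` to 15 gives the top 15 instead.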
2.9 years ago by John
One option is to load your hmmscan table results into a pandas.DataFrame; counting the domains is then easy with the `value_counts()` method:
```python
# Here, `hmm_tbl` is a pandas.DataFrame with your hmmscan results.
# This DataFrame has a column `accession_target`, which is the accession of
# the matched domain (the hmmscan target) in each row of the table.
# E.g., the DataFrame might look something like this (truncated for readability):
#    acc  accession_query accession_target  ali_coord_from  env_coord_from
# 0  0.91             NaN       PF00005.25              10               6
# 1  0.89             NaN        PF13304.4              33              21

# value_counts() counts each unique accession and sorts the counts in
# descending order, so appending .head(20) gives the top 20 domains.
hmm_tbl.accession_target.value_counts()
# PF07719.15    4668
# PF00005.25    4402
# PF13304.4     3626
# PF13432.4     3601
# PF13428.4     3513
# PF00515.26    3494
# ...
```
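The answer assumes the hmmscan table is already in a DataFrame. One way to do the loading, sketched here under the assumption that the file is HMMER3 `--domtblout` output, is to split each data line into its 22 fixed fields plus the free-text description, which can contain spaces (so a plain whitespace split into columns is not enough). Only `accession_target` matches the answer; the other column names are labels chosen for this sketch, and `read_domtblout` and `domain_counts` are hypothetical helpers, not part of pandas or HMMER. The `per_sequence` option counts each protein at most once per domain, which matters for repeat families such as PF00515 (TPR_1) and PF07719 (TPR_2) that can hit a single protein many times.

```python
import pandas as pd

# Labels for the 22 fixed fields of HMMER3 --domtblout plus the description.
# Only `accession_target` comes from the answer; the rest are illustrative.
DOMTBL_COLUMNS = [
    "target_name", "accession_target", "tlen",
    "query_name", "accession_query", "qlen",
    "full_evalue", "full_score", "full_bias",
    "dom_num", "dom_of", "c_evalue", "i_evalue", "dom_score", "dom_bias",
    "hmm_from", "hmm_to", "ali_from", "ali_to", "env_from", "env_to",
    "acc", "description",
]


def read_domtblout(handle) -> pd.DataFrame:
    """Parse --domtblout text into a DataFrame, one row per domain hit.

    All values are kept as strings; convert with pd.to_numeric if needed.
    """
    rows = []
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip header/comment lines and blank lines
        # Split off the 22 fixed fields; the remainder is the description,
        # which may itself contain spaces.
        fields = line.strip().split(maxsplit=22)
        fields += [""] * (len(DOMTBL_COLUMNS) - len(fields))
        rows.append(fields)
    return pd.DataFrame(rows, columns=DOMTBL_COLUMNS)


def domain_counts(df: pd.DataFrame, n: int = 20,
                  per_sequence: bool = False) -> pd.Series:
    """Return the n most frequent target accessions, highest first."""
    hits = df
    if per_sequence:
        # Keep one row per (query, domain) pair so that repeated copies of a
        # domain in the same protein are counted once.
        hits = df.drop_duplicates(subset=["query_name", "accession_target"])
    # value_counts() already sorts in descending order.
    return hits["accession_target"].value_counts().head(n)


# Example (file name is a placeholder):
# with open("results.domtblout") as handle:
#     hmm_tbl = read_domtblout(handle)
# print(domain_counts(hmm_tbl, n=20))
# print(domain_counts(hmm_tbl, n=20, per_sequence=True))
```

Comparing the two rankings is a quick way to see whether a domain is near the top because it occurs in many proteins or because it repeats within a few.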