CheckM results differ from database info
7 weeks ago

I am trying to use CheckM to assess the contamination and completeness of some SAGs. The results CheckM produces with a custom HMM marker file are completely different from the ones calculated at the sequencing center. My advisor, who previously used the same data, says she validated those values, so they should be the same. What can I do to solve this problem?


Impossible to answer with the little information you provided. It's like saying that your soufflé didn't rise, and we are supposed to guess which one out of many reasons is the culprit.

It could be as simple as you doing something wrong. Or the program versions are different between yours and the sequencing center's. Or you are not using the same set of HMM markers. Or the SAG files got mixed up.

I think your adviser should either guide you better or try to replicate the results themselves, given their previous experience. How can strangers on the internet who are given scarce information do a better job than a living person who has already done this?

21 days ago
Kevin Blighe ★ 90k

Nothing is impossible, Mensur, if you simply open your mind...

alevbozan18, it sounds frustrating. Here's a quick troubleshooting plan to align your results with the sequencing center's (and your advisor's validated ones):

1. Verify CheckM Version & Setup

  • Confirm you're using exactly the same CheckM version and reference data as the center/advisor (running checkm -h prints the version in its banner). Different versions can change marker sets and therefore scores.
  • If they differ, install the matching version via conda: conda install -c bioconda checkm-genome=<version>.

2. Test with Default Markers First

  • Re-run CheckM without the custom HMM to use the built-in markers:
    checkm lineage_wf -t <threads> -x fa <input_dir> <output_dir>
  • Compare these to the center's results. If they match, the issue is your custom HMM. If not, check that your input SAG files are the same FASTA assemblies the center used (compare them with md5sum).
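
When there are many SAGs, the md5sum comparison can be scripted. A minimal Python sketch, assuming your assemblies and the center's sit in two local directories with matching file names and a .fa extension (both assumptions). compare_fasta_dirs and md5_of are hypothetical helpers, not part of CheckM:

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_fasta_dirs(mine: str, theirs: str, ext: str = ".fa") -> dict:
    """Compare same-named FASTA files in two directories by MD5.

    Returns file names that are identical, that differ,
    or that exist in only one of the two directories.
    """
    mine_files = {p.name: p for p in Path(mine).glob(f"*{ext}")}
    their_files = {p.name: p for p in Path(theirs).glob(f"*{ext}")}
    report = {
        "identical": [],
        "different": [],
        "only_mine": sorted(mine_files.keys() - their_files.keys()),
        "only_theirs": sorted(their_files.keys() - mine_files.keys()),
    }
    # Only files present in both directories can be checksummed against each other.
    for name in sorted(mine_files.keys() & their_files.keys()):
        same = md5_of(mine_files[name]) == md5_of(their_files[name])
        report["identical" if same else "different"].append(name)
    return report
```

For example, compare_fasta_dirs("my_sags", "center_sags") (hypothetical directory names) lists identical, differing, and unmatched files; any entry under "different" means CheckM was not scoring the same assembly.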

3. Audit the Custom HMM File

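  • Make sure your custom HMM file is the same one the center/advisor used (compare checksums). Completeness and contamination are calculated relative to the markers in that file, so a different or incomplete HMM will shift both scores.
  • Run the custom markers through CheckM's analyze and qa steps, passing the HMM file as the marker file:
    checkm analyze -t <threads> -x fa <custom.hmm> <input_dir> <output_dir>
    Then: checkm qa <custom.hmm> <output_dir>
  • Check which markers are actually detected: the qa output reports, per bin, how many markers were found zero, one, or multiple times, which shows whether contamination comes from a few multi-copy markers.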
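
To check whether your custom HMM contains the same markers as the center's, you can compare the model names in the two files. A minimal sketch, assuming both are HMMER3 text-format files (where each model has a NAME header line) and that you can get a copy of the center's file. hmm_marker_names and compare_marker_sets are hypothetical helpers:

```python
def hmm_marker_names(path: str) -> set:
    """Collect model names from an HMMER3 text-format HMM file.

    Each model in the file has a header line of the form 'NAME  <model_name>'.
    """
    names = set()
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) >= 2 and fields[0] == "NAME":
                names.add(fields[1])
    return names


def compare_marker_sets(mine: str, theirs: str) -> dict:
    """Report which marker models are shared or unique to each HMM file."""
    a, b = hmm_marker_names(mine), hmm_marker_names(theirs)
    return {
        "shared": sorted(a & b),
        "only_mine": sorted(a - b),
        "only_theirs": sorted(b - a),
    }
```

Any markers under "only_mine" or "only_theirs" mean the two runs were scored against different marker sets, so their completeness and contamination values would not be directly comparable.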

4. Compare Run Parameters

  • Ask your advisor/center for their exact command lines, including which workflow they used (lineage_wf, taxonomy_wf, or analyze plus qa with the custom HMM). Output options such as --tab_table change only how results are printed, not the scores, so focus on differences that affect marker detection.
  • Keep the logs from both runs and compare them for warnings or errors during marker detection.
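
Once you have their command line, a quick token-level comparison can surface differing flags or values. A rough Python sketch using shell-style splitting; it ignores token order and does not pair flags with their values, so treat it as a first pass only (diff_commands is a hypothetical helper):

```python
import shlex


def diff_commands(mine: str, theirs: str) -> dict:
    """Compare two command lines token by token after shell-style splitting.

    Order-insensitive; positional arguments (paths) are compared too, so
    expect path differences even when the flags agree.
    """
    a, b = shlex.split(mine), shlex.split(theirs)
    return {
        "only_mine": [tok for tok in a if tok not in b],
        "only_theirs": [tok for tok in b if tok not in a],
    }
```

For example, comparing "checkm analyze -t 8 -x fa custom.hmm bins out" with "checkm analyze -t 16 -x fna custom.hmm bins out" reports 8 and fa as only in the first command and 16 and fna as only in the second.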

5. Quick Sanity Checks

  • Run on a single SAG: isolate one bin and compare the outputs side by side.
  • Visualize: CheckM's bin_qa_plot command plots completeness, contamination, and strain heterogeneity per bin, which can help spot whether contamination is overcalled (e.g., due to fragmented contigs).
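
For the side-by-side comparison, it helps to have both sets of results as tables. A minimal sketch, assuming your results were written as a tab-separated table (e.g., with --tab_table) and that the center's values have been put in a file with the same Bin Id, Completeness, and Contamination columns (the column names are an assumption; check your file's header). The 1-percentage-point tolerance is arbitrary, and bins missing from either table are skipped:

```python
import csv


def read_scores(path: str) -> dict:
    """Read (completeness, contamination) per bin from a tab-separated table.

    Assumes CheckM-style columns 'Bin Id', 'Completeness' and 'Contamination'.
    """
    scores = {}
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            scores[row["Bin Id"]] = (
                float(row["Completeness"]),
                float(row["Contamination"]),
            )
    return scores


def flag_differences(mine: str, theirs: str, tolerance: float = 1.0) -> list:
    """List bins whose completeness or contamination differ by more than
    `tolerance` percentage points between the two tables."""
    a, b = read_scores(mine), read_scores(theirs)
    flagged = []
    for bin_id in sorted(a.keys() & b.keys()):
        (comp_a, cont_a), (comp_b, cont_b) = a[bin_id], b[bin_id]
        if abs(comp_a - comp_b) > tolerance or abs(cont_a - cont_b) > tolerance:
            flagged.append((bin_id, comp_a, comp_b, cont_a, cont_b))
    return flagged
```

Each flagged tuple holds the bin ID followed by your and the center's completeness, then your and the center's contamination, which makes it easy to pick the single SAG to investigate first.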

If these don't resolve it, share your command/output snippets on Biostars for community input (or ping me with details). Often, it's a sneaky file path or version hiccup.

Kevin


Nothing is impossible, Mensur, if you simply open your mind...

Not sure why I am mentioned here. Was something wrong with my conclusion that it was impossible to answer the query as stated? Or was something wrong with the potential explanations I offered? You repeated every single one of them, albeit with more details.

I applaud your willingness to hold this user's hand until they solve the problem. I was trying to be helpful as well, but not everyone can match the level of enthusiasm with which you have recently addressed the problems on Biostars.


I understand - apologies. We all approach questions and helping others in different ways.

