Question: How Can I Stop Jackhmmer Aligning The Same Target Sequence Multiple Times?
James Ashmore (London) wrote, 6.2 years ago:

Hello everyone,

I am using jackhmmer to cluster a set of protein sequences into groups. I take one of my 273 sequences as the query and search it against the remaining 272. Any hits from that query are removed from the pool before the next query is run. However, jackhmmer reports multiple hits per target: more than one local alignment can be found in the same target sequence, and each one is included in the multiple alignment. Ideally, each target sequence would be aligned only once, rather than the same sequence appearing in the alignment several times. I've included an example alignment to help explain. Many of the protein sequences are only remotely related, so I thought jackhmmer would be more sensitive than a simple blastp search followed by a muscle alignment.

Example jackhmmer alignment

Thanks, James
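
The query-and-remove procedure described in the question can be sketched as a greedy loop. This is only an illustration of the bookkeeping, not the poster's actual script: `greedy_cluster` and the `search` callable are hypothetical names, and `search` stands in for whatever runs jackhmmer and returns the IDs of the sequences that were hit.

```python
from typing import Callable, Dict, List, Set


def greedy_cluster(
    ids: List[str],
    search: Callable[[str, Set[str]], Set[str]],
) -> Dict[str, Set[str]]:
    """Greedy clustering: take a query, search the remaining pool,
    remove its hits from the pool, and repeat until the pool is empty.

    `search(query, pool)` is a hypothetical stand-in for a jackhmmer run;
    it should return the IDs in `pool` that the query hit.
    """
    pool = list(ids)  # keep input order so runs are reproducible
    clusters: Dict[str, Set[str]] = {}
    while pool:
        query = pool.pop(0)
        remaining = set(pool)
        # Ignore any ID the search reports that is not in the current pool.
        hits = search(query, remaining) & remaining
        clusters[query] = {query} | hits
        # Sequences that were hit are not used as queries or targets again.
        pool = [s for s in pool if s not in hits]
    return clusters
```

Because each target ID is recorded in a set, a sequence that is hit several times by the same query still joins the cluster only once; the repeated rows in the alignment are a separate issue.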


I'm no expert on jackhmmer per se, but in other HMMER programs the way to get around the problem of matching repetitive sequences is to tweak the sequence-level and hit-level cut-off thresholds (see http://hmmer.janelia.org/help/search#thresh). Are you using the default cut-offs? Are you using bit score or E-value cut-offs?

I'm interested, however, in why you want to suppress this behaviour if all you are looking to do is cluster the proteins. Doesn't the fact that the sequence hits (at least) once give you enough reason to add it to the cluster?

Reply written 6.2 years ago by sarahhunter
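
As a rough sketch of the threshold suggestion above, the per-domain cut-offs can be set when the command line is assembled. The option names used here (`-N`, `--domE`, `--incdomE`, `--domtblout`, `-A`) are taken from the HMMER3 documentation as best I recall and should be checked against `jackhmmer -h` for the installed version. The function name, file names, and the 1e-5 values are illustrative assumptions, not recommended settings. Stricter per-domain thresholds may drop weak secondary hits, but they do not guarantee a single hit per target.

```python
from typing import List


def jackhmmer_command(
    query_fasta: str,
    target_fasta: str,
    dom_evalue: float = 1e-5,       # assumed value, for illustration only
    inc_dom_evalue: float = 1e-5,   # assumed value, for illustration only
    iterations: int = 5,
    domtblout: str = "hits.domtblout",
    alignment: str = "hits.sto",
) -> List[str]:
    """Build a jackhmmer argument list with explicit per-domain E-value cut-offs."""
    return [
        "jackhmmer",
        "-N", str(iterations),              # maximum number of iterations
        "--domE", str(dom_evalue),          # per-domain reporting threshold
        "--incdomE", str(inc_dom_evalue),   # per-domain inclusion threshold
        "--domtblout", domtblout,           # per-domain hit table
        "-A", alignment,                    # alignment of included hits
        query_fasta,
        target_fasta,
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where jackhmmer is installed.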

Hi Sarah,

Thank you for the suggestions!

I'm using the default settings that HMMER configures for jackhmmer. The reason I would like each sequence to hit only once is that I aim to identify each cluster as a protein family. Having one sequence hit multiple times messes up the alignment of the family 'members', so I can't do any motif or globular domain analysis: the information content of the alignment is skewed by the multiple, overlapping appearances of the same sequence.

Thanks

Reply written 6.2 years ago by James Ashmore
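
One possible workaround, offered as a sketch and not as something proposed in the thread, is to post-process the per-domain table and keep only the highest-scoring domain for each target. The code assumes the whitespace-delimited layout of HMMER3's `--domtblout` file, in which (counting from zero) column 0 is the target name, column 13 is the per-domain bit score, and columns 17 and 18 are the alignment start and end on the target. The function name is hypothetical.

```python
from typing import Dict, Iterable, Tuple


def best_domain_per_target(lines: Iterable[str]) -> Dict[str, Tuple[float, int, int]]:
    """Return {target: (domain_score, ali_from, ali_to)} for the
    highest-scoring domain of each target in a domtblout-style table."""
    best: Dict[str, Tuple[float, int, int]] = {}
    for line in lines:
        # Skip blank lines and the '#' comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        target = fields[0]
        score = float(fields[13])                        # per-domain bit score
        ali_from, ali_to = int(fields[17]), int(fields[18])
        if target not in best or score > best[target][0]:
            best[target] = (score, ali_from, ali_to)
    return best
```

The selected coordinates could then be used to extract one region per sequence and realign those regions, so that each family member appears once in the alignment used for motif or domain analysis. Note that for a multi-iteration run the table contains rows from the final search, and that keeping one domain per target discards any genuine repeats.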