How Can I Stop Jackhmmer Aligning The Same Target Sequence Multiple Times?
10.8 years ago
James Ashmore ▴ 100

Hello everyone,

I am using jackhmmer to cluster a set of protein sequences into groups. I take my 273 sequences, remove one, and use it as the query against the remaining 272. Any hits found by one query are removed from the pool before subsequent queries. However, the alignments jackhmmer produces are multi-hit: more than one local alignment can be found in a single target sequence, and all of them are included in the multiple alignment. Ideally, I would like each target sequence to be aligned only once, since at the moment the same sequence appears in the alignment several times. I've included an example alignment to help explain. Many of the protein sequences are only remotely related, so I thought jackhmmer would be more sensitive than a simple blastp search followed by a MUSCLE alignment.

Example jackhmmer alignment

Thanks, James
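
The pooling procedure described in the question can be sketched roughly as below. This is only a minimal illustration, not the poster's actual script: `greedy_cluster` and the `search` callback are hypothetical names, and `search` stands in for whatever runs jackhmmer and returns the IDs of the target sequences it hit.

```python
from typing import Callable, Dict, List, Set


def greedy_cluster(
    seq_ids: List[str],
    search: Callable[[str, List[str]], Set[str]],
) -> Dict[str, List[str]]:
    """Greedy clustering: each query claims its hits, which then leave the pool.

    `search(query, targets)` is a hypothetical callback that returns the IDs
    of the targets hit by `query` (for example, by running jackhmmer).
    """
    pool = list(seq_ids)
    clusters: Dict[str, List[str]] = {}
    while pool:
        query = pool.pop(0)                      # take the next unassigned sequence
        hits = search(query, pool) & set(pool)   # ignore IDs that are not in the pool
        # A target joins the cluster once, however many local hits it produced.
        clusters[query] = [query] + [s for s in pool if s in hits]
        pool = [s for s in pool if s not in hits]
    return clusters
```

Because cluster membership is tracked per sequence ID, multiple local hits in one target do not affect the grouping itself; they only matter for the alignment built from the hits.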

alignment • 3.2k views

I'm no expert on jackhmmer per se, but in other HMMER programs the way to get around the problem of matching repetitive sequences is to tweak the sequence-level and hit-level cut-off thresholds (see http://hmmer.janelia.org/help/search#thresh). Are you using the default cut-offs? Are you using bit score or E-value cut-offs?

I'm interested, however, in why you want to suppress this behaviour if all you are looking to do is cluster the proteins. Doesn't the fact that the sequence hits (at least) once give you enough reason to add it to the cluster?
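
To make the threshold suggestion concrete, here is a small sketch of a jackhmmer command line with explicit inclusion thresholds. The helper name `jackhmmer_command`, the file names and the threshold values are placeholders for illustration only, and the flags are worth checking against `jackhmmer -h` for your HMMER version. A stricter per-domain inclusion threshold may drop weak secondary hits, but it is not guaranteed to leave exactly one hit per sequence.

```python
from typing import List


def jackhmmer_command(
    query_fasta: str,
    target_fasta: str,
    out_prefix: str,
    inc_evalue: float = 1e-3,      # per-sequence inclusion E-value (illustrative)
    inc_dom_evalue: float = 1e-3,  # per-domain inclusion E-value (illustrative)
    iterations: int = 5,
) -> List[str]:
    """Build an argument list for jackhmmer with explicit inclusion thresholds."""
    return [
        "jackhmmer",
        "-N", str(iterations),                   # maximum number of iterations
        "--incE", str(inc_evalue),               # sequence-level inclusion threshold
        "--incdomE", str(inc_dom_evalue),        # domain-level inclusion threshold
        "--tblout", f"{out_prefix}.tbl",         # per-sequence hit table
        "--domtblout", f"{out_prefix}.domtbl",   # per-domain hit table
        "-A", f"{out_prefix}.sto",               # alignment of included hits
        "-o", f"{out_prefix}.out",               # main human-readable output
        query_fasta,
        target_fasta,
    ]


# Example (needs HMMER installed):
#   import subprocess
#   subprocess.run(jackhmmer_command("query.fa", "pool.fa", "run1"), check=True)
```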


Hi Sarah,

Thank you for the suggestions!

I'm using the default settings that HMMER has configured for jackhmmer. The reason I would like each sequence to hit only once is that I aim to identify each cluster as a protein family. Having one sequence hit multiple times messes up the alignment of the family 'members', so I can't do any motif or globular domain analysis: the information content of the alignment is skewed by the same sequence appearing multiple times, and those appearances also overlap.

Thanks
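
One possible workaround, sketched here as an assumption rather than a built-in jackhmmer feature, is to post-process the per-domain table written by `--domtblout` and keep only the highest-scoring domain for each target sequence. The function name `best_domain_per_target` is made up for this example, and the column positions follow the whitespace-separated domtblout layout described in the HMMER User's Guide (target name first, per-domain bit score in the 14th column, alignment start and end in the 18th and 19th).

```python
from typing import Dict, NamedTuple


class Domain(NamedTuple):
    target: str    # target sequence name
    score: float   # per-domain bit score
    ali_from: int  # alignment start on the target
    ali_to: int    # alignment end on the target


def best_domain_per_target(domtbl_text: str) -> Dict[str, Domain]:
    """Keep only the highest-scoring domain hit for each target sequence."""
    best: Dict[str, Domain] = {}
    for line in domtbl_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        dom = Domain(fields[0], float(fields[13]), int(fields[17]), int(fields[18]))
        if dom.target not in best or dom.score > best[dom.target].score:
            best[dom.target] = dom
    return best
```

The surviving coordinates could then be used to select a single region per sequence before building the family alignment. Whether the best-scoring region is the right one to keep depends on the family, so treat this as a starting point.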
