Question: Nucleotide Match With Unix Tools
9.0 years ago by Michael Schubert (6.9k), Cambridge, UK, wrote:

Today, one assignment in a course I'm taking was to find all (CCG)*4 repeats in the q arm of human chromosome 11. Since I had a little too much time on my hands after doing it with EMBOSS's fuzznuc, I wanted to try a bash-only version and came up with:

cat 11q.fa | sed '1d' | tr -d '\n' | tr -d '\r' | egrep -io '(CCG){4}' | wc -l

fuzznuc found 18 matches, the bash version only 11. Neither takes reverse-complement matches into account by default.

I suppose fuzznuc is correct, but can anyone spot an error in the bash version?

(edit: it's called fuzznuc, not fuzzynuc)
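
On the reverse-complement point: the reverse complement of (CCG){4} is (CGG){4}, so the other strand can be checked with the same kind of pipeline. This is only a rough sketch; the function names `revcomp` and `count_reverse_strand` are made up for illustration, and it assumes the sequence has already been joined into a single line (as the `tr -d` steps in the pipeline do) and contains only A/C/G/T/N.

```shell
#!/usr/bin/env bash
# Reverse-complement a single-line DNA string read from stdin.
# Only A/C/G/T (either case) are swapped; N and other symbols pass through.
revcomp() {
    rev | tr 'ACGTacgt' 'TGCAtgca'
}

# Count non-overlapping (CCG){4} matches on the reverse strand of stdin.
count_reverse_strand() {
    revcomp | grep -oiE '(CCG){4}' | wc -l | tr -d ' '
}

# Toy inputs (not real chromosome data).
printf 'CCGCCGCCGCCG\n' | revcomp
printf 'AACGGCGGCGGCGGTT\n' | count_reverse_strand
```

On the real file this would presumably be used as `sed '1d' 11q.fa | tr -d '\r\n' | count_reverse_strand`.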

fasta • 1.4k views
modified 9.0 years ago by brentp (23k) • written 9.0 years ago by Michael Schubert (6.9k)

Give us the data, the parameters you passed to fuzznuc, and its output; then maybe, if somebody has way too much time, we'll figure something out ;)

written 9.0 years ago by Michael Dondrup (46k)
9.0 years ago by brentp (23k), Salt Lake City, UT, wrote:

I'm not familiar with fuzznuc, but it probably reports overlapping matches, whereas most Linux tools do not. For example, CCGCCGCCGCCGCCG is 5 triplets and contains 2 distinct (CCG){4} matches (one starting at the first triplet, one at the second), but the regex engine will only find the first.

You could sort of check this by counting the matches for '(CCG){5,}' as the regular expression, though that will similarly underestimate if there are 6+ CCG triplets together.
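
To count overlapping matches directly, one option is to extract each maximal run of CCG triplets and note that a run of k triplets (k >= 4) contains k - 3 overlapping (CCG){4} windows. The following is a minimal sketch of that idea, not what fuzznuc itself does; the function names are hypothetical, and it assumes a grep that supports `-o` (GNU and BSD grep both do).

```shell
#!/usr/bin/env bash
# Count (CCG){4} occurrences on stdin, including overlapping ones.
# grep -o prints each maximal run of >= 4 CCG triplets on its own line;
# a run of k triplets contributes k - 3 overlapping 4-triplet windows.
count_overlapping() {
    grep -oiE '(CCG){4,}' | awk '{ n += length($0) / 3 - 3 } END { print n + 0 }'
}

# Count non-overlapping matches, as the pipeline in the question does.
count_nonoverlapping() {
    grep -oiE '(CCG){4}' | wc -l | tr -d ' '
}

# Toy input: the 5-triplet example from above.
printf 'CCGCCGCCGCCGCCG\n' | count_nonoverlapping   # prints 1
printf 'CCGCCGCCGCCGCCG\n' | count_overlapping      # prints 2
```

Replacing the `egrep ... | wc -l` stage of the original pipeline with `count_overlapping` should give a number that is directly comparable with a tool that reports overlapping hits.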

written 9.0 years ago by brentp (23k)

I just checked, and that was indeed the case. Thanks!

written 9.0 years ago by Michael Schubert (6.9k)

Sounds good! But then it's kind of fuzzy which result is correct.

written 9.0 years ago by Michael Dondrup (46k)

Yes it is. But it is good to know how the program behaves in case I need it again :-)

written 9.0 years ago by Michael Schubert (6.9k)
9.0 years ago by Michael Dondrup (46k), Bergen, Norway, wrote:

Well, I don't know fuzznuc, so I don't have the slightest idea, though I'll dare to make a guess anyway. fuzznuc, that rings a bell. Could it be that it does fuzzy matching, while grep does exact matching? I'm too lazy to figure it out myself, though, because I don't have your data.

written 9.0 years ago by Michael Dondrup (46k)

fuzznuc also prints a match table, which shows only exact matches, not approximate ones.

written 9.0 years ago by Michael Schubert (6.9k)
Powered by Biostar version 2.3.0