Question: Flawed SAM Header Regex?
Martin A Hansen (Denmark) wrote 7.7 years ago:

According to the SAM format white paper (http://samtools.sourceforge.net/SAM1.pdf), header lines should match /^@[A-Za-z][A-Za-z](\t[A-Za-z][A-Za-z0-9]:[ -~])+$/ or /^@CO\t.*/. However, neither the example in the white paper itself nor real-world data matches the first pattern:

ruby -e 'puts "@HD\tVN:1.3\tSO:coordinate" =~ /^@[A-Za-z][A-Za-z](\t[A-Za-z][A-Za-z0-9]:[ -~])+$/'

So is the SAM header or the regex flawed?

Cheers

Martin

Michael Barton (Akron, Ohio, United States) wrote 7.7 years ago:

I use rubular for testing regexes. You could add your test SAM string and then play with the regex to get the correct match?


Hey, that is pretty cool!

Reply by Martin A Hansen, 7.7 years ago
brentp (Salt Lake City, UT) wrote 7.7 years ago:

Hm, it looks like the regex is flawed. It is currently:

/^@[A-Za-z][A-Za-z](\t[A-Za-z][A-Za-z0-9]:[ -~])+$/

It seems it should be:

/^@[A-Za-z][A-Za-z](\t[A-Za-z][A-Za-z0-9]:[ -~]+)+$/

The extra '+' allows more than one character to follow the ':'.
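
The difference can be checked against the header line from the question. This is only a minimal sketch: the variable names are chosen for illustration, and `match?` assumes Ruby 2.4 or later.

```ruby
# Header line from the question; "\t" in a double-quoted Ruby string is a tab.
header = "@HD\tVN:1.3\tSO:coordinate"

# Pattern as quoted in the question: exactly one character after each "XX:".
original_re = /^@[A-Za-z][A-Za-z](\t[A-Za-z][A-Za-z0-9]:[ -~])+$/

# Proposed fix: one or more characters after each "XX:".
corrected_re = /^@[A-Za-z][A-Za-z](\t[A-Za-z][A-Za-z0-9]:[ -~]+)+$/

original_matches  = header.match?(original_re)
corrected_matches = header.match?(corrected_re)

puts "original:  #{original_matches}"
puts "corrected: #{corrected_matches}"
```

The original pattern fails because "VN:1" must be followed directly by a tab or the end of the line, but the next character is ".". With the extra '+', the value "1.3" is consumed up to the next tab.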


Funny, to my understanding [ -~]+ allows one or more characters from the group space, dash and tilde. How it matches '1.3' and 'coordinate' baffles me.

Reply by Martin A Hansen, 7.7 years ago

@masha it does look odd, but it's the same format as [A-Z]: it matches any character between " " and "~", endpoints included. ord(" ") == 32 and ord("~") == 126, so the class covers all printable ASCII characters.
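
The range reading can be verified directly in Ruby. The snippet below is a small sketch with illustrative variable names, not anything taken from the SAM tools themselves.

```ruby
# Endpoints of the character class [ -~] as code points.
low  = " ".ord
high = "~".ord

# Every character in that code point range should match the class,
# just as [A-Z] matches everything from "A" to "Z".
range_chars = (low..high).map(&:chr)
all_match = range_chars.all? { |c| c.match?(/\A[ -~]\z/) }

# A tab (code point 9) lies outside the range, so it cannot be part of a value.
tab_matches = "\t".match?(/\A[ -~]\z/)

puts "range: #{low}..#{high}, all match: #{all_match}, tab matches: #{tab_matches}"
```

Because "." and all letters and digits fall between code points 32 and 126, both '1.3' and 'coordinate' are matched by [ -~]+.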

Reply by brentp, 7.7 years ago

Of course. I have submitted the question to the samtools mailing list and am waiting for an answer.

Reply by Martin A Hansen, 7.7 years ago