Question: Is there an authoritative source for optional BAM tags?
2.6 years ago, John wrote:

As the SAM/BAM spec says:

Note that tags starting with 'X', 'Y' and 'Z' or tags containing lowercase letters in either position are reserved for local use and will not be formally defined in any future version of this specification.
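The rule quoted above can be written down as a small check. This is only a sketch: the function name `is_local_use_tag` is made up for illustration, and the pattern assumes the spec's two-character tag form (a letter followed by a letter or digit).

```python
import re

# Assumed tag shape: one letter, then one letter or digit.
TAG_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9]")


def is_local_use_tag(tag: str) -> bool:
    """Return True if `tag` is in the range the spec reserves for local use.

    Hypothetical helper: it encodes only the quoted rule (starts with
    X, Y or Z, or contains a lowercase letter in either position).
    """
    if not TAG_PATTERN.fullmatch(tag):
        raise ValueError(f"not a two-character SAM tag: {tag!r}")
    return tag[0] in "XYZ" or any(ch.islower() for ch in tag)


for candidate in ("XO", "XP", "XB", "NM", "xo"):
    print(candidate, is_local_use_tag(candidate))
```

Passing this check only means a tag will never be formally defined by the spec; it says nothing about whether some other tool already writes it.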

These optional tags are used by all sorts of aligners and downstream programs. Some of them are so prevalent (like XM) that they are just as well known as the official tags.

After some interesting discussion here, I am thinking it would be pretty neat to have an "optical duplicate" tag, and/or a "PCR duplicate" tag and a "biological duplicate" tag, to differentiate between the three. Currently the FLAG bit 0x400 (decimal 1024, binary 010000000000) is used to mark duplicates, but it doesn't distinguish between the three kinds.
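For reference, testing that bit is a single mask operation. A minimal sketch, with `is_duplicate` as an illustrative name and the FLAG values chosen only as examples:

```python
# 0x400 == 1024 == 0b010000000000: the "PCR or optical duplicate" FLAG bit.
DUPLICATE_FLAG = 0x400


def is_duplicate(flag: int) -> bool:
    """Return True if the duplicate bit is set in a SAM FLAG value."""
    return bool(flag & DUPLICATE_FLAG)


# 163 is a typical paired-end FLAG; 1187 is the same FLAG with 0x400 added.
for flag in (163, 1187):
    print(flag, is_duplicate(flag))
```

The single bit is all the FLAG field offers here, which is why separating optical, PCR and biological duplicates would need something extra such as a tag.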

So before I modify Anna's script (from the above thread) to tag reads rather than delete them, I'm wondering if there is a list of known or common user tags out there that I can check against, so that I choose a new one rather than an existing one. Probably I would choose XO (optical), XP (PCR), XB (Biological) -- but one or all might already be taken! :)
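In the absence of such a list, one partial check is to count the tags that actually occur in the files you work with. The sketch below assumes plain-text SAM lines (for example the output of `samtools view -h`); `count_tags` and the example records are invented for illustration.

```python
from collections import Counter
from typing import Iterable


def count_tags(sam_lines: Iterable[str]) -> Counter:
    """Count optional-field tags (column 12 onwards) in SAM text lines."""
    counts: Counter = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # header line, carries no alignment tags
            continue
        fields = line.rstrip("\n").split("\t")
        for field in fields[11:]:  # optional fields look like TAG:TYPE:VALUE
            counts[field[:2]] += 1
    return counts


# Made-up records, not taken from any real aligner output.
example = [
    "@HD\tVN:1.6\tSO:coordinate",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0\tXO:i:0",
    "r2\t1024\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0",
]
used = count_tags(example)
clashes = [tag for tag in ("XO", "XP", "XB") if tag in used]
print(clashes)
```

This only tells you about tags present in the files scanned; it cannot show that a tag is unused by tools you have not run.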


There's definitely no authoritative source for the custom tags. If you really want to make sure you're not using a tag anyone else is then you should be pretty safe with lower case tags. I almost never see those.

BTW, I think bwa uses XO for something (no clue if it's bwa mem or bwa aln).

Reply written 2.6 years ago by Devon Ryan

Tags starting with X, Y, and Z are fair game. If you want to write software that is stable, robust, compatible, and future-proof... do not use those tags. Do not generate or parse them (by default). If you do, you will end up with brittle software that is version-specific and cannot be switched to an alternative program.

Internally, feel free to use any XYZ tag for anything you want. That's the whole point - to allow internal custom use without changing the API. Anyone who requires a custom tag on a standard format, for externally-accessible software... is doing it wrong. If it's really that crucial, they need to talk to the standards committee and make it a standard tag.

Making observations into de-facto-official standards destroys standards.

Reply written 2.6 years ago by Brian Bushnell

That makes a lot of sense - particularly, as you say, I can't control who else wants to use the same tags I use. A new mapper might come out that uses all the tags I use, and now we're incompatible. I suppose, being the author of BBMap, you know more about these issues than anyone.

Having said that, I always saw the tagging system as a way to improve upon the standard, rather than to only be used internally. I guess it all comes down to the fact that there is no authoritative source for tags, or description of what they are and what they should be used for. Perhaps if there was, the standard could be extended reliably.

Personally, I really wish there was an "explain sam flags" for tags, even if it wasn't authoritative.
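A toy version of that idea might look like the following. Everything here is an assumption made for illustration: the name `explain_tag`, the tiny table, and the wording of the descriptions, which are paraphrases of a few predefined tags rather than text from the spec.

```python
# Deliberately tiny, non-authoritative table of predefined tags
# (descriptions paraphrased for this sketch).
STANDARD_TAGS = {
    "NM": "edit distance to the reference",
    "MD": "string describing mismatching positions",
    "AS": "alignment score generated by the aligner",
    "RG": "read group",
}


def explain_tag(tag: str) -> str:
    """Describe a tag, falling back to the local-use rule for X/Y/Z/lowercase."""
    if tag in STANDARD_TAGS:
        return f"{tag}: {STANDARD_TAGS[tag]}"
    if len(tag) == 2 and (tag[0] in "XYZ" or any(c.islower() for c in tag)):
        return f"{tag}: local use; meaning depends on the tool that wrote it"
    return f"{tag}: not in this sketch's table"


for tag in ("NM", "XO", "NH"):
    print(explain_tag(tag))
```

For local-use tags the honest answer is exactly the problem discussed in this thread: the meaning depends on whichever program wrote the file.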

Reply written 2.6 years ago by John

BBMap has various custom tags, but I don't use them as interfaces. They display internal state, rather than sending information to the next process in the pipeline. It takes a huge amount of effort to ensure your software is compliant with "popular" tags (and the general case is impossible, since they can conflict or be insufficiently specified); ensuring compliance with official tags is already difficult enough!

It's a valid use to develop internal pipelines that use "sam" files which require specific unofficial fields that are created by your internal software. But it is bad practice to publish and promote such things externally, as it fragments the standard.

Reply written 2.6 years ago by Brian Bushnell

2.6 years ago, John wrote:

I just read this, and I don't quite know how I missed it so many times before, but:

You can freely add new tags, and if a new tag may be of general interest, you can email samtools-devel@lists.sourceforge.net to add the new tag to the specification. Note that tags starting with ‘X’, ‘Y’ and ‘Z’ or tags containing lowercase letters in either position are reserved for local use and will not be formally defined in any future version of this specification.

So maybe my question, as Brian points out, is wrong in principle. Maybe the question should be "why aren't popular mapping tools ensuring their X/Y/Z tags are put into the SAM spec?" :)

I suppose because it has to be reserved retro-actively...?


You can also just make a PR on the hts-specs repo on GitHub. It has largely the same effect, and I get the feeling that most of the people on the samtools-devel list follow the repo.

I've seen a few requests to get tags added over the last year or two, but for the most part tags end up being really particular to a specific tool or workflow so it's hard to argue that they're general enough to get added to the spec.

Reply written 2.6 years ago by Devon Ryan