Platform Unit For Sam Read Group
11.7 years ago
Johan ▴ 890

I'm wondering exactly what the "Platform unit" means in the read group header in the SAM format. This is what I found in the specification:

Platform unit (e.g. flowcell-barcode.lane for Illumina or slide for SOLiD). Unique identifier.

Since I have Illumina data, does this mean that I should use something like "FC706VJ.1", assuming that the flowcell barcode is "FC706VJ" and the lane is 1? (The example names are from the FASTQ entry on Wikipedia.) If this is the case, does anyone know whether it is possible to extract the flowcell name from the report.xml generated by Casava? I have a hunch that it is the same as the RunFolder attribute, but I might be wrong.
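A minimal Python sketch of the naming scheme proposed above, assuming the FLOWCELL.LANE form and the example barcode FC706VJ. The helper names, the read group ID "rg1" and the sample name "sample1" are hypothetical placeholders, not taken from any tool; only the @RG tag names (ID, SM, PL, PU) come from the SAM specification.

```python
def platform_unit(flowcell: str, lane: int) -> str:
    """Join an Illumina flowcell barcode and a lane number as FLOWCELL.LANE."""
    if not flowcell:
        raise ValueError("flowcell barcode must be non-empty")
    if lane < 1:
        raise ValueError("lane numbers start at 1")
    return f"{flowcell}.{lane}"


def read_group_line(rg_id: str, sample: str, flowcell: str, lane: int) -> str:
    """Build a tab-separated @RG header line carrying ID, SM, PL and PU fields."""
    fields = [
        "@RG",
        f"ID:{rg_id}",
        f"SM:{sample}",
        "PL:ILLUMINA",
        f"PU:{platform_unit(flowcell, lane)}",
    ]
    return "\t".join(fields)


if __name__ == "__main__":
    # Hypothetical ID and sample name; barcode and lane are the example values above.
    print(read_group_line("rg1", "sample1", "FC706VJ", 1))
```

Keeping the PU builder separate makes it easy to emit one read group per lane when a sample was sequenced on several lanes.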

11.7 years ago
Johan ▴ 890

Ok. I will answer my own question with what I have figured out so far. The flowcell barcode is a unique identifier of the flowcell (as described in the specification quoted above). In the report.xml file it should appear as an attribute called "FlowCellId" on the MetaData tag; however, it does not always seem to be included. The flowcell barcode is then combined with the lane(s) the sample was run on to form the platform unit.
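A hedged sketch of pulling that attribute out of report.xml with Python's standard xml.etree.ElementTree, assuming only what is described above: a MetaData element carrying a FlowCellId attribute that may be missing. The rest of the file's layout is not known here, so the code searches the whole tree; the function names and the small XML string in the test are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional


def flowcell_id(report_xml: str) -> Optional[str]:
    """Return the first non-empty FlowCellId attribute on any MetaData element, or None."""
    root = ET.fromstring(report_xml)
    # iter() walks the whole tree (including the root), so the exact nesting does not matter.
    for meta in root.iter("MetaData"):
        value = meta.get("FlowCellId")
        if value:
            return value
    return None


def platform_units(report_xml: str, lanes: List[int]) -> List[str]:
    """Combine the flowcell ID with each lane the sample ran on (FLOWCELL.LANE)."""
    fc = flowcell_id(report_xml)
    if fc is None:
        # The attribute is not always present, so fail loudly instead of guessing.
        raise ValueError("no FlowCellId found; supply the flowcell barcode another way")
    return [f"{fc}.{lane}" for lane in lanes]
```

To use it on a real file, read report.xml into a string first (for example with `open(path).read()`) and pass the lanes the sample was run on.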
