Trying to extract per-index statistics from MiSeq's GenerateFASTQRunStatistics.xml file with XPath.
9.2 years ago
Charles Plessy ★ 2.9k

MiSeq runs contain some report files in XML format. In particular, Data/Intensities/BaseCalls/Alignment/GenerateFASTQRunStatistics.xml reports various numbers such as the count of reads passing filter, for the whole run and for each index separately. Here is an oversimplified example.

<StatisticsGenerateFASTQ>
  <RunStats>
    <NumberOfClustersPF>11433659</NumberOfClustersPF>
    <NumberOfClustersRaw>11969395</NumberOfClustersRaw>
  </RunStats>
  <OverallSamples>
    <SummarizedSampleStatistics>
      <NumberOfClustersPF>43181</NumberOfClustersPF>
      <NumberOfClustersRaw>49080</NumberOfClustersRaw>
      <SampleNumber>1</SampleNumber>
    </SummarizedSampleStatistics>
    <SummarizedSampleStatistics>
      <NumberOfClustersPF>79129</NumberOfClustersPF>
      <NumberOfClustersRaw>85016</NumberOfClustersRaw>
      <SampleNumber>2</SampleNumber>
    </SummarizedSampleStatistics>
  </OverallSamples>
  <PairedEndByGenome />
  <Samples />
</StatisticsGenerateFASTQ>

Using the XmlStarlet command-line tool, I could extract the total number of clusters passing filter with the command xmlstarlet sel -t -v //RunStats/NumberOfClustersPF GenerateFASTQRunStatistics.xml. However, to do so for a given index, one first needs to identify which SummarizedSampleStatistics node has a SampleNumber child containing the given index number, and then extract the value from the NumberOfClustersPF sibling node.

Does anybody know an XPath expression that does this, so that it could become a one-liner with XmlStarlet? I tried things like //ancestor::SampleNumber[.="2"]/NumberOfClustersPF without success...
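
To make the lookup described above concrete, here is a minimal sketch in Python using the standard-library xml.etree.ElementTree module instead of XmlStarlet (a substitution made only so that the example is self-contained). It uses a predicate on the parent node, [SampleNumber='2'], to select the SummarizedSampleStatistics element with the wanted index and then steps down to its NumberOfClustersPF child. The helper name clusters_pf_for_sample is hypothetical, and the XML is a trimmed copy of the example above. The same predicate is standard XPath 1.0, so an expression like //SummarizedSampleStatistics[SampleNumber=2]/NumberOfClustersPF would be expected to work with xmlstarlet sel as well, though that has not been checked here.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example document from the post.
XML = """<StatisticsGenerateFASTQ>
  <RunStats>
    <NumberOfClustersPF>11433659</NumberOfClustersPF>
    <NumberOfClustersRaw>11969395</NumberOfClustersRaw>
  </RunStats>
  <OverallSamples>
    <SummarizedSampleStatistics>
      <NumberOfClustersPF>43181</NumberOfClustersPF>
      <NumberOfClustersRaw>49080</NumberOfClustersRaw>
      <SampleNumber>1</SampleNumber>
    </SummarizedSampleStatistics>
    <SummarizedSampleStatistics>
      <NumberOfClustersPF>79129</NumberOfClustersPF>
      <NumberOfClustersRaw>85016</NumberOfClustersRaw>
      <SampleNumber>2</SampleNumber>
    </SummarizedSampleStatistics>
  </OverallSamples>
</StatisticsGenerateFASTQ>"""


def clusters_pf_for_sample(root, sample_number):
    """Return NumberOfClustersPF for the given SampleNumber, or None if absent.

    Hypothetical helper, named for illustration only.
    """
    # The predicate keeps only the SummarizedSampleStatistics element that has
    # a SampleNumber child with the requested text; the last step selects the
    # NumberOfClustersPF child of that same element.
    path = (
        f".//SummarizedSampleStatistics[SampleNumber='{sample_number}']"
        "/NumberOfClustersPF"
    )
    text = root.findtext(path)
    return int(text) if text is not None else None


root = ET.fromstring(XML)
print(clusters_pf_for_sample(root, 2))  # prints 79129
```

The attempt with ancestor:: likely fails because it starts from SampleNumber and then looks for a NumberOfClustersPF child of SampleNumber, which does not exist; filtering the parent with a predicate avoids having to navigate back up the tree.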

XML MiSeq • 1.9k views