454NewblerMetrics.txt Format
13.5 years ago
Yannick Wurm

Assembly with Newbler produces a summary file, 454NewblerMetrics.txt, which according to the documentation is in "454 parser file" format. It looks like a simple nested hash structure. If I want to write a parser for this, do I need to do it from scratch, or does this format already have a real name?

/***************************************************************************
**
**      454 Life Sciences Corporation
**         Newbler Metrics Results
**
**      Date of Assembly: 2010/10/20 14:07:53
**      Project Directory: /home/dee/keller/UHTS/ywurm/2010-09-25-littleB/results/2010-10-12-newblerAssemblies/withoutIllumina/P_2010_10_14_09_12_45_runAssembly
**      Software Release: 2.3  (091027_1459)
**
***************************************************************************/

/*
**  Input information.
*/

runData
{
    file
    {
        path = "/home/dee/keller/UHTS/454/littleB/shotgun/roche/C1172SID11098_Z8_reg2_sff/FW1LDDZ02.sff";

        numberOfReads = 537847, 537843;
        numberOfBases = 173640497, 172588857;
    }
[…]
}

pairedReadData
{
    file
    {
        path = "/home/dee/keller/UHTS/454/littleB/paired/20kb/roche/PairedEnd_RunID26803066_sff_halfplate/FX0RNLM01.sff";

        numberOfReads = 602130, 878875;
        numberOfBases = 163374476, 142729366;
        numWithPairedRead = 286117;
    }
[…]
}

/*
**  Operation metrics.
*/

runMetrics
{
    totalNumberOfReads = 16521360; 
    totalNumberOfBases = 4540313420; 

    numberSearches   = 8409112;
    seedHitsFound    = 1847363485, 219.69;
    overlapsFound    = 1834648575, 218.17, 99.31%;
    overlapsReported = 841507634, 100.07, 45.87%;
    overlapsUsed     = 18834953, 2.24, 2.24%;
}

readAlignmentResults
{
    file
    {
        path = "/home/dee/keller/UHTS/454/littleB/shotgun/roche/C1172SID11098_Z8_reg2_sff/FW1LDDZ02.sff";

        numAlignedReads     = 409425, 76.12%;
        numAlignedBases     = 142627063, 82.64%;
        inferredReadError  = 1.20%, 1707897;
    }
[…]
}

pairedReadResults
{
    file
    {
        path = "/home/dee/keller/UHTS/454/littleB/paired/20kb/roche/PairedEnd_RunID26803066_sff_halfplate/FX0RNLM01.sff";

        numAlignedReads     = 327088, 37.22%;
        numAlignedBases     = 57601622, 40.36%;
        inferredReadError  = 1.65%, 947986;

        numberWithBothMapped  = 78632;
        numWithOneUnmapped    = 38015;
        numWithMultiplyMapped = 167737;
        numWithBothUnmapped   = 1733;
    }
[…]}

/*
** Consensus distribution information.
*/
consensusDistribution
{
    fullDistribution
    {
        signalBin =  0.0, 7517321;
[…]
}


/*
**  Alignment depths.
*/
alignmentDepths
{
          1 = 7175292;
[…]
    peakDepth           = 8.0;
    estimatedGenomeSize = "567.1 MB";
}

/*
**  Consensus results.
*/
consensusResults
{
    readStatus
    {
        numAlignedReads    = 11606683, 70.25%;
        numAlignedBases    = 3617704329, 79.68%;
        inferredReadError = 1.06%, 38389865;

        numberAssembled = 9954740;
        numberPartial   = 1651943;
        numberSingleton = 858542;
        numberRepeat    = 3751116;
        numberOutlier   = 305019;
        numberTooShort  = 0;
    }

    pairedReadStatus
    {
        numberWithBothMapped   = 1239514;
        numberWithOneUnmapped  = 324133;
        numberMultiplyMapped   = 855454;
        numberWithBothUnmapped = 14981;

        library
        {
            libraryName     = "FX0RNLM01.sff";
            pairDistanceAvg = 3078.3;
            pairDistanceDev = 769.6;
        }

[…]
    }

    scaffoldMetrics
    {
        numberOfScaffolds   = 14940;
        numberOfBases       = 344205862;

        avgScaffoldSize     = 23039;
        N50ScaffoldSize     = 241728;
        largestScaffoldSize = 2015989;
    }

    largeContigMetrics
    {
        numberOfContigs   = 108123;
        numberOfBases     = 336075598;

        avgContigSize     = 3108;
        N50ContigSize     = 5423;
        largestContigSize = 79674;

        Q40PlusBases      = 327642977, 97.49%;
        Q39MinusBases     = 8432621, 2.51%;
    }

    allContigMetrics
    {
        numberOfContigs = 145244;
        numberOfBases   = 346306838;
    }
}
13.5 years ago

I wrote a parser for these types of files about a year ago (I can share the code if you like). The nice thing is that all of the metrics files use the same format, so I only needed to write a single parser.

I wrote the parser as part of a 454 sample submission and tracking system. Our sysadmins, who now maintain that system, required that I build it with a PHP web framework called symfony (great framework, by the way), so the parser is also in PHP. If you're comfortable with PHP, I can send the code; if not, it shouldn't be too hard to recreate in Perl or another language.

Basically, I converted the whole file into an XML document, using a stack to keep track of the current element (opening a new element for each block and popping back to the parent whenever I hit a closing }). I then used XPath to query the document for the data I was interested in.
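The stack-plus-XPath idea translates fairly directly to other languages. Below is a minimal Python sketch using the standard library's xml.etree.ElementTree; it is not the PHP code from this answer. It assumes, as in the sample file, that each block name sits on its own line with { and } on separate lines, one key = values; assignment per line, and no /* */ sequences inside quoted strings. Keys such as 1 under alignmentDepths are not valid XML tag names, so everything is stored in generic block and value elements carrying a name attribute; those element names, the metrics root, and the function name newbler_to_xml are hypothetical choices.

```python
import re
import xml.etree.ElementTree as ET

# Sketch of the "stack + XML + XPath" approach. Assumed layout (from the sample):
# a block name on its own line, '{' and '}' on their own lines, and one
# 'key = v1, v2, ...;' assignment per line.
ASSIGN_RE = re.compile(r'^(\S+)\s*=\s*(.*?)\s*;$')


def newbler_to_xml(text: str) -> ET.Element:
    """Build an XML tree: <block name=...> for blocks, <value name=...> for keys."""
    text = re.sub(r'/\*.*?\*/', '', text, flags=re.S)  # drop /* ... */ comments
    root = ET.Element('metrics')
    stack = [root]        # stack[-1] is the element currently being filled
    pending = None        # block name seen on an earlier line, awaiting '{'
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == '{':
            # Open a child element for the pending block name and descend into it.
            stack.append(ET.SubElement(stack[-1], 'block', name=pending))
            pending = None
        elif line == '}':
            stack.pop()   # back to the enclosing block
        else:
            m = ASSIGN_RE.match(line)
            if m:
                value = ET.SubElement(stack[-1], 'value', name=m.group(1))
                value.text = m.group(2).strip('"')  # raw text, minus any quotes
            else:
                pending = line  # bare word: name of the block that follows
    return root
```

On the file in the question, root.find(".//block[@name='scaffoldMetrics']/value[@name='N50ScaffoldSize']").text should return the string "241728". ElementTree implements only a subset of XPath (paths, //, and simple attribute predicates like this one), which covers lookups of this kind; values stay as text, so numbers still need converting by the caller.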

Is there something better for this? I don't know. I did a little searching before I wrote this class but don't remember finding much. If you do find something, let us know.


Thanks Daniel, I agree that it should be relatively straightforward to write a parser from scratch. Thanks for the offer; my weapon of choice is Ruby, so I'll keep you posted :)
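As a from-scratch alternative to the XML route, the layout in the question's sample is regular enough for a small recursive-descent parser that builds nested dictionaries directly. This is a minimal Python sketch (the thread's languages were PHP and Ruby), assuming only what the sample shows: C-style /* ... */ comments, name { ... } blocks, key = v1, v2, ...; assignments, and double-quoted strings. It is not based on a Roche specification, and parse_newbler and its helpers are hypothetical names. Single values become numbers or strings, multi-value entries become tuples, percentages lose their % sign (99.31% becomes 99.31), and a key that repeats within one block, such as the several file blocks under runData, is collected into a list.

```python
import re
from typing import Any, Dict, List, Union

# Minimal sketch of a parser for the layout seen in 454NewblerMetrics.txt.
# Assumed grammar (inferred from the sample, not from Roche documentation):
#   block      := NAME '{' (block | assignment)* '}'
#   assignment := KEY '=' VALUE (',' VALUE)* ';'
# with C-style /* ... */ comments and double-quoted strings.

Value = Union[int, float, str]

# Comments are matched first so their contents never become ordinary tokens.
TOKEN_RE = re.compile(r'/\*.*?\*/|"[^"]*"|[{};=,]|[^\s{};=,"]+', re.S)


def _convert(token: str) -> Value:
    """Convert one raw token to int, float, or str."""
    if token.startswith('"'):
        return token[1:-1]                 # quoted string: drop the quotes
    if token.endswith('%'):
        return float(token[:-1])           # "99.31%" -> 99.31
    for cast in (int, float):
        try:
            return cast(token)
        except ValueError:
            pass
    return token                           # anything else stays a bare word


def _add(node: Dict[str, Any], key: str, value: Any) -> None:
    """Store a value; a key seen again in the same block becomes a list."""
    if key not in node:
        node[key] = value
    elif isinstance(node[key], list):
        node[key].append(value)
    else:
        node[key] = [node[key], value]


def parse_newbler(text: str) -> Dict[str, Any]:
    """Parse the whole file into nested dicts."""
    tokens = [t for t in TOKEN_RE.findall(text) if not t.startswith('/*')]
    pos = 0

    def parse_block() -> Dict[str, Any]:
        nonlocal pos
        node: Dict[str, Any] = {}
        while pos < len(tokens) and tokens[pos] != '}':
            key = tokens[pos]
            pos += 1
            if tokens[pos] == '{':         # nested block
                pos += 1
                child = parse_block()
                pos += 1                   # skip the matching '}'
                _add(node, key, child)
            elif tokens[pos] == '=':       # key = v1, v2, ...;
                pos += 1
                values: List[Value] = []
                while tokens[pos] != ';':
                    if tokens[pos] != ',':
                        values.append(_convert(tokens[pos]))
                    pos += 1
                pos += 1                   # skip ';'
                _add(node, key, values[0] if len(values) == 1 else tuple(values))
            else:
                raise ValueError(f'unexpected token {tokens[pos]!r} after {key!r}')
        return node

    return parse_block()
```

For the file in the question, parse_newbler(text)["consensusResults"]["scaffoldMetrics"]["N50ScaffoldSize"] should give 241728. Returning a list only for repeated keys keeps lookups short, at the cost of callers having to check whether an entry like runData["file"] is a single dict or a list when a project has only one input file.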
