Question: 454NewblerMetrics.txt format
Yannick Wurm (Queen Mary University London) wrote 8.8 years ago:

Assembling with Newbler produces a summary file, 454NewblerMetrics.txt, which according to the documentation is in "454 parser file" format. It looks like a simple nested hash structure. If I want to write a parser for it, do I need to start from scratch, or does this format already have a real name?

/***************************************************************************
**
**      454 Life Sciences Corporation
**         Newbler Metrics Results
**
**      Date of Assembly: 2010/10/20 14:07:53
**      Project Directory: /home/dee/keller/UHTS/ywurm/2010-09-25-littleB/results/2010-10-12-newblerAssemblies/withoutIllumina/P_2010_10_14_09_12_45_runAssembly
**      Software Release: 2.3  (091027_1459)
**
***************************************************************************/

/*
**  Input information.
*/

runData
{
    file
    {
        path = "/home/dee/keller/UHTS/454/littleB/shotgun/roche/C1172SID11098_Z8_reg2_sff/FW1LDDZ02.sff";

        numberOfReads = 537847, 537843;
        numberOfBases = 173640497, 172588857;
    }
[…]
}

pairedReadData
{
    file
    {
        path = "/home/dee/keller/UHTS/454/littleB/paired/20kb/roche/PairedEnd_RunID26803066_sff_halfplate/FX0RNLM01.sff";

        numberOfReads = 602130, 878875;
        numberOfBases = 163374476, 142729366;
        numWithPairedRead = 286117;
    }
[…]
}

/*
**  Operation metrics.
*/

runMetrics
{
    totalNumberOfReads = 16521360; 
    totalNumberOfBases = 4540313420; 

    numberSearches   = 8409112;
    seedHitsFound    = 1847363485, 219.69;
    overlapsFound    = 1834648575, 218.17, 99.31%;
    overlapsReported = 841507634, 100.07, 45.87%;
    overlapsUsed     = 18834953, 2.24, 2.24%;
}

readAlignmentResults
{
    file
    {
        path = "/home/dee/keller/UHTS/454/littleB/shotgun/roche/C1172SID11098_Z8_reg2_sff/FW1LDDZ02.sff";

        numAlignedReads     = 409425, 76.12%;
        numAlignedBases     = 142627063, 82.64%;
        inferredReadError  = 1.20%, 1707897;
    }
[…]
}

pairedReadResults
{
    file
    {
        path = "/home/dee/keller/UHTS/454/littleB/paired/20kb/roche/PairedEnd_RunID26803066_sff_halfplate/FX0RNLM01.sff";

        numAlignedReads     = 327088, 37.22%;
        numAlignedBases     = 57601622, 40.36%;
        inferredReadError  = 1.65%, 947986;

        numberWithBothMapped  = 78632;
        numWithOneUnmapped    = 38015;
        numWithMultiplyMapped = 167737;
        numWithBothUnmapped   = 1733;
    }
[…]
}

/*
** Consensus distribution information.
*/
consensusDistribution
{
    fullDistribution
    {
        signalBin =  0.0, 7517321;
[…]
}


/*
**  Alignment depths.
*/
alignmentDepths
{
          1 = 7175292;
[…]
    peakDepth           = 8.0;
    estimatedGenomeSize = "567.1 MB";
}

/*
**  Consensus results.
*/
consensusResults
{
    readStatus
    {
        numAlignedReads    = 11606683, 70.25%;
        numAlignedBases    = 3617704329, 79.68%;
        inferredReadError = 1.06%, 38389865;

        numberAssembled = 9954740;
        numberPartial   = 1651943;
        numberSingleton = 858542;
        numberRepeat    = 3751116;
        numberOutlier   = 305019;
        numberTooShort  = 0;
    }

    pairedReadStatus
    {
        numberWithBothMapped   = 1239514;
        numberWithOneUnmapped  = 324133;
        numberMultiplyMapped   = 855454;
        numberWithBothUnmapped = 14981;

        library
        {
            libraryName     = "FX0RNLM01.sff";
            pairDistanceAvg = 3078.3;
            pairDistanceDev = 769.6;
        }

[…]
    }

    scaffoldMetrics
    {
        numberOfScaffolds   = 14940;
        numberOfBases       = 344205862;

        avgScaffoldSize     = 23039;
        N50ScaffoldSize     = 241728;
        largestScaffoldSize = 2015989;
    }

    largeContigMetrics
    {
        numberOfContigs   = 108123;
        numberOfBases     = 336075598;

        avgContigSize     = 3108;
        N50ContigSize     = 5423;
        largestContigSize = 79674;

        Q40PlusBases      = 327642977, 97.49%;
        Q39MinusBases     = 8432621, 2.51%;
    }

    allContigMetrics
    {
        numberOfContigs = 145244;
        numberOfBases   = 346306838;
    }
}
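
To illustrate the "simple hash structure" reading of the sample above, here is a minimal Python sketch that loads such a file into nested dictionaries. It assumes only the grammar that can be inferred from this one sample (blocks written as `name { ... }`, assignments written as `key = value, value;`, and C-style comments); the names `parse_metrics` and `convert` are hypothetical and not part of any Newbler tooling. Because block names such as `file` repeat, every block is stored as a list of dictionaries.

```python
import re

# Tokens: C-style comment, quoted string, punctuation, or bare word/number.
TOKEN = re.compile(r'/\*.*?\*/|"[^"]*"|[{};=,]|[^\s{};=,"]+', re.S)


def convert(token):
    """Turn a raw token into str, int or float (percentages stay strings)."""
    if token.startswith('"'):
        return token[1:-1]
    for cast in (int, float):
        try:
            return cast(token)
        except ValueError:
            pass
    return token


def parse_metrics(text):
    """Parse metrics text into nested dicts; each block name maps to a list."""
    tokens = [t for t in TOKEN.findall(text) if not t.startswith("/*")]
    root = {}
    stack = [root]  # innermost open block is stack[-1]
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok == "}":
            if len(stack) == 1:
                raise ValueError("unbalanced '}'")
            stack.pop()
            i += 1
        elif nxt == "{":
            child = {}
            stack[-1].setdefault(tok, []).append(child)
            stack.append(child)
            i += 2
        elif nxt == "=":
            end = tokens.index(";", i)  # assignment runs up to the next ';'
            values = [convert(v) for v in tokens[i + 2:end] if v != ","]
            stack[-1][tok] = values[0] if len(values) == 1 else values
            i = end + 1
        else:
            raise ValueError(f"unexpected token {tok!r}")
    return root


SAMPLE = '''
/*
**  Input information.
*/
runData
{
    file
    {
        path = "/tmp/a.sff";
        numberOfReads = 537847, 537843;
    }
    file
    {
        path = "/tmp/b.sff";
        numAlignedReads = 409425, 76.12%;
    }
}
alignmentDepths
{
    1 = 7175292;
    peakDepth = 8.0;
    estimatedGenomeSize = "567.1 MB";
}
'''

metrics = parse_metrics(SAMPLE)
print(metrics["runData"][0]["file"][1]["path"])
```

Storing every block as a list keeps lookups uniform, at the cost of an extra `[0]` for blocks that occur only once. The paths in `SAMPLE` are placeholders, and the elision markers (`[…]`) in the post would need to be removed before parsing the pasted text.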
Daniel Standage (Davis, California, USA) wrote 8.8 years ago:

I wrote a parser for these types of files about a year ago (I can share the code if you'd like). The nice thing is that all of the metrics files are in the same format, so I only needed to write a single parser.

I wrote the parser as part of a 454 sample submission and tracking system. Our sysadmins, who now maintain the system, required that I build it with a PHP web framework called symfony (a great system, by the way), so the parser is also in PHP. If you're comfortable with PHP, I can send the code. If not, it shouldn't be too hard to recreate in Perl or something else. Basically, I loaded everything into an XML document, using a stack to keep track of the current element (popping whenever I met a }). I then used XPath to query the document for the data I was interested in.
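
The stack-plus-XPath approach described above can be sketched in Python roughly as follows. This is not the PHP code mentioned in the answer, only an illustration of the same idea under stated assumptions: braces sit on their own lines as in the sample file, each assignment fits on one line, and the function name `metrics_to_xml`, the `<metrics>` root element, and the `<value name="...">` layout are all hypothetical choices. Keys are stored in a `name` attribute because some of them (for example `1` in `alignmentDepths`) are not valid XML element names.

```python
import re
import xml.etree.ElementTree as ET


def metrics_to_xml(text):
    """Build an ElementTree from metrics text, tracking nesting with a stack."""
    text = re.sub(r"/\*.*?\*/", "", text, flags=re.S)  # drop C-style comments
    root = ET.Element("metrics")
    stack = [root]   # stack[-1] is the element currently being filled
    pending = None   # block name seen on the previous line, awaiting "{"
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line == "{":
            if pending is None:
                raise ValueError("'{' without a preceding block name")
            stack.append(ET.SubElement(stack[-1], pending))
            pending = None
        elif line == "}":
            stack.pop()  # leave the current block
        elif "=" in line:
            key, _, value = line.rstrip(";").partition("=")
            item = ET.SubElement(stack[-1], "value", name=key.strip())
            item.text = value.strip().strip('"')  # raw text, e.g. "409425, 76.12%"
        else:
            pending = line  # a bare word introduces a block
    return root


SAMPLE = '''
/*
**  Consensus results.
*/
runData
{
    file
    {
        path = "/tmp/a.sff";
    }
    file
    {
        path = "/tmp/b.sff";
    }
}
consensusResults
{
    largeContigMetrics
    {
        numberOfContigs   = 108123;
        N50ContigSize     = 5423;
        Q40PlusBases      = 327642977, 97.49%;
    }
}
'''

tree = metrics_to_xml(SAMPLE)
# ElementTree supports a limited XPath subset, including attribute predicates.
n50 = tree.find("./consensusResults/largeContigMetrics/value[@name='N50ContigSize']")
print(n50.text)
```

Values are left as raw strings here, so multi-valued entries such as `Q40PlusBases` would still need splitting on commas after the query. The paths in `SAMPLE` are placeholders.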

Is there something better for this? I don't know. I remember doing a little searching before I wrote this class, but I don't remember finding much. If you do find something, let us know.


Thanks Daniel, I agree that it should be relatively straightforward to create a parser from scratch. Thanks for the offer; my weapon of choice is Ruby, so I'll keep you posted on that :)

Reply written 8.8 years ago by Yannick Wurm