How is 'ESTIMATED_LIBRARY_SIZE' computed in picard tools?
7.2 years ago

Hi,

I used Picard tools to mark duplicates in my paired-end RNA-seq reads. Now I am trying to understand the metrics the tool reports.

## METRICS CLASS    picard.sam.DuplicationMetrics
LIBRARY UNPAIRED_READS_EXAMINED READ_PAIRS_EXAMINED SECONDARY_OR_SUPPLEMENTARY_RDS  UNMAPPED_READS  UNPAIRED_READ_DUPLICATES    READ_PAIR_DUPLICATES    READ_PAIR_OPTICAL_DUPLICATES    PERCENT_DUPLICATION ESTIMATED_LIBRARY_SIZE
Unknown Library 6874536 70195365    21943811    13162992    6463873 28645311    30223   0.432923    60563946

Even after reading the documentation for MarkDuplicates and EstimateLibraryComplexity, I cannot reproduce the last column (ESTIMATED_LIBRARY_SIZE). Is there a formula for this value? Any explanation, or a pointer to where this is documented, would be greatly appreciated.

7.2 years ago

From the picard code: https://github.com/broadinstitute/picard/blob/master/src/main/java/picard/sam/DuplicationMetrics.java#L115

    /**
     * Estimates the size of a library based on the number of paired end molecules observed
     * and the number of unique pairs observed.
     *
     * Based on the Lander-Waterman equation that states:
     *     C/X = 1 - exp( -N/X )
     * where
     *     X = number of distinct molecules in library
     *     N = number of read pairs
     *     C = number of distinct fragments observed in read pairs
     */
    public static Long estimateLibrarySize(final long readPairs, final long uniqueReadPairs) {
        final long readPairDuplicates = readPairs - uniqueReadPairs;

        if (readPairs > 0 && readPairDuplicates > 0) {
            long n = readPairs;
(...)
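Since the snippet is cut off before the equation is actually solved, here is a minimal sketch of one way to find X numerically from the Lander-Waterman relation quoted in the comment. It is not Picard's code: the class `LibrarySizeSketch`, the method `estimate`, and the plain bisection are my own choices for illustration. It also assumes, as far as I understand how MarkDuplicates calls the function, that N is READ_PAIRS_EXAMINED minus READ_PAIR_OPTICAL_DUPLICATES and C is READ_PAIRS_EXAMINED minus READ_PAIR_DUPLICATES; check this against the Picard source before relying on it.

```java
public final class LibrarySizeSketch {

    // f(X) = C/X - 1 + exp(-N/X); the library size is the X where f(X) = 0.
    static double f(double x, double n, double c) {
        return c / x - 1.0 + Math.exp(-n / x);
    }

    // Hypothetical helper: solves C/X = 1 - exp(-N/X) for X by bisection.
    static long estimate(long readPairs, long uniqueReadPairs) {
        if (uniqueReadPairs <= 0 || uniqueReadPairs >= readPairs) {
            throw new IllegalArgumentException("need 0 < uniqueReadPairs < readPairs");
        }
        double n = readPairs;
        double c = uniqueReadPairs;

        // f(c) = exp(-n/c) > 0, so X = c is a valid lower bound.
        double lo = c;
        // Grow the upper bound until f changes sign.
        double hi = 2.0 * c;
        while (f(hi, n, c) > 0) {
            hi *= 2.0;
        }

        // Plain bisection; 100 halvings is far more than double precision needs.
        for (int i = 0; i < 100; i++) {
            double mid = (lo + hi) / 2.0;
            if (f(mid, n, c) > 0) {
                lo = mid;
            } else {
                hi = mid;
            }
        }
        return Math.round((lo + hi) / 2.0);
    }

    public static void main(String[] args) {
        // Values from the metrics line in the question.
        long readPairsExamined = 70195365L;
        long readPairDuplicates = 28645311L;
        long opticalDuplicates = 30223L;

        // Assumed inputs (see the note above the code).
        long n = readPairsExamined - opticalDuplicates;
        long c = readPairsExamined - readPairDuplicates;

        System.out.println("Estimated library size: " + estimate(n, c));
    }
}
```

Under those assumptions, the estimate for the numbers in the question should come out close to the reported ESTIMATED_LIBRARY_SIZE, although the last digits may differ because Picard's own iteration and rounding are not reproduced here.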

Thank you for the quick response.
