Interpreting Estimate Library Complexity Output
9.9 years ago
komal.rathi ★ 4.1k

Hi everyone,

I am using the EstimateLibraryComplexity utility from Picard Tools to calculate the complexity of my paired-end RNAseq libraries.

This is my command line:

java -jar /picard/EstimateLibraryComplexity INPUT=sample.bam OUTPUT=sample_libcomp.txt VERBOSITY=ERROR VALIDATION_STRINGENCY=SILENT

This generates sample_libcomp.txt. Here is the truncated output:

## HISTOGRAM    java.lang.Integer
duplication_group_count    P01311

1       23739815
2       3633946
3       870509
4       426481
5       202751
6       171461
7       93221
8       83632
9       58171
10      50066
11      34938
12      36788
13      24277
14      24100
15      19388
16      18345
17      13640
18      14480
...
456     1
457     1
458     1
459     1
460     2
464     3
468     1
470     2
471     2
473     1
477     2
480     1
484     1
488     1

Can anyone explain what these values mean? I couldn't find an explanation of the output anywhere. I plan to plot them as a density histogram, possibly after a log2 transform, so I need to understand what they represent before I can interpret that plot.
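To get these values into a plottable form, a minimal Python sketch could read the histogram section of the metrics file and log2-transform both columns. It assumes the layout shown above: a "## HISTOGRAM" line, one header row, then whitespace-separated "size count" rows. The function names read_picard_histogram and log2_points are hypothetical helpers, not part of Picard.

```python
import math
from typing import Dict, Iterable, List, Tuple


def read_picard_histogram(lines: Iterable[str]) -> Dict[int, int]:
    """Return {first column: second column} from the '## HISTOGRAM'
    section of a Picard metrics file.

    Assumes the layout in the question: a '## HISTOGRAM' line, one header
    row, then one whitespace-separated 'size count' row per bin.
    """
    histogram: Dict[int, int] = {}
    in_section = False
    header_seen = False
    for raw in lines:
        line = raw.strip()
        if line.startswith("## HISTOGRAM"):
            in_section = True
            continue
        if not in_section:
            continue  # skip the METRICS section and comments above the histogram
        if not line:
            if histogram:
                break  # a blank line after data rows ends the section
            continue  # tolerate a blank line between header and data
        if not header_seen:
            header_seen = True  # e.g. "duplication_group_count  P01311"
            continue
        size, count = line.split()[:2]
        # int(float(...)) accepts both "23739815" and "23739815.0"
        histogram[int(size)] = int(float(count))
    return histogram


def log2_points(histogram: Dict[int, int]) -> List[Tuple[float, float]]:
    """(log2 of first column, log2 of second column), sorted by first column."""
    return [
        (math.log2(size), math.log2(count))
        for size, count in sorted(histogram.items())
        if count > 0
    ]


# A few rows copied from the output in the question (not the full file).
SAMPLE = """## HISTOGRAM\tjava.lang.Integer
duplication_group_count\tP01311

1\t23739815
2\t3633946
4\t426481
488\t1
"""

hist = read_picard_histogram(SAMPLE.splitlines())
points = log2_points(hist)
for x, y in points:
    print(f"{x:.2f}\t{y:.2f}")
```

On the real file, an open file handle (for example open("sample_libcomp.txt")) can be passed in place of SAMPLE.splitlines(). Because the data are already binned counts, plotting these points directly as a line or bar chart on log-log axes may be more informative than a density estimate of the raw numbers.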

Thanks!

9.9 years ago
Dan D 7.4k

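The first column is the size of a duplicate group, i.e. how many times the same read-pair sequence was observed (1 means the sequence was seen only once, with no duplicates). The second column is the number of groups of that size.

So in your output, 426,481 sequences were each observed exactly four times: one original plus three duplicates.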
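Reading the first column as the duplicate-group size (copies of one sequence, so 1 means no duplicates) and the second as the number of such groups, each row contributes size × count read pairs, of which one per group is the original and the rest are duplicates. A minimal Python sketch of that arithmetic, assuming the histogram is already in a dict of {group size: number of groups}; summarize is a hypothetical helper. Only the first four rows from the question are used, so the totals describe those rows, not the whole library. Picard's own PERCENT_DUPLICATION in the METRICS section of the same file is computed by the tool itself and may not match a value derived this way.

```python
from typing import Dict, Tuple


def summarize(histogram: Dict[int, int]) -> Tuple[int, int, int, float]:
    """Summarize a duplicate-group histogram {group_size: number_of_groups}."""
    distinct = sum(histogram.values())                      # one original per group
    total = sum(size * n for size, n in histogram.items())  # all read pairs in the groups
    duplicates = total - distinct                           # every copy beyond the first
    fraction = duplicates / total if total else 0.0
    return total, distinct, duplicates, fraction


# First four rows of the truncated output in the question (not the full histogram).
rows = {1: 23739815, 2: 3633946, 3: 870509, 4: 426481}
total, distinct, duplicates, fraction = summarize(rows)
# Row 4 alone: 426,481 groups x 4 copies = 1,705,924 read pairs,
# of which 426,481 x 3 = 1,279,443 are duplicates.
print(total, distinct, duplicates, round(fraction, 4))
```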
