Hi everyone,
I am using the EstimateLibraryComplexity utility from Picard Tools to calculate the complexity of my paired-end RNAseq libraries.
This is my command line:
java -jar /picard/EstimateLibraryComplexity INPUT=sample.bam OUTPUT=sample_libcomp.txt VERBOSITY=ERROR VALIDATION_STRINGENCY=SILENT
This generates a file, sample_libcomp.txt. This is the truncated output:
## HISTOGRAM java.lang.Integer
duplication_group_count P01311
1 23739815
2 3633946
3 870509
4 426481
5 202751
6 171461
7 93221
8 83632
9 58171
10 50066
11 34938
12 36788
13 24277
14 24100
15 19388
16 18345
17 13640
18 14480
...
456 1
457 1
458 1
459 1
460 2
464 3
468 1
470 2
471 2
473 1
477 2
480 1
484 1
488 1
Can anyone explain what these values mean? I couldn't find a description of this output anywhere. I plan to plot the values as a density histogram (possibly after log2-transforming them), so I need to understand what the two columns represent before I can interpret the plot.
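For context, here is a rough sketch of how I intend to read the histogram section and log2-transform the second column before plotting. It assumes the section starts at the "## HISTOGRAM" line, is followed by one header line, and then contains two whitespace-separated numeric columns. The function names (parse_histogram, log2_counts) and the SAMPLE text are only illustrative, and I am not assuming anything about what the columns mean.

```python
import math

# A few rows copied from the truncated output above, for illustration only.
SAMPLE = """## HISTOGRAM java.lang.Integer
duplication_group_count P01311
1 23739815
2 3633946
3 870509
...
488 1
"""


def parse_histogram(lines):
    """Return (header, rows) for the '## HISTOGRAM' section.

    Assumption: one header line follows the '## HISTOGRAM' marker, then
    two numeric columns. Rows that do not fit (such as '...') are skipped.
    """
    header = None
    rows = []
    in_hist = False
    for raw in lines:
        line = raw.strip()
        if line.startswith("## HISTOGRAM"):
            in_hist = True
            continue
        if not in_hist:
            continue
        if not line:
            if header is not None:
                break  # a blank line after the header ends the section
            continue
        fields = line.split()
        if header is None:
            header = fields
            continue
        if len(fields) != 2:
            continue
        try:
            rows.append((int(fields[0]), float(fields[1])))
        except ValueError:
            continue  # not a numeric row
    return header, rows


def log2_counts(rows):
    """log2-transform the second column, dropping zero values."""
    return [(first, math.log2(second)) for first, second in rows if second > 0]


if __name__ == "__main__":
    header, rows = parse_histogram(SAMPLE.splitlines())
    print(header)
    for first, value in log2_counts(rows):
        print(first, round(value, 3))
```

To run this on the real file, I would pass an open file handle instead of the sample text, e.g. parse_histogram(open("sample_libcomp.txt")).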
Thanks!