Question: Empty output from Picard's EstimateLibraryComplexity
rubic (United States) wrote, 2.6 years ago:

Hi,

I'm running Picard's EstimateLibraryComplexity on 12 BAM files that are pretty shallow (~400,000 reads per file), with no arguments other than I and O, and I'm getting no output other than the standard output messages.

Note that I do find duplicates in these data. For example, this is the standard output of Picard's MarkDuplicates for one sample:

INFO    2016-09-26 18:20:00 MarkDuplicates  Start of doWork freeMemory: 2046635632; totalMemory: 2058354688; maxMemory: 28478275584

INFO    2016-09-26 18:20:00 MarkDuplicates  Reading input file and constructing read end information.

INFO    2016-09-26 18:20:00 MarkDuplicates  Will retain up to 113009030 data points before spilling to disk.

WARNING 2016-09-26 18:20:02 AbstractDuplicateFindingAlgorithm   Default READ_NAME_REGEX '[a-zA-Z0-9]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).*' did not match read name '534371-1'.  You may need to specify a READ_NAME_REGEX in order to correctly identify optical duplicates.  Note that this message will not be emitted again even if other read names do not match the regex.

INFO    2016-09-26 18:20:16 MarkDuplicates  Read 129293 records. 0 pairs never matched.

INFO    2016-09-26 18:20:19 MarkDuplicates  After buildSortedReadEndLists freeMemory: 1967321816; totalMemory: 2884632576; maxMemory: 28478275584

INFO    2016-09-26 18:20:19 MarkDuplicates  Will retain up to 889946112 duplicate indices before spilling to disk.

INFO    2016-09-26 18:23:23 MarkDuplicates  Traversing read pair information and detecting duplicates.

INFO    2016-09-26 18:23:23 MarkDuplicates  Traversing fragment information and detecting duplicates.

INFO    2016-09-26 18:23:23 MarkDuplicates  Sorting list of duplicate records.

INFO    2016-09-26 18:23:26 MarkDuplicates  After generateDuplicateIndexes freeMemory: 3237064784; totalMemory: 10367795200; maxMemory: 28478275584

INFO    2016-09-26 18:23:26 MarkDuplicates  Marking 91622 records as duplicates.

INFO    2016-09-26 18:23:26 MarkDuplicates  Found 0 optical duplicate clusters.

INFO    2016-09-26 18:23:36 MarkDuplicates  Before output close freeMemory: 10352402680; totalMemory: 10367795200; maxMemory: 28478275584

INFO    2016-09-26 18:23:37 MarkDuplicates  After output close freeMemory: 10352475912; totalMemory: 10367795200; maxMemory: 28478275584

But then this is the standard output of EstimateLibraryComplexity for the same sample:

INFO    2016-09-26 18:23:38 EstimateLibraryComplexity   Will store 46230966 read pairs in memory before sorting.

INFO    2016-09-26 18:23:46 EstimateLibraryComplexity   Finished reading - moving on to scanning for duplicates.

[Mon Sep 26 18:23:46 EDT 2016] picard.sam.EstimateLibraryComplexity done. Elapsed time: 0.12 minutes.

Has anyone experienced this?

harold.smith.tarheel (United States) wrote, 2.6 years ago:

Although unlikely, I suppose it's a formal possibility at low coverage that none of the reads are duplicated. If so, then library complexity cannot be estimated (it's based on the degree of duplication). You can check by running MarkDuplicates to see if any are present.
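To illustrate why the estimate depends on duplication: Picard's documentation describes its library size estimate in terms of the Lander-Waterman equation, C/X = 1 - exp(-N/X), where N is the number of read pairs examined, C the number of distinct pairs, and X the library size. The sketch below solves that equation by bisection. It is an illustration only, not Picard's code; the function name `estimate_library_size` and the choice to return `None` when there are no duplicates are assumptions made for this example.

```python
import math


def estimate_library_size(read_pairs, unique_read_pairs):
    """Solve C/X = 1 - exp(-N/X) for X by bisection (illustrative sketch).

    read_pairs        -- N, the number of read pairs examined
    unique_read_pairs -- C, the number of distinct read pairs observed
    Returns None when there is nothing to estimate from (hypothetical choice).
    """
    duplicates = read_pairs - unique_read_pairs
    if read_pairs <= 0 or unique_read_pairs <= 0 or duplicates <= 0:
        # With C == N the equation has no finite solution for X.
        return None

    n = float(read_pairs)
    c = float(unique_read_pairs)

    def f(x):
        # Positive while x is too small, negative once x is too large.
        return c / x - 1.0 + math.exp(-n / x)

    # The library cannot be smaller than the number of distinct pairs seen.
    lo, hi = c, c
    while f(hi) > 0:
        hi *= 2.0

    for _ in range(100):
        mid = (lo + hi) / 2.0
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


print(estimate_library_size(1000, 1000))  # no duplicates, so no estimate
print(estimate_library_size(1000, 800))   # some duplicates, finite estimate
```

When every pair is distinct, any sufficiently large X is consistent with the data, which is why no estimate can be produced in that case.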


Could be a PCR-free library prep?

Reply written 2.6 years ago by WouterDeCoster

It's a selection for short RNAs (miRs), but MarkDuplicates reports 91622 records as duplicates.

Reply modified 2.6 years ago • written 2.6 years ago by rubic

Have the data been trimmed to a length typical for miRs (~30bp)? EstimateLibraryComplexity matches the first 50bp to identify duplicates. It may not work if the read lengths are shorter (although I don't know for sure).

But it's unclear why you need this metric, since MarkDuplicates indicates that you're near saturation: about 71% (91622/129293) of the reads are duplicates.
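The read-length question in this reply can be checked directly. Below is a minimal sketch that tallies sequence lengths from SAM-formatted text (for instance, the output of `samtools view` on one of the BAM files) and reports the fraction of reads shorter than a threshold. The function names and the example records are made up for illustration, and the 50 bp default simply mirrors the figure quoted in this reply, which has not been verified here.

```python
from collections import Counter


def read_length_counts(sam_lines):
    """Count reads by sequence length from SAM text lines."""
    counts = Counter()
    for line in sam_lines:
        # Skip blank lines and SAM header lines, which start with '@'.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        seq = fields[9]  # column 10 of a SAM record is the read sequence
        if seq == "*":
            continue  # sequence not stored
        counts[len(seq)] += 1
    return counts


def fraction_shorter_than(counts, min_len=50):
    """Fraction of reads whose length is below min_len (assumed threshold)."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    short = sum(n for length, n in counts.items() if length < min_len)
    return short / total


# Made-up records for illustration: two 22 bp reads and one 51 bp read.
example = [
    "@HD\tVN:1.0\tSO:coordinate",
    "r1\t0\tchr1\t100\t255\t22M\t*\t0\t0\t" + "A" * 22 + "\t" + "I" * 22,
    "r2\t0\tchr1\t100\t255\t22M\t*\t0\t0\t" + "C" * 22 + "\t" + "I" * 22,
    "r3\t0\tchr1\t300\t255\t51M\t*\t0\t0\t" + "G" * 51 + "\t" + "I" * 51,
]
counts = read_length_counts(example)
print(sorted(counts.items()))
print(fraction_shorter_than(counts))
```

If most reads turn out to be shorter than the threshold, that would be consistent with the trimming explanation suggested here, although it would not prove it.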

Reply written 2.6 years ago by harold.smith.tarheel
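The duplicate fraction quoted in this reply can be recomputed from the MarkDuplicates log in the question. This is a small sketch that assumes the log wording stays exactly as posted ("Read N records" and "Marking N records as duplicates"); the function name `duplicate_fraction` is made up for this example.

```python
import re

# Two lines copied from the MarkDuplicates log in the question.
LOG = """\
INFO    2016-09-26 18:20:16 MarkDuplicates  Read 129293 records. 0 pairs never matched.
INFO    2016-09-26 18:23:26 MarkDuplicates  Marking 91622 records as duplicates.
"""


def duplicate_fraction(log_text):
    """Return duplicates / records read, parsed from MarkDuplicates log text."""
    read = re.search(r"Read (\d+) records", log_text)
    marked = re.search(r"Marking (\d+) records as duplicates", log_text)
    if read is None or marked is None:
        raise ValueError("expected MarkDuplicates summary lines not found")
    total = int(read.group(1))
    duplicates = int(marked.group(1))
    if total == 0:
        raise ValueError("log reports zero records read")
    return duplicates / total


print(f"{duplicate_fraction(LOG):.1%}")  # 70.9%
```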