Question

Proper regex to mark duplicates using Picard tools on SOLiD data

9.4 years ago

Jordan ★ 1.3k

Hi,

I'm having trouble removing duplicates using Picard tools on SOLiD data. I get a regex not matching error.

The reads have the following names:

22_758_632_F3

604_1497_576

124_1189_1519_F5

358_1875_702_F5-DNA

I don't think Picard tools can parse these read names with its default regex.

I tried changing the default regex. This time it does not throw an error, but it takes too long and eventually fails (out of memory). I suspect my regex is wrong. Here is my command:

java -jar $PICARD_TOOLS_HOME/MarkDuplicates.jar I=$FILE O=$BAMs/MarkDuplicates/$SAMPLE.MD.bam M=$BAMs/MarkDuplicates/$SAMPLE.metrics READ_NAME_REGEX="([0-9]+)_([0-9]+)_([0-9]+).*"

Any help is appreciated. Thanks!

duplicate-removal picard-tools SOLiD BAMs NGS • 2.9k views


score 1 · Accepted Answer · 2014-12-11

9.4 years ago

Jordan ★ 1.3k

I was able to fix the issue by adding -Xmx16g to the java command and increasing the RAM available to the job. Apparently the RAM was not sufficient.
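For anyone who wants to sanity-check a custom regex before a long run, here is a minimal sketch in Python. It assumes that Picard needs the whole read name to match and reads the three capture groups as numeric fields, which is my understanding rather than something confirmed in this thread. `parse_read_name` is a hypothetical helper, not part of Picard, and Python's `re` is only a stand-in for Java's regex engine (the two should agree on a pattern this simple).

```python
import re

# Same pattern as the READ_NAME_REGEX argument in the command above.
READ_NAME_REGEX = r"([0-9]+)_([0-9]+)_([0-9]+).*"


def parse_read_name(name):
    """Return the three captured integers, or None if the whole name does not match.

    Hypothetical helper for illustration only; it is not a Picard function.
    """
    match = re.fullmatch(READ_NAME_REGEX, name)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


if __name__ == "__main__":
    # The four example read names from the question.
    for read_name in [
        "22_758_632_F3",
        "604_1497_576",
        "124_1189_1519_F5",
        "358_1875_702_F5-DNA",
    ]:
        print(read_name, parse_read_name(read_name))
```

All four example names match under this check, which is consistent with the problem having been memory rather than the regex.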
