I'm having trouble removing duplicates with Picard tools on SOLiD data: I get an error saying the read-name regex does not match.
The reads have names like these:
22_758_632_F3
604_1497_576
124_1189_1519_F5
358_1875_702_F5-DNA
I don't think Picard tools can parse read names like these with its default READ_NAME_REGEX.
I tried changing the default regex. With my own pattern it no longer throws an error, but it runs for a very long time and eventually fails with an out-of-memory error. I suspect my regex is not right. Here is my command:
java -jar $PICARD_TOOLS_HOME/MarkDuplicates.jar I=$FILE O=$BAMs/MarkDuplicates/$SAMPLE.MD.bam M=$BAMs/MarkDuplicates/$SAMPLE.metrics READ_NAME_REGEX="([0-9]+)_([0-9]+)_([0-9]+).*"
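One way to see whether the regex itself is the problem is to test it against the example names outside Picard. As I understand the Picard documentation, the three capture groups are read as tile, x and y, and the pattern has to match the whole read name. The sketch below is a quick check under the assumption that `grep -E` and `sed -E` (POSIX extended regex) behave like Java's regex engine for this simple pattern; it is not how Picard itself runs the match.

```shell
#!/usr/bin/env bash
# Sketch: check how the READ_NAME_REGEX from the MarkDuplicates command
# splits the example SOLiD read names into three numeric fields.
# Assumption: POSIX extended regex (grep -E / sed -E) stands in for
# Java's java.util.regex; for this simple pattern they should agree.
regex='([0-9]+)_([0-9]+)_([0-9]+).*'

for name in 22_758_632_F3 604_1497_576 124_1189_1519_F5 358_1875_702_F5-DNA; do
  # -x: the whole read name must match, not just a substring of it
  if printf '%s\n' "$name" | grep -Eqx "$regex"; then
    # Print the three capture groups (tile, x, y as Picard would read them)
    fields=$(printf '%s\n' "$name" | sed -E "s/^${regex}\$/\1 \2 \3/")
    echo "$name -> $fields"
  else
    echo "$name -> no match"
  fi
done
```

If every name prints three numbers, the pattern parses these names into three numeric fields; any name that prints "no match" would point to the regex.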
Any help is appreciated. Thanks!