I ran several BAM files through a pipeline of CleanSam, SortSam, and MarkDuplicates without a problem.
However, one of the input files gave me the following error with CleanSam:
ERROR: Record 2106053, Read name A00187:414:HMYCYDSXY:3:1426:13367:11083, Alignment start (21157039) must be <= reference sequence length (21154825) on reference 7
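As I read the message, the check being applied is roughly "alignment start must not exceed the length that the header declares for that reference". Below is a minimal sketch of that check so I can look for other offending records myself. It assumes plain SAM text (for example the output of samtools view -h on the failing file); the function name find_out_of_range is my own and is not part of Picard, and I am assuming that "reference 7" is the sequence name as it appears in the @SQ line.

```python
def find_out_of_range(sam_lines):
    """Yield (read_name, ref_name, pos, ref_len) for every record whose
    alignment start lies beyond the length declared in the @SQ header."""
    ref_len = {}
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            # Header line: collect declared lengths from @SQ records only.
            fields = line.split("\t")
            if fields[0] == "@SQ":
                tags = dict(f.split(":", 1) for f in fields[1:] if ":" in f)
                ref_len[tags["SN"]] = int(tags["LN"])
            continue
        cols = line.split("\t")
        if len(cols) < 4:
            continue  # not a complete alignment record
        name, rname, pos = cols[0], cols[2], int(cols[3])
        # Unmapped reads ("*") and unknown references are skipped here.
        if rname in ref_len and pos > ref_len[rname]:
            yield name, rname, pos, ref_len[rname]


# Constructed example: only the read name, position, and length come from
# the error above; the remaining fields are made up for illustration.
example = [
    "@SQ\tSN:7\tLN:21154825",
    "A00187:414:HMYCYDSXY:3:1426:13367:11083\t0\t7\t21157039\t60\t10M\t*\t0\t0\tACGTACGTAC\tFFFFFFFFFF",
]
for hit in find_out_of_range(example):
    print(hit)
# prints ('A00187:414:HMYCYDSXY:3:1426:13367:11083', '7', 21157039, 21154825)
```

If this reports many records on the same reference, that would point at the header or reference rather than at a single corrupted read.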
All of the BAM files were generated from libraries in the same dataset, using the same pipeline, and aligned to the same reference genome, so I'm having difficulty knowing where to begin troubleshooting this error. The Picard command that I build (in a Perl script) is:
"java -Xmx" . $mem . "g -Djava.io.tmpdir=`pwd`/tmp -jar " . $picard . "CleanSam.jar INPUT=" . $BFile[$i] . ".bam OUTPUT=" . $BFile[$i] . "clean.bam";
where $BFile[$i] is just the prefix taken from a glob list of the input BAM file names (S1.bam ... S8.bam).
Any suggestions on where to start? Since the reference genome is the same one I used for the alignment, I don't understand how a read can end up with coordinates beyond the length of the reference sequence.
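One thing I can check on my side is whether the failing file really carries the same sequence dictionary as the files that worked. This is a minimal sketch of that comparison, assuming the header of each BAM has been dumped to text first (for example with samtools view -H); the function names sq_lengths and diff_sq are hypothetical helpers of mine.

```python
def sq_lengths(header_lines):
    """Map each reference name (SN) to its declared length (LN)."""
    lengths = {}
    for line in header_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] != "@SQ":
            continue
        tags = dict(f.split(":", 1) for f in fields[1:] if ":" in f)
        lengths[tags["SN"]] = int(tags["LN"])
    return lengths


def diff_sq(good_header, bad_header):
    """Return {name: (length_in_good, length_in_bad)} for every reference
    whose declared length differs or that is missing from one header."""
    good = sq_lengths(good_header)
    bad = sq_lengths(bad_header)
    return {
        name: (good.get(name), bad.get(name))
        for name in sorted(set(good) | set(bad))
        if good.get(name) != bad.get(name)
    }
```

An empty result would mean the two headers declare identical references, in which case the problem is more likely in the records themselves than in the reference.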