Again on the issue of marking duplicates...
If you have single-end reads, tools like Picard MarkDuplicates mark as duplicates any reads that share the same 5' end coordinate, and they ignore the 3' coordinate (see here). Is this the right thing to do?
I can see the point of looking only at the 5' coordinate when the read length was quite a bit shorter than the fragment length (say 35-70 bp) and sequence quality dropped after ~50 bp.
But now you can get reads up to 300 bp, which is well into or above the range of the fragment length. So reads sharing the same 5' end but a different 3' end might well be genuine independent fragments, and MarkDuplicates would get them wrong. Am I missing something? Does anybody know of a tool that looks at both 5' and 3' ends for SE reads?
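
To make the question concrete, here is a rough Python sketch of the two behaviours I am comparing. It is not how Picard is implemented. The read records, the `mark_duplicates` function and the `use_both_ends` switch are all made up for illustration. It assumes each read is already reduced to a chromosome, strand and 0-based half-open alignment interval, that soft clipping has been accounted for, and that the 3' end really is the fragment end (the read runs through the whole fragment and the adapter has been trimmed).

```python
from collections import defaultdict


def mark_duplicates(reads, use_both_ends=True):
    """Return the names of reads flagged as duplicates.

    Each read is a dict with hypothetical keys: name, chrom, start, end
    (0-based, half-open), strand ('+' or '-') and mapq.
    """
    groups = defaultdict(list)
    for r in reads:
        # On the reverse strand the 5' end is the rightmost aligned position.
        if r["strand"] == "+":
            five_prime, three_prime = r["start"], r["end"]
        else:
            five_prime, three_prime = r["end"], r["start"]

        key = (r["chrom"], r["strand"], five_prime)
        if use_both_ends:
            # Also require the same 3' end before calling two reads duplicates.
            key += (three_prime,)
        groups[key].append(r)

    duplicates = set()
    for group in groups.values():
        # Keep the read with the highest mapping quality (first one on ties).
        best = max(group, key=lambda r: r["mapq"])
        duplicates.update(r["name"] for r in group if r is not best)
    return duplicates


reads = [
    {"name": "a", "chrom": "chr1", "start": 100, "end": 200, "strand": "+", "mapq": 60},
    {"name": "b", "chrom": "chr1", "start": 100, "end": 250, "strand": "+", "mapq": 60},
    {"name": "c", "chrom": "chr1", "start": 100, "end": 200, "strand": "+", "mapq": 30},
]

print(sorted(mark_duplicates(reads, use_both_ends=False)))
print(sorted(mark_duplicates(reads, use_both_ends=True)))
```

With 5'-only matching, reads `b` and `c` are both flagged. With both ends required, only `c` is flagged, and `b` (same 5' end, different 3' end) is kept as a separate fragment, which is the behaviour I am asking about.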