Hm. Old questions, so nobody will read this, but I'm not entirely happy. Here are my answers:
- Paired-end sequencing is supported by some technologies (Illumina and Sanger), where it is possible to sequence from both ends of a clone. Mate pairs involve making circular fragments using a linker sequence, fragmenting them around the linker, and then sequencing the result. Illumina will read from each end of the fragment; 454 (and I believe SOLiD?) will read through it all.
Now this terminology isn't fixed, and lots of people will talk about mate pairs when doing paired end, or vice versa. Caution is advised! Also, I think SOLiD has some funky variations on this, but I haven't looked too closely. And the mate pair protocol is rather unreliable, IME. Expect lots of non-mate-pair reads and a wide range of insert sizes.
Oh, and I believe you can get mate pairs with inserts from 1.5 kb to 20 kb. We just got Illumina PE reads with 500 bp inserts; this was considered experimental by the company doing it.
Removing duplicates can refer to several different things, but with all the second-generation technologies, it is common to get a proportion of duplicate clones. This can skew things: for de novo assembly, for example, it will give artificially high coverage of a region, which might then be incorrectly identified as a repeat.
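To make one of those meanings concrete, here is a minimal sketch that collapses read pairs whose two sequences are exactly identical. The function name `collapse_duplicate_pairs` and the toy reads are made up for illustration. This is only the simplest possible definition of a duplicate; tools that work on alignments usually compare mapped coordinates instead, and sequencing errors will hide some true duplicates from an exact-match check like this.

```python
def collapse_duplicate_pairs(pairs):
    """Keep the first occurrence of each (read1, read2) sequence pair.

    `pairs` is an iterable of (read1, read2) strings. Two pairs count as
    duplicates only if BOTH reads are identical (an assumption made for
    this sketch, not a standard definition).
    """
    seen = set()
    unique = []
    for read1, read2 in pairs:
        key = (read1, read2)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


# Hypothetical toy data: the first two pairs are identical clones.
example_pairs = [
    ("ACGTACGT", "TTGACCAA"),
    ("ACGTACGT", "TTGACCAA"),
    ("ACGTACGT", "GGCCTTAA"),  # same read1, different read2: kept
]
deduplicated = collapse_duplicate_pairs(example_pairs)
```

Requiring both reads to match is why paired data helps here: two independent fragments rarely share both end points, while copies of the same clone do.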
The random placement done by sampe is probably there to make sure that you get the right coverage for repetitive regions. So it's orthogonal to removing duplicates that are sequencing artifacts. And if you have a dinucleotide region of (AT)\*, a read of (AT)\* and a read of (TA)\* would not be considered duplicates of each other, but each would be placed randomly within the repeat.
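A toy illustration of that last point, and explicitly not how sampe works internally: the helpers `all_hits` and `place_randomly` and the little reference string are invented for this sketch, and only exact matches are considered. An (AT) read and a (TA) read are different sequences, so an exact-sequence duplicate check would not merge them, yet both match the repeat at many positions and have to be put somewhere.

```python
import random


def all_hits(reference, read):
    """Return every start position where `read` matches exactly,
    including overlapping matches."""
    hits = []
    start = reference.find(read)
    while start != -1:
        hits.append(start)
        start = reference.find(read, start + 1)
    return hits


def place_randomly(reference, read, rng):
    """Pick one of the equally good positions at random, or None if
    the read does not match anywhere."""
    hits = all_hits(reference, read)
    return rng.choice(hits) if hits else None


# Hypothetical reference: an (AT)x10 repeat flanked by unique bases.
reference = "GG" + "AT" * 10 + "CC"
rng = random.Random(0)  # seeded so the sketch is reproducible

at_hits = all_hits(reference, "ATATAT")
ta_hits = all_hits(reference, "TATATA")
at_position = place_randomly(reference, "ATATAT", rng)
ta_position = place_randomly(reference, "TATATA", rng)
```

Spreading such reads at random over the candidate positions keeps the coverage over the repeat roughly even, instead of piling everything onto the first hit.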
Can't help you there, but Pierre seems to have it covered. :-)