During sequencing, what you are actually sequencing are DNA fragments. By 'fragment', I literally mean a string of nucleotides produced during your sample preparation. Usually, the fragment is much longer than your read length. However, due mostly to cost and technology, we are only able to sequence a limited number of nucleotides starting from one end of a fragment. This is called a single-end read. For instance, you might have a sample where the average fragment is 350 bp, but the data you get back is the first 100 bp of each fragment.
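To make the 350 bp / 100 bp example concrete, here is a minimal Python sketch. The function name `single_end_read` and the random toy fragment are made up for illustration; a real sequencer obviously does not work on strings like this.

```python
import random

def single_end_read(fragment: str, read_length: int = 100) -> str:
    """Return the part of a fragment that a single-end run would report.

    Hypothetical helper: it just takes the first `read_length` bases.
    """
    return fragment[:read_length]

# Toy fragment of 350 random bases (illustrative data only).
random.seed(0)
fragment = "".join(random.choice("ACGT") for _ in range(350))

read = single_end_read(fragment)
unread = len(fragment) - len(read)

print(f"read length: {len(read)} bp")
print(f"bases we never see: {unread} bp")
```

With these numbers, 250 of the 350 bases in the fragment are never read.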
You might ask what happens to the other nucleotides we can't read. They remain unknown to us. For single-end reads, the size of the fragments we are sequencing also remains unknown to us. So, the only information we have for figuring out where a read came from is the sequence that we get back.
Paired-end sequencing is another approach. In this case, whenever we sequence one end of the fragment, we sequence the other end as well. So, now for each fragment we have two reads. In the case of the sample where the average fragment length is 350 bp, we would have two reads of 100 bp each, which are labeled in our data as being a pair, meaning they come from the same fragment. Now, when we try to place this pair of reads in the genome, we know that we need to find a region where both reads match and lie relatively close to each other, at a distance consistent with the fragment size. This is extra information we didn't have before.
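As a rough sketch of why the pairing helps, the Python below takes both ends of a toy fragment and then checks whether two candidate mapping positions are close enough to have come from one fragment. All names here are hypothetical, and two things are assumptions on my part: that the second read is reported as the reverse complement of the far end (as in typical Illumina-style paired-end data), and that 600 bp is a sensible upper limit for the fragment size. Real aligners do this check in a more sophisticated way.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def paired_end_reads(fragment: str, read_length: int = 100):
    """Return (read1, read2) taken from the two ends of a fragment.

    Assumption: read2 comes from the opposite strand, so it is the
    reverse complement of the last `read_length` bases.
    """
    read1 = fragment[:read_length]
    read2 = reverse_complement(fragment[-read_length:])
    return read1, read2

def unsequenced_middle(fragment_length: int, read_length: int = 100) -> int:
    """Bases between the two reads that are never sequenced."""
    return max(0, fragment_length - 2 * read_length)

def plausible_pair(pos1: int, pos2: int, read_length: int = 100,
                   max_fragment: int = 600) -> bool:
    """Check whether two leftmost mapping positions on the same chromosome
    span no more than `max_fragment` bases (an assumed cutoff)."""
    span = max(pos1, pos2) + read_length - min(pos1, pos2)
    return span <= max_fragment

print(unsequenced_middle(350))       # unread bases for a 350 bp fragment
print(plausible_pair(1000, 1250))    # mates 350 bp apart end to end
print(plausible_pair(1000, 50000))   # mates far too distant for one fragment
```

For the 350 bp example, the two 100 bp reads leave 150 bp in the middle unread, and a candidate placement where the mates sit tens of kilobases apart can be rejected even if each read matches there on its own.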
In the case of genome assembly, the extra information about which reads are close to which other reads helps tremendously. This is why paired-end sequencing is preferred for assembly.
For sequencing in general, the biological methods for looking at the quality of the sample are extremely helpful. This includes the use of a Bioanalyzer and qPCR. By the time the bioinformatics analyst is looking at the data, the damage is already done, so to speak. After the sequencing, there are many tools I have found useful:
FastQC gives you an initial overview of read quality
bwa or Bowtie 2, together with samtools, give you a sense of how much of your sequence mapped uniquely, which can be a warning sign if it's unusually low
Looking at the quality of the matches is also worthwhile; the CIGAR string in the SAM file has been useful to me here
Visualizing your data in IGV gives you a lot of information on whether your data looks the way you expect
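To show what checking the mapping and the CIGAR strings can look like, here is a small Python sketch that reads SAM-formatted lines and counts mapped reads, reads above a MAPQ cutoff, and soft-clipped bases. The function names and the three toy records are made up. Using MAPQ >= 30 as a stand-in for "uniquely mapped" is an assumption: MAPQ is scored differently by bwa and Bowtie 2, so pick a cutoff that suits your aligner.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def parse_cigar(cigar: str):
    """Turn a CIGAR string such as '90M10S' into [(90, 'M'), (10, 'S')]."""
    if cigar == "*":  # CIGAR not available, e.g. for unmapped reads
        return []
    return [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]

def summarize_sam(lines, min_mapq: int = 30) -> Counter:
    """Count a few basic mapping statistics from SAM text lines.

    `min_mapq` is an assumed cutoff, not a universal definition of
    'uniquely mapped'.
    """
    counts = Counter()
    for line in lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        flag, mapq, cigar = int(fields[1]), int(fields[4]), fields[5]
        if flag & 0x900:  # skip secondary (0x100) and supplementary (0x800)
            continue
        counts["total"] += 1
        if flag & 0x4:  # read is unmapped
            counts["unmapped"] += 1
            continue
        counts["mapped"] += 1
        if mapq >= min_mapq:
            counts["high_mapq"] += 1
        for length, op in parse_cigar(cigar):
            if op == "S":  # soft-clipped bases did not align
                counts["soft_clipped_bases"] += length
    return counts

# Toy records (illustrative only): QNAME FLAG RNAME POS MAPQ CIGAR ...
toy_sam = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t100M\t*\t0\t0\t*\t*",
    "r2\t0\tchr1\t500\t0\t90M10S\t*\t0\t0\t*\t*",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
summary = summarize_sam(toy_sam)
print(dict(summary))
```

In practice you would get the same kind of overview from samtools on a real BAM file; the point of the sketch is only to show which SAM fields carry the information.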
I have found that figuring out how good your sequencing results are is usually a lot of detective work, and there is no fixed formula, because every sequencing sample tends to be unique. The many procedures used to prepare the sample provide plenty of opportunities to introduce variation. The distribution of fragments is unique. The effect of the PCR amplification is unique.
Thanks a lot. It is an immensely helpful answer. Do you know any site or link where I can get further information, especially regarding the Ion Torrent PGM? Thanks again for explaining this to me.
Well explained. Exactly the information I wanted to know. Thank you.