In transcriptome analysis, some genes are expressed more and therefore generate more cDNA, while others are expressed less and generate less cDNA. The cDNA is fragmented and then sequenced on one of the various NGS platforms. We estimate transcript abundance by assembling the resulting fragments (now reads) and mapping them back to the genome (or to the transcripts). I have two very basic questions that often come to mind:
My first question: what are the chances that every sheared fragment gets sequenced? Suppose some fragments corresponding to one particular gene were not sequenced in one experimental condition but were sequenced in the second condition. We would then get a false result in the differential expression analysis.
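To make the first question concrete, here is a minimal sketch of how I picture it. It assumes that sequencing behaves like independent random sampling of fragments from the library, which is a simplification and ignores biases such as GC content or fragment size selection. The function name `prob_gene_missed` and the example fractions and read depths are hypothetical, chosen only for illustration.

```python
def prob_gene_missed(gene_fraction: float, n_reads: int) -> float:
    """Probability that none of n_reads comes from a given gene.

    Assumption: each read is drawn independently, and gene_fraction is the
    share of library fragments that originate from the gene.
    """
    if not 0.0 <= gene_fraction <= 1.0:
        raise ValueError("gene_fraction must be between 0 and 1")
    return (1.0 - gene_fraction) ** n_reads


if __name__ == "__main__":
    # Hypothetical sequencing depths and gene fractions, for illustration only.
    for n_reads in (1_000_000, 10_000_000):
        for gene_fraction in (1e-7, 1e-6, 1e-5):
            p = prob_gene_missed(gene_fraction, n_reads)
            print(f"reads={n_reads:>10,} fraction={gene_fraction:.0e} "
                  f"P(gene missed)={p:.4f}")
```

Under this assumption, the chance of missing a gene completely depends on its share of the library and on the sequencing depth, so my worry mainly concerns lowly expressed genes.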
My second question: a read may align to two different positions on the genome, either because of homologous regions or because of gene duplication. Some people suggest removing these multi-mapped reads, or dividing each multi-mapped read among all of the positions it maps to. But either approach will also distort the true differential expression at these loci. Could you shed some light on this?
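To illustrate the two suggestions mentioned above, here is a small sketch on made-up data. The function `count_reads`, the strategy names `"discard"` and `"split"`, and the read and gene identifiers are all hypothetical. This is not how any particular tool implements it, only my understanding of the two ideas.

```python
from collections import defaultdict


def count_reads(alignments: dict, strategy: str) -> dict:
    """Count reads per locus from a mapping of read ID to aligned loci.

    strategy "discard": multi-mapped reads are ignored.
    strategy "split":   a read mapping to k loci adds 1/k to each of them.
    """
    if strategy not in ("discard", "split"):
        raise ValueError("strategy must be 'discard' or 'split'")

    counts = defaultdict(float)
    for loci in alignments.values():
        if len(loci) == 1:
            # Uniquely mapped read: counted in full under both strategies.
            counts[loci[0]] += 1.0
        elif strategy == "split":
            # Multi-mapped read: divided equally among its loci.
            for locus in loci:
                counts[locus] += 1.0 / len(loci)
    return dict(counts)


# Made-up alignments: r3 and r5 map to both geneA and geneB.
alignments = {
    "r1": ["geneA"],
    "r2": ["geneA"],
    "r3": ["geneA", "geneB"],
    "r4": ["geneB"],
    "r5": ["geneA", "geneB"],
}

if __name__ == "__main__":
    for strategy in ("discard", "split"):
        print(strategy, count_reads(alignments, strategy))
```

In this toy case, discarding lowers the counts of both genes, while splitting gives each gene an equal share regardless of which one the read really came from. This is exactly why I doubt that either approach reflects the true expression of such loci.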