Question: Whole Transcriptome Analysis - Two Very Basic Questions For Differential Expression.
8.5 years ago by
Ngsnewbie360 wrote:

In transcriptome analysis, some genes are expressed more and therefore generate more cDNA, while others are expressed less and generate less. These cDNAs are fragmented and then sequenced on various NGS platforms. We estimate the abundance of transcripts by assembling these fragments (now reads) and then mapping them back onto the genome (or transcripts). I have two very basic questions that often come to mind:

My first question: what are the chances that every sheared fragment gets sequenced? Suppose some fragments corresponding to one particular gene were not sequenced in one experimental condition but were sequenced in the second condition under study. We would then get a false differential expression result.

My second question: a read may align to two different positions on the genome, either because of a homologous region or because of gene duplication. Some people suggest removing these multi-mapped reads, or dividing each multi-mapped read among all of the positions it maps to. But that will also distort the true differential expression of these loci. Could you shed some light on this?

8.5 years ago by
Damian Kao15k wrote:

First Question: Not every fragment will be sequenced. You are not even sure if every fragment is in the library. Depending on how you fragmented your sample and how you size selected, your library might not even be representative of your transcript composition.

But just as with any large-scale experiment, you have to trust/assume that every step was homogeneous and that you are selecting an unbiased sample out of your RNA population. You can try to reduce variables by taking out ribosomal RNA and enriching for poly-A tails.
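To put a rough number on the sampling argument, here is a minimal sketch. It assumes reads are drawn independently and uniformly at random from the library, which is the idealised assumption described above and which real libraries only approximate. Under that assumption, the number of reads from a transcript that makes up a fraction `f` of the library is approximately Poisson with mean `N * f`, so the chance of seeing no reads at all is `exp(-N * f)`. The function name and the example numbers are illustrative, not taken from any real dataset.

```python
import math


def prob_no_reads(total_reads: int, fraction: float) -> float:
    """Probability of observing zero reads from a transcript.

    Poisson approximation: reads are assumed to be sampled independently
    and without bias, with `fraction` being the transcript's share of
    all fragments in the library.
    """
    expected_reads = total_reads * fraction
    return math.exp(-expected_reads)


# Hypothetical library of 20 million reads and three abundance levels.
total = 20_000_000
for fraction in (1e-5, 1e-6, 1e-7):
    p = prob_no_reads(total, fraction)
    print(f"fraction={fraction:g}  expected reads={total * fraction:g}  "
          f"P(no reads)={p:.3g}")
```

The chance of missing a transcript entirely only becomes noticeable when the expected read count is a handful or fewer, which is also where differential expression calls are least reliable anyway.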

Second Question: Duplicated regions or common domains can make your read counts falsely higher or lower, depending on whether you choose to discard or divide them. If the reads that map to these ambiguous regions make up a large proportion of the total read count for your gene, then I would not trust it.
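As a rough illustration of that rule of thumb, the sketch below computes the share of each gene's reads that are multi-mapped and flags genes where that share is high. The 0.5 cutoff, the function names, and the input layout (a dict of unique and multi-mapped counts per gene) are assumptions made for the example. The answer above does not give a specific threshold.

```python
def ambiguous_fraction(unique_reads: int, multi_reads: int) -> float:
    """Fraction of a gene's reads that are multi-mapped (0.0 if no reads)."""
    total = unique_reads + multi_reads
    return multi_reads / total if total else 0.0


def flag_untrustworthy(counts: dict, max_fraction: float = 0.5) -> list:
    """Return genes whose multi-mapped share exceeds `max_fraction`.

    `counts` maps gene name -> (unique_reads, multi_reads).
    The default cutoff is an arbitrary illustrative choice.
    """
    return sorted(
        gene
        for gene, (unique, multi) in counts.items()
        if ambiguous_fraction(unique, multi) > max_fraction
    )


# Hypothetical counts: geneB is dominated by ambiguous reads.
counts = {"geneA": (900, 100), "geneB": (50, 450), "geneC": (0, 0)}
print(flag_untrustworthy(counts))  # ['geneB']
```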

This is a complicated issue in my opinion. It depends on whether you are willing to make a couple of assumptions. If you assume sequenced reads are evenly distributed along the transcript (in practice we can't fully make that assumption), then theoretically you do not need to worry about whether the transcript is full length or not. With that assumption in mind, you could look only at the unique regions of the transcript and use the reads mapped to those regions for your expression level. What matters is reads per base: 10k reads evenly distributed across 100 bp is the same expression level as 5k reads evenly distributed across 50 bp.
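A minimal sketch of the unique-region idea follows, assuming even coverage along the transcript (which, as noted, does not strictly hold). Expression is summarised as reads per base over the uniquely mappable regions only. The function names and the `(reads, length_bp)` input layout are hypothetical.

```python
def read_density(reads: int, length_bp: int) -> float:
    """Reads per base pair for a single region."""
    if length_bp <= 0:
        raise ValueError("region length must be positive")
    return reads / length_bp


def unique_region_density(regions) -> float:
    """Pooled reads per bp over a transcript's uniquely mappable regions.

    `regions` is an iterable of (reads, length_bp) pairs, one per
    unique region; ambiguous regions are simply left out.
    """
    regions = list(regions)
    total_reads = sum(reads for reads, _ in regions)
    total_length = sum(length for _, length in regions)
    return read_density(total_reads, total_length)


# Under even coverage, both regions report the same expression level.
print(read_density(10_000, 100))  # 100.0
print(read_density(5_000, 50))    # 100.0
print(unique_region_density([(10_000, 100), (5_000, 50)]))  # 100.0
```

Pooling reads and lengths before dividing weights each region by its size, so a very short unique region with noisy coverage does not dominate the estimate.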

Paired-end sequencing can alleviate this issue, since a mate that lands in a unique region can anchor its ambiguous partner.


And enriching for polyA tails will have you miss transcripts that do not have one, such as certain ncRNAs or histone mRNAs. There's no way to avoid bias. All you can do is try to choose a bias which will be the least harmful to your results.

written 8.5 years ago by Eric Fournier
8.5 years ago by
Casual90 wrote:

1. If the transcript's expression level is not too low, meaning there is enough sequencing depth, it is very unlikely to be missed. But there are always some extremely low-abundance transcripts, so yes, there will be false negatives. In a transcriptome study, though, I believe these make up only a small proportion and will not do much harm to the conclusions.

2. If you map reads directly to the genome, some reads will be ambiguous. You can either discard them, or divide them among the transcripts that share the position and later adjust the ratio based on the relative levels of those transcripts. Long reads can improve this situation. Assembling the reads before the mapping step can also help.
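One way to read the "divide and adjust the ratio" suggestion is to share each group of multi-mapped reads among its candidate loci in proportion to the uniquely mapped reads those loci already have. The sketch below does that under stated assumptions: the function name, the input layout, and the fallback to an equal split when no locus has unique reads are all choices made for this example, not something the answer specifies.

```python
def split_multireads(unique_counts: dict, multi_groups: list) -> dict:
    """Allocate multi-mapped reads in proportion to unique-read counts.

    unique_counts: locus -> number of uniquely mapped reads
    multi_groups:  list of (loci, n_reads), where `loci` is a tuple of
                   the loci that `n_reads` reads map to equally well
    Returns locus -> estimated read count (unique + allocated share).
    """
    estimates = {locus: float(n) for locus, n in unique_counts.items()}
    for loci, n_reads in multi_groups:
        weights = [unique_counts.get(locus, 0) for locus in loci]
        total_weight = sum(weights)
        for locus, weight in zip(loci, weights):
            if total_weight > 0:
                share = weight / total_weight
            else:
                # No unique evidence for any locus: fall back to equal split.
                share = 1 / len(loci)
            estimates[locus] = estimates.get(locus, 0.0) + n_reads * share
    return estimates


# Hypothetical paralogs: 200 reads map equally well to both.
unique = {"paralogA": 300, "paralogB": 100}
multi = [(("paralogA", "paralogB"), 200)]
print(split_multireads(unique, multi))
# {'paralogA': 450.0, 'paralogB': 150.0}
```

Note that this keeps the ratio between the two loci exactly as the unique reads set it (3:1 here), so it inherits any bias in the unique counts. Discarding the 200 reads instead would leave the ratio unchanged but lower both totals.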
