Question: Potential Reason For Extreme Read Counts For A Single Replicate
6.8 years ago, alittleboy (210) wrote:

When I got the read count table for a typical RNA-Seq experiment (see a subset below), I found that the read counts for the third replicate (B3) in group B are consistently high relative to the other replicates (B1 and B2). I was told that the data preparation should be free of error, so I am curious what might cause such extreme counts in one particular replicate. Thanks!

A1       A2       A3       B1      B2      B3
12626    19794    17190    3668    4782    49020
5940     9357     8143     1681    2210    23238
5939     9355     8143     1681    2211    23238
8318     13113    11406    2365    3102    32556
rnaseq • 1.4k views
modified 6.8 years ago by Chris Cole (720) • written 6.8 years ago by alittleboy (210)

What does the correlation between the samples look like? Is B3 an outlier?

written 6.8 years ago by Ido Tamir (5.0k)
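As a rough illustration of the correlation check suggested here, the sketch below computes pairwise Pearson correlations on log2-transformed counts, using only the four rows posted in the question. The `counts` dictionary and the `pearson` helper are names made up for this example, and the `+ 1` pseudocount is an assumption to guard against zero counts. Four genes are far too few for a real assessment; the same code would be run on the full table.

```python
import math

# Counts copied from the subset in the question; rows are four unnamed genes.
counts = {
    "A1": [12626, 5940, 5939, 8318],
    "A2": [19794, 9357, 9355, 13113],
    "A3": [17190, 8143, 8143, 11406],
    "B1": [3668, 1681, 1681, 2365],
    "B2": [4782, 2210, 2211, 3102],
    "B3": [49020, 23238, 23238, 32556],
}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# log2(count + 1) keeps highly expressed genes from dominating the correlation.
log_counts = {s: [math.log2(c + 1) for c in v] for s, v in counts.items()}
samples = list(counts)
corr = {(a, b): pearson(log_counts[a], log_counts[b])
        for a in samples for b in samples}

# Print the correlation matrix, one row per sample.
print("    " + "  ".join(f"{s:>6}" for s in samples))
for a in samples:
    print(f"{a:>4}" + "  ".join(f"{corr[(a, b)]:6.3f}" for b in samples))
```

Note that Pearson correlation is insensitive to a constant scaling of one sample, so a replicate whose counts are uniformly inflated will not necessarily stand out as an outlier by this measure alone.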

@IdoTamir: actually there are only a few extreme counts in B3, which makes the corresponding genes differentially expressed (in B). The overall library sizes for the six replicates are comparable...

written 6.8 years ago by alittleboy (210)
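One way to quantify how extreme B3 is for these genes is the per-gene log2 ratio of B3 to the mean of B1 and B2. The sketch below does this for the four rows posted in the question; the variable names are invented for the example, and comparing against the mean of the other two replicates is just one simple choice, not an established threshold or test.

```python
import math

# Group B counts copied from the subset in the question (four unnamed genes).
b1 = [3668, 1681, 1681, 2365]
b2 = [4782, 2210, 2211, 3102]
b3 = [49020, 23238, 23238, 32556]

# log2 ratio of B3 to the mean of the other two replicates, per gene.
log2_ratios = []
for x1, x2, x3 in zip(b1, b2, b3):
    mean_others = (x1 + x2) / 2
    log2_ratios.append(math.log2(x3 / mean_others))

for i, r in enumerate(log2_ratios, start=1):
    print(f"gene {i}: log2(B3 / mean(B1, B2)) = {r:.2f}")
```

For the rows shown, the ratio is nearly the same for every gene, which is the pattern one would compare against the rest of the table to see whether only a handful of genes behave this way.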

Maybe it's biology? Cell-cycle genes, apoptosis genes ... because the cells were more or less dense than the others, if it's from cell culture. Maybe it's preparation/PCR: length bias? GC bias? UTR

modified 6.8 years ago • written 6.8 years ago by Ido Tamir (5.0k)
6.8 years ago, Istvan Albert ♦♦ 81k (University Park, USA) wrote:

This is usually caused by problems during library preparation. Some may claim that the preparation "should be" error free but then everything in the world should be error free yet is not.

In practice, what you see there is that you ended up with more DNA sequenced for some samples.

6.8 years ago, swbarnes2 (7.0k, United States) wrote:

The data is likely fine. What likely happened is that at the last step, someone misquantified the B3 library, greatly underestimating it, and put a lot more of that library on the flow cell than they should have. As long as B3 didn't steal too much real estate from the other samples in the lane, everything is fine.
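If one library was simply sequenced more deeply, a per-sample scaling factor should absorb the difference. The sketch below applies median-of-ratios size factors (the approach popularized by DESeq, swapped in here for illustration and not something described in this answer) to the four rows posted in the question. The names `counts`, `size_factors`, and `normalized` are invented for the example. With only four genes the numbers are purely illustrative; on a full table the size factors are driven by the bulk of the genes, so a few genuinely extreme genes would remain extreme after scaling.

```python
import math
import statistics

# Counts copied from the subset in the question; rows are four unnamed genes.
counts = {
    "A1": [12626, 5940, 5939, 8318],
    "A2": [19794, 9357, 9355, 13113],
    "A3": [17190, 8143, 8143, 11406],
    "B1": [3668, 1681, 1681, 2365],
    "B2": [4782, 2210, 2211, 3102],
    "B3": [49020, 23238, 23238, 32556],
}
samples = list(counts)
n_genes = len(counts["A1"])

# Geometric mean of each gene across all samples (the reference "pseudo-sample").
geo_means = [
    math.exp(sum(math.log(counts[s][g]) for s in samples) / len(samples))
    for g in range(n_genes)
]

# Size factor per sample: median over genes of count / geometric mean.
size_factors = {
    s: statistics.median(counts[s][g] / geo_means[g] for g in range(n_genes))
    for s in samples
}

# Divide each count by its sample's size factor.
normalized = {
    s: [counts[s][g] / size_factors[s] for g in range(n_genes)]
    for s in samples
}

for s in samples:
    row = ", ".join(f"{v:.0f}" for v in normalized[s])
    print(f"{s}: size factor {size_factors[s]:.2f}; normalized: {row}")
```

On these four rows the scaled counts come out close to one another across all six samples, which is what a pure loading or depth difference would look like.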

6.8 years ago, Chris Cole (720) wrote:

Whoever told you the data is 'error free' is lying. There's no such thing as error free in quantitative science. Every measurement has an associated error.

With this data you cannot say one way or the other. If there's no evidence of systematic errors, then there's little you can say. It could be that B3 is correct and the other two (B1 and B2) are underestimating the expression of those genes. Or it could be natural variability in the expression of those genes. With more replicates you would have a better picture.

Do you have the read data? I'd look at the read distributions across all your 'extreme' genes. We've seen weird PCR artifacts this way.

Alternatively, you could process the data and call the genes you think are differentially expressed with your favourite tool and then try and validate them.
