Question

Potential Reason For Extreme Read Counts For A Single Replicate

11.2 years ago

alittleboy ▴ 220

When I got the read count table for a typical RNA-Seq experiment (see a subset below), I found that the read counts for the third replicate in group B (B3) are consistently high relative to the other replicates (B1 and B2). I was told that the data preparation should be free of error, so I am curious what might cause such extreme counts in one particular replicate. Thanks!

A1       A2       A3       B1      B2      B3
12626    19794    17190    3668    4782    49020
5940     9357     8143     1681    2210    23238
5939     9355     8143     1681    2211    23238
8318     13113    11406    2365    3102    32556

rnaseq • 2.1k views

ADD COMMENT • link updated 11.1 years ago by Chris Cole ▴ 800 • written 11.2 years ago by alittleboy ▴ 220

What does the correlation between the samples look like? Is B3 an outlier?

ADD REPLY • link 11.1 years ago by Ido Tamir 5.2k
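One way to look at this is a pairwise Pearson correlation on log-transformed counts. The sketch below uses only the four rows posted in the question (the rows are unnamed there, so they are treated as anonymous genes), and the pseudocount of 1 is an assumption. On a real dataset you would run it on the full count table. Keep in mind that Pearson correlation is insensitive to a constant scaling, so a replicate that is uniformly higher than the others can still correlate well with them.

```python
import math
from itertools import combinations

# Subset of the count table from the question; columns are A1..B3.
samples = ["A1", "A2", "A3", "B1", "B2", "B3"]
counts = [
    [12626, 19794, 17190, 3668, 4782, 49020],
    [5940, 9357, 8143, 1681, 2210, 23238],
    [5939, 9355, 8143, 1681, 2211, 23238],
    [8318, 13113, 11406, 2365, 3102, 32556],
]


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# log2(count + 1) per sample; the +1 pseudocount is an assumption.
log_cols = {
    name: [math.log2(row[i] + 1) for row in counts]
    for i, name in enumerate(samples)
}

# Correlation for every pair of samples.
corr = {
    (a, b): pearson(log_cols[a], log_cols[b])
    for a, b in combinations(samples, 2)
}

for (a, b), r in corr.items():
    print(f"{a} vs {b}: r = {r:.4f}")
```

With only four rows this says little about the experiment as a whole; it is meant to show the shape of the check, not a result.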

@IdoTamir: actually there are only a few extreme counts in B3, which makes the corresponding genes differentially expressed (in B). The overall library sizes for the six replicates are comparable...

ADD REPLY • link 11.1 years ago by alittleboy ▴ 220

Maybe it's biology? Cell-cycle genes, apoptosis genes ... because the cells were more or less dense than the others, if it's from cell culture. Maybe it's preparation/PCR: length bias? GC bias? UTR

ADD REPLY • link 11.1 years ago by Ido Tamir 5.2k

score 0 · Answer 1 · 2013-03-02

11.2 years ago

Istvan Albert 100k

This is usually caused by problems during library preparation. Some may claim that the preparation "should be" error free but then everything in the world should be error free yet is not.

In practice, what you see there is that you ended up with more DNA sequenced for some samples.

score 0 · Answer 2 · 2013-03-03

The data is likely fine. What likely happened is that, at the last step, someone misquantified the B3 library, greatly underestimating it, and put a lot more of that library on the flow cell than they should have. As long as B3 didn't steal too much real estate from the other samples in the lane, everything is fine.
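If overloading is the cause, B3 should be higher than its sibling replicates by roughly the same factor for every gene. A minimal sketch of that check, using only the rows posted in the question and comparing B3 against B1 (the choice of B1 as the reference and the "spread" summary are assumptions made for illustration):

```python
from statistics import median

# Subset of the count table from the question; columns are A1..B3.
counts = [
    [12626, 19794, 17190, 3668, 4782, 49020],
    [5940, 9357, 8143, 1681, 2210, 23238],
    [5939, 9355, 8143, 1681, 2211, 23238],
    [8318, 13113, 11406, 2365, 3102, 32556],
]
B1, B3 = 3, 5  # column indices

# Per-gene ratio of B3 to B1.
ratio_to_b1 = [row[B3] / row[B1] for row in counts]

# Median ratio is a rough scale factor; max/min shows how uniform it is.
scale = median(ratio_to_b1)
spread = max(ratio_to_b1) / min(ratio_to_b1)

print("B3/B1 per gene:", [round(r, 2) for r in ratio_to_b1])
print(f"median ratio: {scale:.2f}, max/min spread: {spread:.3f}")
```

A spread close to 1 across many genes would be consistent with a global scaling such as a loading difference, which normalization by library size or size factors would largely absorb. Ratios that are extreme for only a handful of genes point elsewhere.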

score 0 · Answer 3 · 2013-03-05

Whoever told you the data is 'error free' is lying. There's no such thing as error free in quantitative science. Every measurement has an associated error.

With this data you cannot say one way or the other. If there's no evidence of systematic errors, then there's little you can say. It could be that B3 is correct and the other two (B1 and B2) are underestimating the expression of those genes. Or it could be natural variability in the expression of those genes. With more replicates you would have a better picture.

Do you have the read data? I'd look at the read distributions across all your 'extreme' genes. We've seen weird PCR artifacts this way.

Alternatively, you could process the data, call the genes you think are differentially expressed with your favourite tool, and then try to validate them.