Question

RNAseq read depth

2

7.3 years ago

bakerjh10 ▴ 20

I am working on a project with collaborators in which one of our aims is to examine differential gene expression in humans across two time points: is gene expression different at time 1 and time 2? We will be using RNA-sequencing and are getting some conflicting opinions from our "experts" as far as read depth.

Expert 1 says 2x50bp, and that everything they do with RNA is 2x50. They would only use 2x100 to increase sensitivity for detecting SNPs and indels.

Our sequencing core says 2x50bp is what "all their investigators using RNAseq use."

Expert 2 says 2x100bp and that essentially 2x50 is "old school"

We aren't sure which direction to go (or who is right!). We are not interested in detecting SNPs or indels; our interest is purely in identifying gene expression changes. Which recommendation we follow drastically changes our budget, because 2x100 doubles the cost of sequencing. If sequencing costs more than we had planned, we can sequence far fewer samples/replicates. We actually wouldn't be able to afford to sequence all of our collected samples, which seems like a waste! Our preference would be to stick with what we originally planned and budgeted for, and 2/3 of our experts are telling us 2x50.

RNA-Seq sequencing next-gen rna-seq • 3.6k views

ADD COMMENT • link updated 5.3 years ago by Biostar 20 • written 7.3 years ago by bakerjh10 ▴ 20

score 1 · Answer 1 · 2017-01-12

1

7.3 years ago

WouterDeCoster 47k

I couldn't find this tweet a month ago but it's valuable so I would still like to share:

. @vallens @lpachter yes so here are stats from simulated data. 75PE looks good choice/compromise pic.twitter.com/YwICWBreQJ
— Mick Watson (@BioMickWatson) January 12, 2017

Take-home message: longer reads and paired-end reads increase the fraction of uniquely mapping reads and are important for splicing analysis.

ADD COMMENT • link 7.3 years ago by WouterDeCoster 47k

1

Agreed, but this comparison is valid only if cost is not a consideration. You can sequence three replicates at SE-50bp for the same price as one PE-75bp sample, and those replicates provide far more statistical power for detecting differential gene expression than a small increase in the percentage of unique mappers.

P.S.-Thanks for posting the tweet: I'd been looking for it without success.

ADD REPLY • link 7.3 years ago by harold.smith.tarheel ★ 4.9k

0

For HiSeq, PE100 should only be about twice the cost of SE50. If you are getting charged 3X for PE75 over SE50, you should ask about that.

Also, that's just for sequencing. The libraries are going to be the same price and those are a significant portion of the total cost.

You can definitely afford more replicates with SE50, but you need to sequence deep to get to even 2X price difference.

ADD REPLY • link 7.3 years ago by igor 13k

0

You're correct, 3X is true only if you're talking about a single sample (three replicates of SE-50 @ 50M reads/library vs. one replicate of PE-75). But SE-50bp IS cheaper than PE-75, and the larger point, that more replicates are much better than longer reads, remains valid.

As for library prep, the cost is negligible (~$50 each) compared to the cost of sequencing.
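To make the replicates-versus-read-length trade-off concrete, here is a rough sketch of the budget arithmetic. Only the ~$50 library prep figure and the "PE100 is about twice SE50" ratio come from this thread; the per-sample SE50 sequencing price and the total budget are hypothetical placeholders, so substitute your core's actual quotes.

```python
def affordable_samples(budget, seq_cost_per_sample, library_cost_per_sample):
    """Return how many whole samples fit in the budget."""
    per_sample = seq_cost_per_sample + library_cost_per_sample
    return int(budget // per_sample)


LIBRARY_COST = 50.0                  # per library, figure quoted in this thread
SE50_SEQ_COST = 300.0                # HYPOTHETICAL per-sample sequencing price
PE100_SEQ_COST = 2 * SE50_SEQ_COST   # "about twice the cost of SE50" (thread)
BUDGET = 10_000.0                    # HYPOTHETICAL total budget

se50_samples = affordable_samples(BUDGET, SE50_SEQ_COST, LIBRARY_COST)
pe100_samples = affordable_samples(BUDGET, PE100_SEQ_COST, LIBRARY_COST)

print(f"SE50:  {se50_samples} samples")
print(f"PE100: {pe100_samples} samples")
```

With these placeholder prices the shorter reads buy nearly twice as many samples. The gap narrows as library prep becomes a larger share of the per-sample cost, which is the point raised above about needing to sequence deep before the price difference approaches 2X.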

ADD REPLY • link 7.3 years ago by harold.smith.tarheel ★ 4.9k

score 0 · Answer 2 · 2016-12-14

0

7.3 years ago

Michele Busby ★ 2.2k

2x50

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4531809/

ADD COMMENT • link 7.3 years ago by Michele Busby ★ 2.2k

0

Actually, that reference recommends SE-50bp:

We found that, with the exception of 25 bp reads, there is little difference for the detection of differential expression regardless of the read length. Once single-end reads are at a length of 50 bp, the results do not change substantially for any level up to, and including, 100 bp paired-end.

ADD REPLY • link 7.3 years ago by harold.smith.tarheel ★ 4.9k

1

That wasn't a choice...

ADD REPLY • link 7.3 years ago by Michele Busby ★ 2.2k

score 0 · Answer 3 · 2016-12-14

0

7.3 years ago

Sinji ★ 3.2k

http://rnaseq.uoregon.edu/

Section 1.2.

Long reads (> 80bp) are only required for de novo transcript assembly.

ADD COMMENT • link 7.3 years ago by Sinji ★ 3.2k

score 0 · Answer 4 · 2016-12-15

0

7.3 years ago

Vitis ★ 2.5k

If you have a solid reference genome and are not interested in SNPs, novel sequences, or new assemblies, you should do 2x50bp.

ADD COMMENT • link 7.3 years ago by Vitis ★ 2.5k

score 0 · Answer 5 · 2017-01-12

We have tested 2x50, 2x100, 1x50, and 1x100 in mouse, fly, and yeast for their impact on mutant-vs-WT DE experiments. There was essentially no difference in any of the results, so we recommend only 1x50 for gene expression experiments; there is no need to spend more money. Of course, your organism may vary.

If mappability is the chief concern, a quick proxy is to count k-mer uniqueness in the transcriptome for various k. For instance, for the Ensembl 84 mouse transcriptome, 50-mers are already 53.7% unique, and 100-mers are 55.8% unique. So doubling your read length (and sequencing cost) will only get you about 2 percentage points more mappability. For fly and yeast (Ensembl 84), going from 50bp to 100bp gives you less than a 1% gain in mappability.
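As a minimal sketch of that k-mer uniqueness proxy: the function below counts every k-mer across a set of transcript sequences and reports the fraction of k-mer positions whose sequence occurs exactly once. That definition of "unique" is an assumption, since the exact calculation behind the percentages above isn't given, and `kmer_uniqueness` is an illustrative name rather than an existing tool. It also ignores reverse complements and keeps every k-mer in memory, so a dedicated k-mer counter would be a better fit for a full transcriptome.

```python
from collections import Counter


def kmer_uniqueness(sequences, k):
    """Fraction of k-mer positions whose k-mer occurs exactly once overall.

    ASSUMPTION: "unique" means a count of exactly 1 across all input
    sequences, on the forward strand only.
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # Slide a window of width k along the sequence.
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1

    total = sum(counts.values())
    if total == 0:
        return 0.0  # every sequence was shorter than k
    unique = sum(1 for c in counts.values() if c == 1)
    return unique / total


# Toy "transcriptome" (made-up sequences, not real transcripts).
toy = ["ACGTACGTAA", "ACGTTTGCAA"]
for k in (4, 8):
    print(k, round(kmer_uniqueness(toy, k), 3))
```

On real data you would pass in the transcript sequences parsed from a transcriptome FASTA and compare, say, k=50 against k=100.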

Doing paired ends is much better for raising mappability, although I don't have numbers offhand. The take-home being: if you want better mappability, do not do 1x100 when you could do 2x50.

Also, if you are concerned with splicing or transcript structure, then paired-end sequencing is crucial. Spanning that extra space provides big gains in assembly and splicing detection. I do not know whether 2x50 versus 2x100 makes much difference for alt-splicing detection, but it certainly does for assembly, at least for large, complex transcriptomes like those of vertebrates.