For RNAseq, is it mandatory to have approximately the same number of reads between the normal and cancer samples?
6.4 years ago


I have paired-end sequencing data for 15 cancer and 15 normal samples. I am doing RNA-seq analysis for these samples.

1) While performing RNAseq analysis, is it mandatory to have approximately the same number of reads between the normal and cancer samples?

2) If yes, what is the significance of it?

3) If no, what is the significance of it?

Cancer samples basic stats:

  • Min : 21,900,652

  • Max : 161,154,015

  • Average : 105,993,656

  • Stdev : 26,799,534

Normal samples basic stats:

  • Min : 87,393,757

  • Max : 121,500,267

  • Average : 101,632,800

  • Stdev : 9,609,422

RNA-Seq • 1.2k views
6.4 years ago
venks ▴ 730

It is not mandatory to have the same number of reads between any two comparison sets.

As long as you have decent coverage, you are good to start the analysis. For RNA-seq experiments, replicates are much more important than depth.
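To put the depth figures from the question in perspective, here is a minimal sketch that summarizes how much the read counts vary within each group. The numbers are copied from the question; the helper name `depth_spread` is hypothetical, and the summary it prints is only a quick sanity check, not a formal criterion.

```python
# Read-count statistics as reported in the question (reads per sample).
cancer = {"min": 21_900_652, "max": 161_154_015,
          "mean": 105_993_656, "stdev": 26_799_534}
normal = {"min": 87_393_757, "max": 121_500_267,
          "mean": 101_632_800, "stdev": 9_609_422}


def depth_spread(stats):
    """Return (coefficient of variation, max/min ratio) for one group.

    Hypothetical helper for illustration only.
    """
    cv = stats["stdev"] / stats["mean"]
    max_min_ratio = stats["max"] / stats["min"]
    return cv, max_min_ratio


for name, stats in (("cancer", cancer), ("normal", normal)):
    cv, ratio = depth_spread(stats)
    print(f"{name}: CV = {cv:.1%}, max/min = {ratio:.2f}")
```

The group means are close to each other; the cancer group simply has a wider spread. Differences like this are what library-size normalization is meant to absorb, so they do not by themselves prevent a comparison.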

Please refer to this paper


Thanks, Venkateshr89. +1 for your suggestions and the paper.

1) Can you suggest any specific tool for estimating the coverage for RNAseq analysis?

2) Also, I have been provided 30 samples (15 pairs). Each pair has one cancer and one normal sample. I read somewhere on Biostars that replicates are important for RNA-seq analysis. What are biological replicates and technical replicates?

3) Is each pair (1 cancer and 1 normal) considered a biological replicate?


You can use FastQC for read-level quality checks and the IGV browser to inspect coverage visually.

You can then use HTSeq (htseq-count) and related scripts to get the exon counts.

I don't know your hypothesis, so I'm afraid I can't answer your question about replicates.


6.3 years ago


I agree with venkatesh that it is not mandatory to have the same number of reads between normal and tumor samples.

If tumor and normal samples differ in their gene expression levels, an RNA-seq study will generate different numbers of reads per transcript (proportional to transcript abundance). Similarly, if library sizes differ between the normal and tumor samples, the total number of reads generated will also differ. That is why normalization is carried out when studying differential gene expression between any two samples. Pipelines such as TopHat, Cufflinks and Cuffdiff normalize the samples by estimating FPKM values.
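As a rough illustration of why normalization removes the effect of sequencing depth, here is a minimal sketch of the textbook FPKM definition (fragments per kilobase of transcript per million mapped fragments). The function name and the example numbers are made up for illustration; this is not how Cufflinks or Cuffdiff compute their estimates internally.

```python
def fpkm(fragment_count, transcript_length_bp, total_mapped_fragments):
    """Textbook FPKM for one transcript in one sample.

    FPKM = fragments * 1e9 / (transcript length in bp * total mapped fragments)
    Hypothetical helper; all inputs in the example below are invented.
    """
    return fragment_count * 1e9 / (transcript_length_bp * total_mapped_fragments)


# The same transcript (2,000 bp) at the same relative abundance,
# sequenced at two different depths (hypothetical numbers).
shallow = fpkm(fragment_count=1_000, transcript_length_bp=2_000,
               total_mapped_fragments=20_000_000)
deep = fpkm(fragment_count=5_000, transcript_length_bp=2_000,
            total_mapped_fragments=100_000_000)

print(shallow, deep)  # both samples give the same FPKM
```

The deeper sample has five times as many fragments on the transcript, but also five times as many mapped fragments overall, so the normalized value is unchanged. Raw counts would have suggested a fivefold difference that is purely an artifact of depth.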

As for biological and technical replicates, having 3 replicates per sample is good. But in your case, since you do not have replicates, you can go ahead if the depth of sequencing is good (more reads supporting a base call). This paper can probably be a useful resource

If you are looking for a tool that automates differential gene expression (DGE) analysis across several samples, you can try SanGeniX, our recently launched tool, which supports RNA-seq as well as other NGS data analysis. It's free to use and rich in interactive, graphically enhanced visualizations.

You can study DGE of tumor and normal samples in pairwise or batch mode. Group-wise DGE comparisons are also supported.


Persistent LABS

