Question

For RNAseq, is it mandatory to have approximately the same number of reads between the normal and cancer samples?

0

7.6 years ago

bioinforesearchquestions ▴ 370

Hi,

I have paired-end sequencing data for 15 cancer and 15 normal samples. I am doing RNA-seq analysis for these samples.

1) While performing RNAseq analysis, is it mandatory to have approximately the same number of reads between the normal and cancer samples?

2) If yes, what is the significance of it?

3) If no, what is the significance of it?

Cancer samples basic stats:

Min : 21,900,652
Max : 161,154,015
Average : 105,993,656
Stdev : 26,799,534

Normal samples basic stats:

Min : 87,393,757
Max : 121,500,267
Average : 101,632,800
Stdev : 9,609,422

RNA-Seq • 1.4k views

ADD COMMENT • link updated 7.5 years ago by Persistent LABS ▴ 750 • written 7.6 years ago by bioinforesearchquestions ▴ 370

score 1 · Answer 1 · 2016-09-22

1

7.6 years ago

venks ▴ 740

It is not mandatory to have the same number of reads between any two comparison sets.

As long as you have decent coverage, you are good to start the analysis. In RNA-seq experiments, replicates are much more important than depth.

Please refer to this paper: http://www.nature.com/nbt/journal/v29/n7/full/nbt.1910.html

ADD COMMENT • link 7.6 years ago by venks ▴ 740

0

Thanks, Venkateshr89. +1 for your suggestions and the paper.

1) Can you suggest a specific tool for estimating coverage in an RNA-seq analysis?

2) Also, I have been provided with 30 samples (15 pairs). Each pair has one cancer and one normal sample. I read somewhere on Biostars that replicates are important for RNA-seq analysis. What are biological replicates and technical replicates?

3) Is each of my pairs (1 cancer and 1 normal) considered a biological replicate?

ADD REPLY • link 7.6 years ago by bioinforesearchquestions ▴ 370

0

You can use FastQC and the IGV browser to check the coverage.

You can further use htseq-count (from HTSeq) and the dexseq_count.py script (from DEXSeq) to get the exon counts.
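Once counts are available, comparing the total assigned reads per sample is a quick way to see how much depth differs between samples. The snippet below is a minimal sketch, assuming the usual two-column, tab-separated htseq-count output in which the summary counters at the end are prefixed with `__` (for example `__no_feature`). The function name, gene names and numbers are made up for illustration.

```python
import io


def read_htseq_counts(handle):
    """Split htseq-count style output into per-gene counts and summary counters."""
    counts, summary = {}, {}
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        name, value = line.split("\t")
        # Assumption: special counters start with a double underscore.
        if name.startswith("__"):
            summary[name] = int(value)
        else:
            counts[name] = int(value)
    return counts, summary


# Hypothetical output for one sample (not real data).
example = io.StringIO(
    "geneA\t120\n"
    "geneB\t0\n"
    "geneC\t35\n"
    "__no_feature\t40\n"
    "__ambiguous\t5\n"
)

counts, summary = read_htseq_counts(example)
assigned = sum(counts.values())
total = assigned + sum(summary.values())
print(f"assigned reads: {assigned} of {total}")
```

Running the same function over each sample's count file gives one "assigned reads" figure per sample, which can then be compared between the cancer and normal groups.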

I don't know your hypothesis, so I am afraid I can't answer your question about replicates.

Thanks

ADD REPLY • link 7.6 years ago by venks ▴ 740

score 1 · Answer 2 · 2016-10-05

Hi

I agree with venkatesh that it is not mandatory to have the same number of reads between normal and tumor samples.

If tumor and normal samples differ in their gene expression levels, an RNA-seq study will generate a different number of reads per transcript (proportional to transcript abundance). Similarly, if library sizes differ between the normal and tumor samples, the total number of reads generated will also differ. That is why one normalizes before studying differential gene expression between any two samples. Pipelines such as TopHat, Cufflinks and Cuffdiff normalize samples by estimating FPKM values.
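To illustrate why unequal depth is not a problem after normalization, here is a minimal sketch of the textbook CPM and FPKM formulas. It is not how Cufflinks actually estimates abundances, which involves more modelling. The gene names, lengths and counts are hypothetical toy values, and the library size is taken simply as the sum of the counts.

```python
def counts_per_million(counts):
    """Scale raw counts by library size (total counts in the sample)."""
    library_size = sum(counts.values())
    return {gene: n * 1e6 / library_size for gene, n in counts.items()}


def fpkm(counts, lengths_bp):
    """Fragments per kilobase of transcript per million mapped fragments."""
    library_size = sum(counts.values())
    return {
        gene: n * 1e9 / (lengths_bp[gene] * library_size)
        for gene, n in counts.items()
    }


# Hypothetical toy data: the same expression profile sequenced at two depths.
lengths_bp = {"geneA": 2000, "geneB": 1000, "geneC": 500}
shallow = {"geneA": 200, "geneB": 300, "geneC": 500}
deep = {gene: n * 5 for gene, n in shallow.items()}  # 5x more reads

print(fpkm(shallow, lengths_bp))
print(fpkm(deep, lengths_bp))
```

Both samples give the same FPKM values even though one has five times as many reads, because the library size appears in the denominator.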

Coming to biological and technical replicates, having 3 replicates per sample can be good. But in your case, since you do not have replicates, you can go ahead if the depth of sequencing is good (more reads supporting a base call). This paper may be a useful resource: https://www.ncbi.nlm.nih.gov/pubmed/27022035

If you are looking for a tool that automates differential gene expression (DGE) analysis across several samples, you can try SanGeniX, our recently launched tool, which supports RNA-seq along with other NGS data analyses. It is free to use and rich in interactive, graphically enhanced visualizations.

You can study DGE of tumor and normal samples in pairwise or batch mode. Group-wise DGE comparisons are also supported.

Thanks

Persistent LABS