Question

RNA-seq analysis using RUVSeq: am I doing it right?


5.9 years ago

Ankur B. Sharma ▴ 20

Hi guys, I am trying to learn how to analyze my RNA-seq data. I have no prior experience in R or Python, but with the help of online materials I managed to learn the basics (I hope so). Cutting to the chase, I am analyzing RNA-seq data from two samples: 2MO (replicate 1 and replicate 2) and 8MO (replicate 1). I got the read counts (please click the link below) using this pipeline: Fastq --> Trim Galore --> STAR aligner --> featureCounts.

Both replicates of 2MO have 5 samples: 2N, 4N, 8N, TH, and LT, while the only replicate of 8MO also has these 5 samples: 2N, 4N, 8N, TH, and LT.

I am trying to find the differentially expressed genes (DEGs) between each sample of 2MO and 8MO, for example the DEGs between 2N of 2MO rep1 and 2N of 8MO rep1. My scripts are below: https://ibb.co/vqPR3VS

I don't see many DEGs here. Also, I don't understand whether positive logFC (log fold change) values mean higher expression in 2MO_R1 or in 8MO_R1 in the last part of my analysis: 2N_R1_2MO_vs_2N_R1_8MO (DEG calculation).

Looking forward to your help and suggestions. Regards, Ankur

edgeR RuvSeq DEGs RNA-Seq • 3.1k views

updated 5.9 years ago by Charles Warden 8.3k • written 5.9 years ago by Ankur B. Sharma ▴ 20


How to add images to a Biostars post

5.9 years ago by GenoMax 152k


Thank you so much @genomax

5.9 years ago by Ankur B. Sharma ▴ 20


Posting screenshots of command lines or code is not a good idea. You should copy and paste your code in (and then use the 101010 button to format the code properly after highlighting it).

5.9 years ago by GenoMax 152k

score 0 · Answer 1 · 2019-08-27

I think that question can be posed in two slightly different ways:

1) Did you write code without bugs and/or misunderstanding of the functions, so that your code matches your intention?

2) Did the RUVSeq normalization "work"?

I think you are mostly asking about question #1. For that I'm not 100% certain, and it may be best for someone more familiar with the RUVSeq package to answer.

That said, I also apologize that I'm not 100% clear on what sources of bias you are trying to correct, and that may be important. For example, there isn't just one way to calculate p-values in edgeR, and the right commands may depend upon your goals and experimental design. You may simply need to test different strategies for each project, which is kind of what you have to do with the p-value calculation.

For question #2, I would strongly recommend having an independently calculated expression value, so that you can see the effect before and after alternative normalization (no matter what you use). For example, you probably can't perfectly correct all bias, but you need some way to get a sense of whether your net impact was positive (or whether you had more severe problems with over-correction and/or over-fitting).
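To make that before/after check a bit more concrete, here is a minimal sketch in plain Python (not the edgeR or RUVSeq API). It assumes you have independently measured log2 fold changes for a handful of genes (qPCR is just one hypothetical source) and asks whether the RNA-seq estimates agree with them better after the alternative normalization than before. All numbers and variable names below are made up for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical log2 fold changes for the same five genes (made-up values).
independent = [2.0, -1.0, 0.5, 3.0, -2.0]   # independent measurement
before      = [1.0, -0.2, 1.5, 1.8, 0.4]    # RNA-seq, standard normalization
after       = [1.9, -0.8, 0.6, 2.7, -1.7]   # RNA-seq, alternative normalization

r_before = pearson(independent, before)
r_after = pearson(independent, after)

# If agreement does not improve (or gets worse), the correction may not be
# helping, or may be over-correcting.
print(f"agreement before: {r_before:.3f}, after: {r_after:.3f}")
```

A higher correlation afterwards is only one piece of evidence, and with very few genes it can easily be noise, so I would treat it as a sanity check rather than proof.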

For example, if you do have a known technical variable that is randomized across your biological variables (such as batch, library preparation method, etc.), I would probably compare using RUVSeq to including a second variable in your differential expression model (and perhaps centering within each technical group when visualizing relative expression in a heatmap).
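As a rough illustration of the centering idea for visualization, here is a minimal sketch in plain Python (again, not edgeR or RUVSeq code). It subtracts each technical group's mean from one gene's log2 expression values. The sample values, batch labels, and the function name `center_within_groups` are hypothetical, and this only makes sense when the technical variable is balanced across the biological groups; otherwise you would also be subtracting real biology.

```python
from collections import defaultdict

def center_within_groups(values, groups):
    """Subtract each group's mean from the values belonging to that group."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, group in zip(values, groups):
        sums[group] += value
        counts[group] += 1
    means = {group: sums[group] / counts[group] for group in sums}
    return [value - means[group] for value, group in zip(values, groups)]

# Hypothetical log2 expression for one gene in six samples (made-up values).
# Batch B sits about 2 units above batch A, independent of condition.
expr      = [5.0, 7.0, 5.2, 7.0, 9.0, 7.2]
batch     = ["A", "A", "A", "B", "B", "B"]
condition = ["ctrl", "trt", "ctrl", "ctrl", "trt", "ctrl"]

centered = center_within_groups(expr, batch)

# After centering, the batch offset is gone and the treated sample in each
# batch still stands out from the controls.
for value, b, c in zip(centered, batch, condition):
    print(f"{b} {c:>4} {value:+.2f}")
```

I would use centered values like these only for plotting. For the statistical test itself, the batch variable would go into the model as the second variable instead.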