2.4 years ago
Tian
Hello all,
I am a beginner in bioinformatics and I have 2 questions about RNAseq data processing for tomato.
- I am confused about whether DESeq normalizes for gene length. I have 2 datasets at hand: one is single-end RNAseq and the other is 3' RNAseq. I think there are fundamental differences between them. For single-end RNAseq, more reads do not necessarily mean more transcripts, because longer genes tend to have more reads mapping to them. For 3' RNAseq, each read corresponds to one transcript, so the count has nothing to do with gene length. Does the DESeq package normalize for gene length? (Based on my googling, it does not.) If the answer is no, what further steps should I take when dealing with single-end RNAseq data? Also, I want to double-check that I need vst() for both of them. Am I correct?
- I would like to run a Gene Ontology analysis. My gene IDs come from tomato_tx2gene and look like Solyc..g... (e.g. Solyc00g160260). How can I convert them to a form that enrichGO can accept as keyType?
Any thoughts or suggestions would be greatly appreciated!
Thank you.
Tian
Can we take a step back and clarify how these datasets relate to each other? What is the analysis goal? No, DESeq2 does not normalize for length. Do you plan to combine these datasets? If not, then you don't have to worry about length anyway: a within-dataset analysis compares each gene with itself across samples, so gene length cancels out.
Thank you very much, I appreciate it! I do not plan to combine these two datasets. One reason I asked is that some downstream processing might require data normalized by gene length.
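For anyone who does need length-normalized values for a downstream tool, here is a minimal sketch of the usual TPM (transcripts per million) arithmetic in plain Python. The counts and gene lengths below are made-up illustration values, not real tomato data, and the function name `tpm` is just a label chosen for this example. This would only make sense for the single-end (full-length) dataset; for 3' RNAseq the counts are not expected to scale with gene length. The result is meant for downstream use only, since DESeq2 itself expects raw counts as input.

```python
def tpm(counts, lengths_bp):
    """Return TPM values for one sample.

    counts     -- raw read counts per gene (hypothetical values in the demo)
    lengths_bp -- gene lengths in base pairs, same order as counts
    """
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths_bp must have the same length")

    # Step 1: divide each count by gene length in kilobases (reads per kilobase).
    rpk = [c / (length / 1000.0) for c, length in zip(counts, lengths_bp)]

    # Step 2: rescale so that the values of the sample sum to one million.
    total = sum(rpk)
    if total == 0:
        return [0.0] * len(rpk)
    return [r / total * 1e6 for r in rpk]


# Hypothetical example: three genes whose counts are proportional to length.
counts = [100, 200, 300]
lengths = [1000, 2000, 3000]
values = tpm(counts, lengths)
print(values)
```

In this toy case the three genes end up with identical TPM values, because the extra reads on the longer genes are explained entirely by their length.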