RNA-seq Experiment - Multiple gene names shown as "." - How can I deal with these?
10 months ago
mfawzy.sami ▴ 80

Hello Everyone,

I performed RNA-seq data analysis using HISAT2 -> StringTie -> Ballgown. I used the Ensembl Homo sapiens reference genome. The experiment included technical replicates and was paired-end.

I have done this analysis twice with two different pipelines: an older one consisting of TopHat2, Cufflinks, Cuffmerge, Cuffdiff, etc., and the newer "new Tuxedo" pipeline described above.

In both cases I obtained a list of DE genes. The problem is that some genes have no proper gene name; they have a "." (dot) in place of the name. How can I deal with these genes, and what are they? I have many pseudogenes in the list and they have proper names, but some genes have no name at all, only ".". How do you deal with this?

Thank you

I'm not sure how you expect us to help. It seems identifiers for your genes were either missing or not carried forward at some step. Are there any other transcript/gene IDs present in your output? If so, you can munge your output so that those IDs are used and carried forwards in your analysis. There are some genes that do not have official gene symbols, so if that's what you're trying to use, it may be missing for some of them (mostly lncRNAs, pseudogenes, etc).
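As a rough illustration of the munging suggested above, here is a minimal Python sketch that falls back to the gene ID whenever the name column holds only a placeholder. The column names `gene_id` and `gene_name`, the helper `fill_missing_names`, and the toy table (including its q-values) are assumptions made up for the example, not the actual layout of your Ballgown or Cuffdiff output, so adjust them to whatever your files contain.

```python
import csv
import io

# Placeholders treated as "no name"; "." is what the question reports,
# "-" and "" are included as an assumption about other tools' output.
MISSING = {".", "-", ""}

def fill_missing_names(rows, id_col="gene_id", name_col="gene_name"):
    """Return copies of rows with placeholder names replaced by the gene ID."""
    fixed = []
    for row in rows:
        row = dict(row)  # copy so the input rows are left untouched
        if row.get(name_col, "").strip() in MISSING:
            row[name_col] = row[id_col]
        fixed.append(row)
    return fixed

# Toy tab-separated table standing in for a DE results file (hypothetical values).
example = io.StringIO(
    "gene_id\tgene_name\tqval\n"
    "ENSG00000141510\tTP53\t0.001\n"
    "MSTRG.1024\t.\t0.020\n"
)
rows = list(csv.DictReader(example, delimiter="\t"))
fixed_rows = fill_missing_names(rows)

for row in fixed_rows:
    print(row["gene_id"], row["gene_name"], row["qval"])
```

With a real file you would open it with `open(path, newline="")` in place of the `io.StringIO` object. Rows that still carry only an assembler-generated ID after this step are likely unannotated loci, so the ID is the only stable handle you have for them.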

If you are working with a genome for which sequence and annotation are available, there is no pressing reason to use StringTie/Cufflinks. Why not use an established pipeline such as STAR/DESeq2 with a clean set of annotations?


