Dear all, I am new to the field. I have recently been using the new Tuxedo pipeline (HISAT2 aligner and StringTie assembler, with "de novo" assembly) on RNA-Seq data from Arabidopsis thaliana (more details below). In my hands, the pipeline has identified ~26K transcripts, of which ~15K were assigned a gene symbol from the reference GTF. Is this ratio (68% of transcripts being assigned gene symbols) within the expected range? If you have experience with Arabidopsis RNA-Seq data, your input would be appreciated.
Thank you in advance for your reply.
More details on the data (if needed):
- Samples: 18
- RNA Prep: SMART-Seq® v4 Ultra® Low Input RNA Kit for Sequencing (Clontech)
- Library Prep: Nextera® DNA Library Prep (Illumina)
- Seq: NextSeq500 sequencing
- Cycles: 75 cycles (paired-end)
- Ensembl References Used: Arabidopsis_thaliana.TAIR10.dna.toplevel.fa, Arabidopsis_thaliana.TAIR10.45.gtf
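In case it helps anyone reproduce the ratio I quoted, here is a minimal sketch of how the fraction of named transcripts could be counted from an assembled GTF. It assumes the gene symbol is stored in a `ref_gene_name` attribute on `transcript` rows; the attribute key may differ depending on how the GTF was produced (a merged GTF may use `gene_name` instead), so it is left as a parameter. The function name `annotated_fraction` and the toy records are made up for illustration and are not real output from my data.

```python
import re

# Matches GTF attribute pairs of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def annotated_fraction(gtf_lines, name_key="ref_gene_name"):
    """Count transcript rows and how many of them carry `name_key`.

    `name_key` is an assumption about the attribute holding the gene
    symbol; change it to match the GTF being inspected.
    Returns (named, total, fraction).
    """
    total = 0
    named = 0
    for line in gtf_lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Only count rows whose feature type (column 3) is "transcript".
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        total += 1
        if attrs.get(name_key):
            named += 1
    fraction = named / total if total else 0.0
    return named, total, fraction


# Toy records (hypothetical, for illustration only).
example = [
    "# toy GTF\n",
    '1\tStringTie\ttranscript\t100\t900\t1000\t+\t.\tgene_id "STRG.1"; transcript_id "STRG.1.1"; ref_gene_name "NAC001";\n',
    '1\tStringTie\texon\t100\t900\t1000\t+\t.\tgene_id "STRG.1"; transcript_id "STRG.1.1"; ref_gene_name "NAC001";\n',
    '1\tStringTie\ttranscript\t2000\t2900\t1000\t-\t.\tgene_id "STRG.2"; transcript_id "STRG.2.1";\n',
    '2\tStringTie\ttranscript\t50\t700\t1000\t+\t.\tgene_id "STRG.3"; transcript_id "STRG.3.1"; ref_gene_name "ARV1";\n',
]

named, total, fraction = annotated_fraction(example)
print(f"{named} of {total} transcripts carry a gene symbol ({fraction:.0%})")
```

For a real file, the same function can be called on an open file handle, e.g. `annotated_fraction(open("merged.gtf"))`, where `merged.gtf` is a placeholder path.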