Question

Arabidopsis thaliana RNA-Seq analysis: is 68% transcript annotation acceptable/expected with an Ensembl reference and the new Tuxedo pipeline?

4.4 years ago

arctic ▴ 40

Dear all, I am new to the field. I have recently been using the new Tuxedo pipeline (HISAT2 aligner and StringTie assembler with "de novo" assembly) on RNA-Seq data from Arabidopsis thaliana (more details below). In my hands the pipeline identified ~26K transcripts, of which ~15K were assigned a gene symbol from the reference GTF. Is this fraction (68% of transcripts assigned gene symbols) within the expected range? If you have experience with Arabidopsis RNA-Seq data, your input would be appreciated.
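
One way to get both numbers straight from the StringTie output is to count the `transcript` rows and check which of them carry reference attributes. Below is a minimal Python sketch, assuming StringTie was run with `-G` so that transcripts matching the reference carry a `ref_gene_id` attribute and, when the reference gene has a symbol, a `ref_gene_name` attribute. A `--merge` output may store the symbol under a different key (e.g. `gene_name`), so both keys are parameters; the function names are hypothetical. Counting "matched a reference gene" and "got a gene symbol" separately is a deliberate choice, since the two are not necessarily the same thing.

```python
import re
import sys
from typing import Dict, Iterable

# Matches key "value" pairs in GTF column 9, e.g.  gene_id "MSTRG.1";
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def parse_attributes(field: str) -> Dict[str, str]:
    """Turn a GTF attribute column into a dict (first occurrence of a key wins)."""
    attrs: Dict[str, str] = {}
    for key, value in ATTR_RE.findall(field):
        attrs.setdefault(key, value)
    return attrs


def summarize_annotation(lines: Iterable[str],
                         id_key: str = "ref_gene_id",
                         name_key: str = "ref_gene_name") -> Dict[str, float]:
    """Count transcripts and how many carry a reference gene ID / gene symbol.

    id_key and name_key are assumptions about the attribute names in the
    StringTie GTF; adjust them to whatever your file actually contains.
    """
    total = with_id = with_name = 0
    for line in lines:
        if line.startswith("#"):
            continue  # header/comment lines
        cols = line.rstrip("\n").split("\t")
        # Count each transcript once via its 'transcript' row; exon rows are skipped.
        if len(cols) < 9 or cols[2] != "transcript":
            continue
        attrs = parse_attributes(cols[8])
        total += 1
        with_id += id_key in attrs
        with_name += name_key in attrs
    return {
        "transcripts": total,
        "with_ref_gene_id": with_id,
        "with_gene_symbol": with_name,
        "frac_ref_gene_id": with_id / total if total else 0.0,
        "frac_gene_symbol": with_name / total if total else 0.0,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (file name is hypothetical): python count_ref.py stringtie_merged.gtf
    with open(sys.argv[1]) as fh:
        for key, value in summarize_annotation(fh).items():
            print(f"{key}\t{value}")
```

Comparing `frac_ref_gene_id` with `frac_gene_symbol` shows whether unannotated transcripts are genuinely novel or simply match reference genes that lack a symbol.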

Thank you in advance for your replies.

More details on the data (if needed):
- Samples: 18
- RNA prep: SMART-Seq® v4 Ultra® Low Input RNA Kit for Sequencing (Clontech)
- Library prep: Nextera® DNA Library Prep (Illumina)
- Sequencing: NextSeq 500
- Cycles: 75 (paired-end)
- Ensembl references used: Arabidopsis_thaliana.TAIR10.dna.toplevel.fa, Arabidopsis_thaliana.TAIR10.45.gtf

score 3 · Accepted Answer · 2019-12-16

4.4 years ago

lieven.sterck 15k

Yes, I would say that is in line with expectations (roughly 70% "known" genes is indeed about where we currently are in Arabidopsis).
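
To see what that baseline looks like for the exact reference used in the question, one could count how many genes in Arabidopsis_thaliana.TAIR10.45.gtf carry a gene symbol. The sketch below assumes (not verified here) that the Ensembl GTF puts symbols in a `gene_name` attribute on `gene` rows, that genes without a symbol either omit it or repeat their `gene_id`, and that biotypes live in `gene_biotype`. The function name `symbol_coverage` and the optional biotype filter are illustrative, not part of any tool.

```python
import re
import sys
from typing import Iterable, Optional, Tuple

# Matches key "value" pairs in GTF column 9, e.g.  gene_name "NAC001";
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def symbol_coverage(lines: Iterable[str],
                    biotype: Optional[str] = None) -> Tuple[int, int]:
    """Return (genes, genes_with_symbol) over the 'gene' rows of a GTF.

    If biotype is given (e.g. "protein_coding"), only genes whose assumed
    gene_biotype attribute equals it are counted.
    """
    genes = named = 0
    for line in lines:
        if line.startswith("#"):
            continue  # header lines such as #!genome-build
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "gene":
            continue  # one row per gene; skip transcript/exon/CDS rows
        attrs = dict(ATTR_RE.findall(cols[8]))
        if biotype is not None and attrs.get("gene_biotype") != biotype:
            continue
        genes += 1
        name = attrs.get("gene_name")
        # Treat a missing name, or a name that just repeats the ID, as "no symbol".
        if name and name != attrs.get("gene_id"):
            named += 1
    return genes, named


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python symbol_coverage.py Arabidopsis_thaliana.TAIR10.45.gtf
    with open(sys.argv[1]) as fh:
        genes, named = symbol_coverage(fh)
    print(f"{named}/{genes} genes have a symbol ({named / max(genes, 1):.1%})")
```

If the gene-level fraction printed here is close to the transcript-level fraction from the StringTie output, the unannotated share mostly reflects the reference annotation rather than the pipeline.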