Question: The reference annotation file contains only transcript and CDS information; how can I analyze differential gene expression?
2.8 years ago, zoukai3412085 wrote:

My reference annotation GFF file, which was downloaded from GigaDB, contains only transcript IDs and CDS information, like this:

[root@xueduanliu Ginkgo_RNA_sequencing_analysis]# cat Ginkgo_biloba.gff | head -n 10
C24882126 Cufflinks mRNA 33 1196 . + . ID=Gb_00001;
C24882126 Cufflinks CDS 33 116 . + 0 Parent=Gb_00001;
C24882126 Cufflinks CDS 219 460 . + 0 Parent=Gb_00001;
C24882126 Cufflinks CDS 542 863 . + 1 Parent=Gb_00001;
C24882126 Cufflinks CDS 945 1196 . + 0 Parent=Gb_00001;
C24883216 EST mRNA 236 1243 . + . ID=Gb_00002;
C24883216 EST CDS 236 1243 . + . Parent=Gb_00002;

This reference annotation GFF file has no exon or splice-site information, so how can I use it to build a HISAT2 index, map reads to the genome with HISAT2, and then run StringTie?

TopHat pipeline: Bowtie2 builds an index from the reference genome, then TopHat uses the reference annotation file and the samples' FASTQ files to map the reads.

HISAT2 pipeline: HISAT2 builds an index from the reference genome and the reference annotation file, then maps the samples' FASTQ files against it.

My goal is simply to quantify the genes in the reference annotation file and then analyze their differential expression. How should I proceed?

rna-seq exon cds • 871 views

You could read the file line by line, duplicate every CDS feature, and change the 3rd column of each copy to "exon". The resulting GFF annotation should then work in these pipelines.
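A minimal Python sketch of that idea, not a tested pipeline step. It assumes the GFF is tab-separated with the usual 9 columns; the function name add_exons and the output file name are only illustrative. Resetting the phase column to "." on the exon copies is an extra choice made here, since phase only applies to CDS features.

```python
import sys


def add_exons(lines):
    """Yield every input line, plus an 'exon' copy after each CDS feature."""
    for line in lines:
        line = line.rstrip("\n")
        yield line
        # Comments and blank lines are passed through unchanged.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")  # assumption: standard tab-separated GFF
        if len(fields) == 9 and fields[2] == "CDS":
            exon = list(fields)
            exon[2] = "exon"  # 3rd column: feature type
            exon[7] = "."     # 8th column: phase is not used for exons
            yield "\t".join(exon)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python add_exons.py Ginkgo_biloba.gff > Ginkgo_biloba.exons.gff
    with open(sys.argv[1]) as handle:
        for out_line in add_exons(handle):
            print(out_line)
```

Note that exons built this way cover only the coding sequence, so reads falling in UTRs will not be counted, and some tools expect GTF rather than GFF, so a format conversion may still be needed afterwards.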

written 2.8 years ago by Juke34