StringTie output GTF file only contains MSTRG IDs, no original annotations
4.4 years ago
yiren ▴ 10

Hi,

I am trying to quantify the expression of novel genes and transcripts, following the StringTie manual. The stringtie --merge mode takes as input a list of the assembled transcript files (in GTF format) previously obtained for each sample, as well as a reference annotation file (-G option). In the resulting merged.gtf file, the gene_id is an MSTRG identifier rather than the reference gene ID, as shown below.

chr1A   StringTie   transcript  4059    4397    1000    +   .   gene_id "MSTRG.3"; transcript_id "C1_00010W_A-T"; ref_gene_id "C1_00010W_A"; 
chr1A   StringTie   exon    4059    4397    1000    +   .   gene_id "MSTRG.3"; transcript_id "C1_00010W_A-T"; exon_number "1"; ref_gene_id "C1_00010W_A"; 
chr1A   StringTie   transcript  4409    5266    1000    -   .   gene_id "MSTRG.4"; transcript_id "MSTRG.4.1"; 
chr1A   StringTie   exon    4409    4527    1000    -   .   gene_id "MSTRG.4"; transcript_id "MSTRG.4.1"; exon_number "1"; 
chr1A   StringTie   exon    4556    5266    1000    -   .   gene_id "MSTRG.4"; transcript_id "MSTRG.4.1"; exon_number "2"; 
chr1A   StringTie   transcript  4409    4720    1000    -   .   gene_id "MSTRG.4"; transcript_id "C1_00020C_A-T"; ref_gene_id "C1_00020C_A"; 
chr1A   StringTie   exon    4409    4720    1000    -   .   gene_id "MSTRG.4"; transcript_id "C1_00020C_A-T"; exon_number "1"; ref_gene_id "C1_00020C_A";

When I use Ballgown for differential expression, I get the file ballgown.gtf, in which the gene_id is also an MSTRG identifier rather than the ref_gene_id. The file looks like this:

chr1    StringTie   transcript  126521  126612  .   +   .   gene_id "MSTRG.71"; transcript_id "C1_00720W_A-T"; cov "0.0"; FPKM "0.000000"; TPM "0.000000";
chr1    StringTie   exon    126521  126556  .   +   .   gene_id "MSTRG.71"; transcript_id "C1_00720W_A-T"; exon_number "1"; cov "0.0";
chr1    StringTie   exon    126577  126612  .   +   .   gene_id "MSTRG.71"; transcript_id "C1_00720W_A-T"; exon_number "2"; cov "0.0";
chr1    StringTie   transcript  204985  205057  .   +   .   gene_id "MSTRG.103"; transcript_id "C1_01020W_A-T"; cov "0.0"; FPKM "0.000000"; TPM "0.000000";
chr1    StringTie   exon    204985  205057  .   +   .   gene_id "MSTRG.103"; transcript_id "C1_01020W_A-T"; exon_number "1"; cov "0.0";
chr1    CGD transcript  296046  296504  .   +   .   gene_id "C1_01500W_A"; transcript_id "C1_01500W_A-T"; cov "0.0"; FPKM "0.000000"; TPM "0.000000";
chr1    CGD exon    296046  296504  .   +   .   gene_id "C1_01500W_A"; transcript_id "C1_01500W_A-T"; exon_number "1"; cov "0.0";

How can I get a file in which the gene_id is the ref_gene_id? Any help would be appreciated. Thank you very much!

rna-seq • 2.3k views
4.4 years ago
MatthewP ★ 1.4k

Hey, I recommend you check your input GTF file and your annotation GTF/GFF file. Maybe the chromosome names in your input GTF are in chr1 style while your annotation file uses 1 style.
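
A quick way to run that check could look like the sketch below. It only compares the first column (sequence name) of the two files. The two in-memory "files" are made-up stand-ins, not the poster's data; for real files you would pass open file handles (the file names are whatever you used for the assembled GTF and the -G annotation).

```python
import io


def seqnames(handle):
    """Collect the set of sequence names (column 1) from a GTF/GFF stream."""
    names = set()
    for line in handle:
        # Skip comment/header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


# Hypothetical one-line stand-ins for the assembled GTF and the reference annotation.
assembled = io.StringIO('chr1\tStringTie\texon\t1\t10\t.\t+\t.\tgene_id "g1";\n')
reference = io.StringIO('1\tCGD\texon\t1\t10\t.\t+\t.\tgene_id "g1";\n')

a, r = seqnames(assembled), seqnames(reference)
only_in_assembled = sorted(a - r)
only_in_reference = sorted(r - a)
print("only in assembled:", only_in_assembled)
print("only in reference:", only_in_reference)
```

If both lists come back empty, the naming style is consistent and the cause is somewhere else.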


Thank you for your reply. The input GTF file uses the same chromosome names as the annotation file. Some genes have the reference gene ID as their gene_id, but most genes have an MSTRG identifier.
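
Since the transcript lines in merged.gtf carry a ref_gene_id attribute whenever they match a reference transcript, one possible workaround is to build a lookup table from MSTRG gene_id to reference gene ID and join it to the expression results afterwards. The sketch below is only an illustration under that assumption: the helper names (parse_attrs, gene_to_ref) are made up for this example, and the three input lines are the transcript records from the excerpt in the question, written with tabs as in a real GTF.

```python
import re
from collections import defaultdict

# Matches GTF attributes of the form: key "value";
ATTR = re.compile(r'(\S+) "([^"]*)";')


def parse_attrs(field):
    """Turn a GTF attribute column into a dict."""
    return dict(ATTR.findall(field))


def gene_to_ref(lines):
    """Map each gene_id to the set of ref_gene_id values found on its transcripts."""
    mapping = defaultdict(set)
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        # Only transcript records are inspected; exon records repeat the same IDs.
        if len(cols) < 9 or cols[2] != "transcript":
            continue
        attrs = parse_attrs(cols[8])
        if "ref_gene_id" in attrs:
            mapping[attrs["gene_id"]].add(attrs["ref_gene_id"])
    return dict(mapping)


# Transcript lines taken from the merged.gtf excerpt in the question.
merged = [
    'chr1A\tStringTie\ttranscript\t4059\t4397\t1000\t+\t.\tgene_id "MSTRG.3"; transcript_id "C1_00010W_A-T"; ref_gene_id "C1_00010W_A";',
    'chr1A\tStringTie\ttranscript\t4409\t5266\t1000\t-\t.\tgene_id "MSTRG.4"; transcript_id "MSTRG.4.1";',
    'chr1A\tStringTie\ttranscript\t4409\t4720\t1000\t-\t.\tgene_id "MSTRG.4"; transcript_id "C1_00020C_A-T"; ref_gene_id "C1_00020C_A";',
]

mapping = gene_to_ref(merged)
for gene_id, refs in sorted(mapping.items()):
    print(gene_id, ",".join(sorted(refs)))
```

Note that MSTRG.4 groups a novel transcript (MSTRG.4.1) together with a reference transcript, so a locus can map to one reference gene, several, or none. That is why the values are kept as sets instead of overwriting gene_id in place.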
