Question: How to recover 5'UTR, CDS, and start codon features after GTF merging?
nlehmann (France) wrote, 10 months ago:

Hi all,

I built a new annotation file from long-read data with StringTie. When I opened the resulting GTF file, I noticed that every feature type except exon and transcript had disappeared, so all the information on 5' and 3' UTRs, CDS, and start and stop codons is lost.

I wonder whether merging the new file with the reference annotation would be a good way to recover this information (at least for the genes that StringTie has not modified). Do you know of any tool that could do that? I tried merging them with cuffmerge and gffcompare, but neither gives the result I expected (a merged file with exon, CDS, and UTR features).

Here is a sample of the reference file I used (which contains the UTR, CDS, and codon features):

> cat ref_olig2.gtf
chr1    ncbiRefSeq  transcript  106522741   106524545   .   +   .   gene_id "OLIG2"; transcript_id "NM_001031526.1";  gene_name "OLIG2";
chr1    ncbiRefSeq  exon    106522741   106522781   .   +   .   gene_id "OLIG2"; transcript_id "NM_001031526.1"; exon_number "1"; exon_id "NM_001031526.1.1"; gene_name "OLIG2";
chr1    ncbiRefSeq  5UTR    106522741   106522781   .   +   .   gene_id "OLIG2"; transcript_id "NM_001031526.1"; exon_number "1"; exon_id "NM_001031526.1.1"; gene_name "OLIG2";
chr1    ncbiRefSeq  exon    106523018   106524545   .   +   .   gene_id "OLIG2"; transcript_id "NM_001031526.1"; exon_number "2"; exon_id "NM_001031526.1.2"; gene_name "OLIG2";
chr1    ncbiRefSeq  5UTR    106523018   106523036   .   +   .   gene_id "OLIG2"; transcript_id "NM_001031526.1"; exon_number "2"; exon_id "NM_001031526.1.2"; gene_name "OLIG2";
chr1    ncbiRefSeq  CDS 106523037   106523930   .   +   0   gene_id "OLIG2"; transcript_id "NM_001031526.1"; exon_number "2"; exon_id "NM_001031526.1.2"; gene_name "OLIG2";
chr1    ncbiRefSeq  3UTR    106523934   106524545   .   +   .   gene_id "OLIG2"; transcript_id "NM_001031526.1"; exon_number "2"; exon_id "NM_001031526.1.2"; gene_name "OLIG2";
chr1    ncbiRefSeq  start_codon 106523037   106523039   .   +   0   gene_id "OLIG2"; transcript_id "NM_001031526.1"; exon_number "2"; exon_id "NM_001031526.1.2"; gene_name "OLIG2";
chr1    ncbiRefSeq  stop_codon  106523931   106523933   .   +   0   gene_id "OLIG2"; transcript_id "NM_001031526.1"; exon_number "2"; exon_id "NM_001031526.1.2"; gene_name "OLIG2";

Here is the same region in the new GTF (to keep it simple, I chose a region that StringTie did not modify):

> cat stringtie_olig2.gtf
chr1    ncbiRefSeq  transcript  106522741   106524545   .   +   .   gene_id "OLIG2"; transcript_id "NM_001031526.1"; gene_name "OLIG2"; ref_gene_id "OLIG2";
chr1    ncbiRefSeq  exon    106522741   106522781   .   +   .   gene_id "OLIG2"; transcript_id "NM_001031526.1"; exon_number "1"; gene_name "OLIG2";
chr1    ncbiRefSeq  exon    106523018   106524545   .   +   .   gene_id "OLIG2"; transcript_id "NM_001031526.1"; exon_number "2"; gene_name "OLIG2";
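
A quick way to confirm which feature types each file contains is to tally column 3 of the GTF. The following is only a minimal Python sketch: the function name count_feature_types is made up for illustration, and the sample records are the StringTie lines above with their attributes trimmed.

```python
from collections import Counter


def count_feature_types(gtf_lines):
    """Tally the feature type (column 3) of each GTF record."""
    counts = Counter()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        # Real GTF is tab-delimited; splitting on any whitespace also
        # accepts records pasted with spaces, as in this post.
        fields = line.split(None, 8)
        counts[fields[2]] += 1
    return counts


# Records abbreviated from the StringTie sample above (attributes trimmed).
stringtie_sample = [
    'chr1 ncbiRefSeq transcript 106522741 106524545 . + . gene_id "OLIG2"; transcript_id "NM_001031526.1";',
    'chr1 ncbiRefSeq exon 106522741 106522781 . + . gene_id "OLIG2"; transcript_id "NM_001031526.1";',
    'chr1 ncbiRefSeq exon 106523018 106524545 . + . gene_id "OLIG2"; transcript_id "NM_001031526.1";',
]

print(dict(count_feature_types(stringtie_sample)))
```

The same function accepts an open file handle, so it can be run on both complete GTF files to compare the tallies side by side.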

Result of gffcompare (only transcripts and exons):

> gffcompare stringtie_olig2.gtf ref_olig2.gtf
> cat gffcmp.combined.gtf
chr1    ncbiRefSeq  transcript  106522741   106524545   .   +   .   transcript_id "TCONS_00000001"; gene_id "XLOC_000001"; gene_name "OLIG2"; oId "NM_001031526.1"; tss_id "TSS1";
chr1    ncbiRefSeq  exon    106522741   106522781   .   +   .   transcript_id "TCONS_00000001"; gene_id "XLOC_000001"; exon_number "1";
chr1    ncbiRefSeq  exon    106523018   106524545   .   +   .   transcript_id "TCONS_00000001"; gene_id "XLOC_000001"; exon_number "2";
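
Note that the output above replaces the original transcript ID with a TCONS_ identifier and keeps the original one in the oId attribute of the transcript record. If the reference IDs are needed later (for example to look up CDS records in the reference file), they can be mapped back. This is a rough Python sketch, assuming every transcript record carries both transcript_id and oId as in the sample; the helper names parse_attributes and original_ids are hypothetical.

```python
import re

# Matches one `key "value";` pair in the GTF attribute column.
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')


def parse_attributes(attr_field):
    """Turn a GTF attribute string into a dict of key -> value."""
    return dict(ATTR_RE.findall(attr_field))


def original_ids(gtf_lines):
    """Map each merged transcript_id (TCONS_...) to its oId attribute."""
    mapping = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split(None, 8)
        if len(fields) < 9 or fields[2] != "transcript":
            continue  # in the sample, only transcript records carry oId
        attrs = parse_attributes(fields[8])
        if "transcript_id" in attrs and "oId" in attrs:
            mapping[attrs["transcript_id"]] = attrs["oId"]
    return mapping


# Records taken from the gffcmp.combined.gtf sample above.
combined_sample = [
    'chr1 ncbiRefSeq transcript 106522741 106524545 . + . transcript_id "TCONS_00000001"; gene_id "XLOC_000001"; gene_name "OLIG2"; oId "NM_001031526.1"; tss_id "TSS1";',
    'chr1 ncbiRefSeq exon 106522741 106522781 . + . transcript_id "TCONS_00000001"; gene_id "XLOC_000001"; exon_number "1";',
]

print(original_ids(combined_sample))
```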

Result of cuffmerge (only exons):

> cuffmerge -g ref_olig2.gtf list_cuffmerge.txt
> cat list_cuffmerge.txt
stringtie_olig2.gtf 
> cat merged_asm/merged.gtf
chr1    Cufflinks   exon    106522741   106522781   .   +   .   gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "1"; gene_name "OLIG2"; oId "NM_001031526.1"; nearest_ref "NM_001031526.1"; class_code "="; tss_id "TSS1";
chr1    Cufflinks   exon    106523018   106524545   .   +   .   gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "2"; gene_name "OLIG2"; oId "NM_001031526.1"; nearest_ref "NM_001031526.1"; class_code "="; tss_id "TSS1";
Juke34 (Sweden) wrote, 10 months ago:

You could try ‘agat_sp_merge_annotations.pl’ from AGAT.
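
For the transcripts that StringTie left unchanged, the underlying idea can also be sketched by hand: copy the CDS, UTR, and codon records from the reference to every assembled transcript that has the same transcript_id and an identical exon chain. The Python below is an illustration of that idea only, not a description of how AGAT works internally. The function names are hypothetical, and it assumes StringTie kept the reference transcript_id, as it did in the OLIG2 sample.

```python
import re
from collections import defaultdict

TRANSCRIPT_ID_RE = re.compile(r'transcript_id "([^"]+)"')
STRUCTURAL = {"transcript", "exon"}


def group_by_transcript(gtf_lines):
    """Collect, per transcript_id, the exon chain and the raw records."""
    groups = defaultdict(lambda: {"exons": [], "structural": [], "extra": []})
    for line in gtf_lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split(None, 8)  # tolerate tabs or spaces
        if len(fields) < 9:
            continue
        match = TRANSCRIPT_ID_RE.search(fields[8])
        if match is None:
            continue  # records without a transcript_id are dropped here
        group = groups[match.group(1)]
        if fields[2] == "exon":
            # chromosome, strand, start, end identify one exon
            group["exons"].append((fields[0], fields[6], int(fields[3]), int(fields[4])))
        group["structural" if fields[2] in STRUCTURAL else "extra"].append(line)
    return groups


def restore_features(reference_lines, assembled_lines):
    """Return the assembled records, plus the reference CDS/UTR/codon records
    for each transcript whose exon chain is identical in both inputs."""
    reference = group_by_transcript(reference_lines)
    merged = []
    for tid, group in group_by_transcript(assembled_lines).items():
        merged.extend(group["structural"])
        merged.extend(group["extra"])
        ref = reference.get(tid)
        same_chain = (
            ref is not None
            and group["exons"]
            and sorted(ref["exons"]) == sorted(group["exons"])
        )
        if same_chain and not group["extra"]:
            merged.extend(ref["extra"])
    return merged


# Records abbreviated from the two OLIG2 samples in the question.
ATTRS = 'gene_id "OLIG2"; transcript_id "NM_001031526.1";'
ref_sample = [
    "chr1 ncbiRefSeq exon 106522741 106522781 . + . " + ATTRS,
    "chr1 ncbiRefSeq exon 106523018 106524545 . + . " + ATTRS,
    "chr1 ncbiRefSeq CDS 106523037 106523930 . + 0 " + ATTRS,
    "chr1 ncbiRefSeq start_codon 106523037 106523039 . + 0 " + ATTRS,
    "chr1 ncbiRefSeq stop_codon 106523931 106523933 . + 0 " + ATTRS,
]
stringtie_sample = [
    "chr1 ncbiRefSeq transcript 106522741 106524545 . + . " + ATTRS,
    "chr1 ncbiRefSeq exon 106522741 106522781 . + . " + ATTRS,
    "chr1 ncbiRefSeq exon 106523018 106524545 . + . " + ATTRS,
]

for record in restore_features(ref_sample, stringtie_sample):
    print(record)
```

Transcripts whose exon structure changed keep only their transcript and exon records, since the reference CDS may no longer be valid for them. The output is grouped by transcript rather than sorted by coordinate, so it may need sorting before use with other tools.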


nlehmann replied, 10 months ago:

Thanks a lot, it works fine with the AGAT tool! Sorry for the delay in replying.