Question: How to recover 5' UTR, CDS and start codon features after GTF merging?
Asked 12 weeks ago by nlehmann (France):

Hi all,

I built a new annotation file from long-read data with StringTie. When I opened the resulting GTF file, I noticed that every feature except exons and transcripts had disappeared, so all the information on 5' and 3' UTRs, CDS, and start and stop codons is lost.

I wonder whether merging the new GTF with the reference annotation would be a good way to recover this information (at least for the genes that StringTie has not modified). Do you know of any tool that could do that? I tried cuffmerge and gffcompare, but neither gives the result I expected (a merged file with exon, CDS, UTR... features).

Here is a sample of the reference file I used (where there was data on UTR, CDS...):

> cat ref_olig2.gtf
chr1    ncbiRefSeq  transcript  106522741   106524545   .   +   .   gene_id "OLIG2"; transcript_id "NM_001031526.1";  gene_name "OLIG2";
chr1    ncbiRefSeq  exon    106522741   106522781   .   +   .   gene_id "OLIG2"; transcript_id "NM_001031526.1"; exon_number "1"; exon_id "NM_001031526.1.1"; gene_name "OLIG2";
chr1    ncbiRefSeq  5UTR    106522741   106522781   .   +   .   gene_id "OLIG2"; transcript_id "NM_001031526.1"; exon_number "1"; exon_id "NM_001031526.1.1"; gene_name "OLIG2";
chr1    ncbiRefSeq  exon    106523018   106524545   .   +   .   gene_id "OLIG2"; transcript_id "NM_001031526.1"; exon_number "2"; exon_id "NM_001031526.1.2"; gene_name "OLIG2";
chr1    ncbiRefSeq  5UTR    106523018   106523036   .   +   .   gene_id "OLIG2"; transcript_id "NM_001031526.1"; exon_number "2"; exon_id "NM_001031526.1.2"; gene_name "OLIG2";
chr1    ncbiRefSeq  CDS 106523037   106523930   .   +   0   gene_id "OLIG2"; transcript_id "NM_001031526.1"; exon_number "2"; exon_id "NM_001031526.1.2"; gene_name "OLIG2";
chr1    ncbiRefSeq  3UTR    106523934   106524545   .   +   .   gene_id "OLIG2"; transcript_id "NM_001031526.1"; exon_number "2"; exon_id "NM_001031526.1.2"; gene_name "OLIG2";
chr1    ncbiRefSeq  start_codon 106523037   106523039   .   +   0   gene_id "OLIG2"; transcript_id "NM_001031526.1"; exon_number "2"; exon_id "NM_001031526.1.2"; gene_name "OLIG2";
chr1    ncbiRefSeq  stop_codon  106523931   106523933   .   +   0   gene_id "OLIG2"; transcript_id "NM_001031526.1"; exon_number "2"; exon_id "NM_001031526.1.2"; gene_name "OLIG2";

Here is the same region in the new GTF (to make it simple, I chose a region that has not been modified by StringTie):

> cat stringtie_olig2.gtf
chr1    ncbiRefSeq  transcript  106522741   106524545   .   +   .   gene_id "OLIG2"; transcript_id "NM_001031526.1"; gene_name "OLIG2"; ref_gene_id "OLIG2";
chr1    ncbiRefSeq  exon    106522741   106522781   .   +   .   gene_id "OLIG2"; transcript_id "NM_001031526.1"; exon_number "1"; gene_name "OLIG2";
chr1    ncbiRefSeq  exon    106523018   106524545   .   +   .   gene_id "OLIG2"; transcript_id "NM_001031526.1"; exon_number "2"; gene_name "OLIG2";
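
To confirm exactly which feature types were dropped, one option is to tally column 3 of each file and compare. The following is only a minimal sketch; the function names `feature_counts` and `missing_features` are made up for illustration and do not belong to any existing tool.

```python
from collections import Counter


def feature_counts(gtf_lines):
    """Tally the feature type (column 3) of every record in a GTF."""
    counts = Counter()
    for line in gtf_lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        # Split on any whitespace; the 9th field (attributes) stays whole.
        fields = line.split(None, 8)
        if len(fields) >= 3:
            counts[fields[2]] += 1
    return counts


def missing_features(ref_lines, new_lines):
    """Feature types present in the reference but absent from the new GTF."""
    ref_types = set(feature_counts(ref_lines))
    new_types = set(feature_counts(new_lines))
    return sorted(ref_types - new_types)
```

It could be called with open file handles, for example `missing_features(open("ref_olig2.gtf"), open("stringtie_olig2.gtf"))`.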

Result of gffcompare (only transcript and exons):

> gffcompare stringtie_olig2.gtf ref_olig2.gtf
> cat gffcmp.combined.gtf
chr1    ncbiRefSeq  transcript  106522741   106524545   .   +   .   transcript_id "TCONS_00000001"; gene_id "XLOC_000001"; gene_name "OLIG2"; oId "NM_001031526.1"; tss_id "TSS1";
chr1    ncbiRefSeq  exon    106522741   106522781   .   +   .   transcript_id "TCONS_00000001"; gene_id "XLOC_000001"; exon_number "1";
chr1    ncbiRefSeq  exon    106523018   106524545   .   +   .   transcript_id "TCONS_00000001"; gene_id "XLOC_000001"; exon_number "2";

Result of cuffmerge (only exons):

> cuffmerge -g ref_olig2.gtf list_cuffmerge.txt
> cat list_cuffmerge.txt
stringtie_olig2.gtf 
> cat merged_asm/merged.gtf
chr1    Cufflinks   exon    106522741   106522781   .   +   .   gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "1"; gene_name "OLIG2"; oId "NM_001031526.1"; nearest_ref "NM_001031526.1"; class_code "="; tss_id "TSS1";
chr1    Cufflinks   exon    106523018   106524545   .   +   .   gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "2"; gene_name "OLIG2"; oId "NM_001031526.1"; nearest_ref "NM_001031526.1"; class_code "="; tss_id "TSS1";
Answer, written 12 weeks ago by Juke34 (Sweden):

You could give ‘agat_sp_merge_annotations.pl’ from AGAT a try.
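
For transcripts that StringTie left unchanged, the underlying idea can also be sketched by hand: keep every record of the new GTF and, whenever a transcript has the same `transcript_id` and an identical exon chain in the reference, copy the reference's non-exon features (CDS, UTRs, start/stop codons) across. The Python below is a rough illustration under those assumptions, not a description of how AGAT works; `parse`, `exon_chain` and `restore_features` are hypothetical names, and records without a `transcript_id` attribute are simply skipped.

```python
import re
from collections import defaultdict

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def parse(gtf_lines):
    """Group GTF records by transcript_id as (fields, original line) pairs."""
    by_tx = defaultdict(list)
    for line in gtf_lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        # Split on any whitespace; the 9th field (attributes) stays whole.
        fields = line.split(None, 8)
        if len(fields) < 9:
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match:  # records without a transcript_id are ignored
            by_tx[match.group(1)].append((fields, line))
    return by_tx


def exon_chain(records):
    """Sorted (seqname, strand, start, end) tuples of a transcript's exons."""
    return sorted(
        (f[0], f[6], int(f[3]), int(f[4]))
        for f, _ in records
        if f[2] == "exon"
    )


def restore_features(ref_lines, new_lines):
    """Return the new GTF lines, plus the reference's CDS/UTR/codon lines
    for every transcript whose exon structure is unchanged."""
    ref = parse(ref_lines)
    new = parse(new_lines)
    out = []
    for tx_id, records in new.items():
        out.extend(line for _, line in records)
        chain = exon_chain(records)
        ref_records = ref.get(tx_id)
        # Copy features only when the exon chains match exactly.
        if chain and ref_records and chain == exon_chain(ref_records):
            out.extend(
                line
                for f, line in ref_records
                if f[2] not in ("transcript", "exon")
            )
    return out
```

Transcripts whose exons were changed by StringTie are left with exons only, since their old CDS and UTR coordinates may no longer be valid.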


Reply from nlehmann (10 weeks ago): Thanks a lot, it works fine with AGAT! Sorry for the delay in replying.