Question: Stringtie merge assembly with extra transcripts
4 months ago, waqaskhokhar999 wrote:

I merged the transcriptome assemblies of many accessions with the stringtie --merge command. StringTie assigned the ID "MSTRG.1" to gene AT1G01010, which normally has one transcript (http://plants.ensembl.org/Arabidopsis_thaliana/Gene/Summary?db=core;g=AT1G01010;r=1:3600-6000;t=AT1G01010.1). Here, however, I see two additional transcripts. Does that mean these two transcripts are novel transcripts of a known gene?

1   StringTie   transcript  3631    5899    1000    +   .   gene_id "MSTRG.1"; transcript_id "AT1G01010.1"; gene_name "NAC001"; ref_gene_id "AT1G01010"; 

1   StringTie   exon    3631    3913    1000    +   .   gene_id "MSTRG.1"; transcript_id "AT1G01010.1"; exon_number "1"; gene_name "NAC001"; ref_gene_id "AT1G01010"; 

1   StringTie   exon    3996    4276    1000    +   .   gene_id "MSTRG.1"; transcript_id "AT1G01010.1"; exon_number "2"; gene_name "NAC001"; ref_gene_id "AT1G01010"; 

1   StringTie   exon    4486    4605    1000    +   .   gene_id "MSTRG.1"; transcript_id "AT1G01010.1"; exon_number "3"; gene_name "NAC001"; ref_gene_id "AT1G01010"; 

1   StringTie   exon    4706    5095    1000    +   .   gene_id "MSTRG.1"; transcript_id "AT1G01010.1"; exon_number "4"; gene_name "NAC001"; ref_gene_id "AT1G01010"; 

1   StringTie   exon    5174    5326    1000    +   .   gene_id "MSTRG.1"; transcript_id "AT1G01010.1"; exon_number "5"; gene_name "NAC001"; ref_gene_id "AT1G01010"; 

1   StringTie   exon    5439    5899    1000    +   .   gene_id "MSTRG.1"; transcript_id "AT1G01010.1"; exon_number "6"; gene_name "NAC001"; ref_gene_id "AT1G01010"; 

1   StringTie   transcript  3651    5899    1000    +   .   gene_id "MSTRG.1"; transcript_id "MSTRG.1.2"; 

1   StringTie   exon    3651    3913    1000    +   .   gene_id "MSTRG.1"; transcript_id "MSTRG.1.2"; exon_number "1"; 

1   StringTie   exon    3996    4276    1000    +   .   gene_id "MSTRG.1"; transcript_id "MSTRG.1.2"; exon_number "2"; 

1   StringTie   exon    4506    4605    1000    +   .   gene_id "MSTRG.1"; transcript_id "MSTRG.1.2"; exon_number "3"; 

1   StringTie   exon    4706    5095    1000    +   .   gene_id "MSTRG.1"; transcript_id "MSTRG.1.2"; exon_number "4"; 

1   StringTie   exon    5174    5326    1000    +   .   gene_id "MSTRG.1"; transcript_id "MSTRG.1.2"; exon_number "5"; 

1   StringTie   exon    5439    5899    1000    +   .   gene_id "MSTRG.1"; transcript_id "MSTRG.1.2"; exon_number "6"; 

1   StringTie   transcript  3657    5899    1000    +   .   gene_id "MSTRG.1"; transcript_id "MSTRG.1.3"; 
1   StringTie   exon    3657    3913    1000    +   .   gene_id "MSTRG.1"; transcript_id "MSTRG.1.3"; exon_number "1"; 

1   StringTie   exon    3996    4276    1000    +   .   gene_id "MSTRG.1"; transcript_id "MSTRG.1.3"; exon_number "2"; 

1   StringTie   exon    4486    5095    1000    +   .   gene_id "MSTRG.1"; transcript_id "MSTRG.1.3"; exon_number "3"; 

1   StringTie   exon    5174    5326    1000    +   .   gene_id "MSTRG.1"; transcript_id "MSTRG.1.3"; exon_number "4"; 

1   StringTie   exon    5439    5899    1000    +   .   gene_id "MSTRG.1"; transcript_id "MSTRG.1.3"; exon_number "5";
4 months ago, husensofteng (Sweden) wrote:

The two extra ones are potentially novel transcripts according to StringTie. StringTie tries to assemble all reads at each locus: when a known transcript can explain the reads, it reports just the known transcript; when reads remain unexplained, it assembles additional transcript forms and reports them as potentially novel.
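To see how the two MSTRG transcripts differ from AT1G01010.1, one option is to compare their intron chains. The following is a minimal sketch in Python, not something StringTie provides: the exon coordinates are copied from the GTF records in the question, and the function names are made up for illustration.

```python
# Exon intervals (start, end) copied from the GTF records in the question.
EXONS = {
    "AT1G01010.1": [(3631, 3913), (3996, 4276), (4486, 4605),
                    (4706, 5095), (5174, 5326), (5439, 5899)],
    "MSTRG.1.2": [(3651, 3913), (3996, 4276), (4506, 4605),
                  (4706, 5095), (5174, 5326), (5439, 5899)],
    "MSTRG.1.3": [(3657, 3913), (3996, 4276), (4486, 5095),
                  (5174, 5326), (5439, 5899)],
}


def intron_chain(exon_list):
    """Return the introns between consecutive exons as (start, end) pairs."""
    ordered = sorted(exon_list)
    return [(prev_end + 1, next_start - 1)
            for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])]


def compare_to_reference(exons, ref_id):
    """For each non-reference transcript, list introns lost and gained."""
    ref_introns = set(intron_chain(exons[ref_id]))
    report = {}
    for tid, exon_list in exons.items():
        if tid == ref_id:
            continue
        introns = set(intron_chain(exon_list))
        report[tid] = {
            "missing_ref_introns": sorted(ref_introns - introns),
            "new_introns": sorted(introns - ref_introns),
        }
    return report


report = compare_to_reference(EXONS, "AT1G01010.1")
for tid, diff in report.items():
    print(tid, diff)
```

For these records, MSTRG.1.2 replaces the reference intron 4277-4485 with 4277-4505 (exon 3 starts 20 bp further downstream), and MSTRG.1.3 lacks the intron 4606-4705, which looks like a retained intron between reference exons 3 and 4. Both transcripts also start slightly downstream of the reference start (3651 and 3657 versus 3631); small differences at transcript ends are usually weaker evidence of a novel isoform than a changed splice junction.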

Therefore, you need to inspect the results carefully. I suggest loading the BAM file and the generated GTF into the IGV browser and looking at the read coverage that supports each of the transcripts.

To limit the results to the known transcripts in your ref_ann.gff, run StringTie with the -e option (which requires -G ref_ann.gff). It restricts the output to the reference transcripts and stops StringTie from reporting novel ones.
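As a rough illustration, the per-sample command lines for such a run could be put together like this. This is only a sketch: the sample names and output file names are placeholders, and only the -e, -G and -o flags are used.

```python
import shlex


def quantify_command(bam_path, annotation, out_gtf):
    """Build a StringTie call that only reports transcripts found in `annotation`."""
    # -e must be combined with -G; the BAM file is given last.
    return ["stringtie", "-e", "-G", annotation, "-o", out_gtf, bam_path]


# Hypothetical sample names; replace with your own accessions.
samples = ["sample1", "sample2"]
commands = [
    quantify_command(f"{name}.bam", "ref_ann.gff", f"{name}.ref_only.gtf")
    for name in samples
]

for cmd in commands:
    # Print the command instead of running it, so it can be checked first.
    print(shlex.join(cmd))
```

Each list can be passed to subprocess.run(cmd, check=True) once the paths are correct. Passing the merged GTF to -G instead of ref_ann.gff would quantify the merged transcripts, including the MSTRG ones.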


Many thanks for the explanation. Can we also evaluate the novel transcripts by their TPM/FPKM values across samples, as produced by the second assembly, instead of loading the BAM and GTF files into IGV?

written 4 months ago by waqaskhokhar999

Yes, sure, but remember that the expression values are for the whole transcript. Also, StringTie uses a maximum-flow algorithm to account for all sequencing reads, so the alternative transcripts are not necessarily real and need further validation. You may also want to consider an exon-centric analysis (read up on featureCounts with -f, which counts at the exon level, and on DEXSeq) if your goal is to investigate alternative transcripts further.
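A minimal sketch of such a cross-sample check, assuming one StringTie GTF per sample whose transcript lines carry a TPM attribute. The toy lines and TPM values below are invented for illustration, and the thresholds (TPM of at least 1 in at least 2 samples) are arbitrary choices, not a recommendation.

```python
import re

ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def transcript_tpm(gtf_lines):
    """Map transcript_id to TPM using the transcript lines of a StringTie GTF."""
    tpm = {}
    for line in gtf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        if "transcript_id" in attrs and "TPM" in attrs:
            tpm[attrs["transcript_id"]] = float(attrs["TPM"])
    return tpm


def supported_transcripts(per_sample_tpm, min_tpm=1.0, min_samples=2):
    """Return transcripts reaching min_tpm in at least min_samples samples."""
    hits = {}
    for sample_tpm in per_sample_tpm.values():
        for tid, value in sample_tpm.items():
            if value >= min_tpm:
                hits[tid] = hits.get(tid, 0) + 1
    return sorted(tid for tid, n in hits.items() if n >= min_samples)


def toy_line(tid, tpm):
    """Build an invented transcript line; real input would be read from files."""
    attrs = f'gene_id "MSTRG.1"; transcript_id "{tid}"; TPM "{tpm}";'
    return "\t".join(["1", "StringTie", "transcript", "3631", "5899",
                      "1000", "+", ".", attrs])


# Invented TPM values for two hypothetical samples.
per_sample = {
    "sample_a": transcript_tpm([toy_line("AT1G01010.1", 25.0),
                                toy_line("MSTRG.1.2", 3.2),
                                toy_line("MSTRG.1.3", 0.4)]),
    "sample_b": transcript_tpm([toy_line("AT1G01010.1", 18.5),
                                toy_line("MSTRG.1.2", 2.1),
                                toy_line("MSTRG.1.3", 0.0)]),
}
print(supported_transcripts(per_sample))
```

With real data, each sample's lines would come from open("sample.gtf"). A transcript that passes such a filter is still only a candidate, since the TPM describes the whole transcript and not the specific junction or exon that makes it different.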

written 4 months ago by husensofteng