StringTie merge assembly with extra transcripts
23 months ago

I merged the transcriptome assemblies of many accessions using StringTie's merge mode (stringtie --merge). StringTie assigned the ID "MSTRG.1" to gene AT1G01010, which normally has one transcript (http://plants.ensembl.org/Arabidopsis_thaliana/Gene/Summary?db=core;g=AT1G01010;r=1:3600-6000;t=AT1G01010.1). However, here I can see two more transcripts. Does that mean these two transcripts are novel transcripts from a known gene?

1   StringTie   transcript  3631    5899    1000    +   .   gene_id "MSTRG.1"; transcript_id "AT1G01010.1"; gene_name "NAC001"; ref_gene_id "AT1G01010"; 

1   StringTie   exon    3631    3913    1000    +   .   gene_id "MSTRG.1"; transcript_id "AT1G01010.1"; exon_number "1"; gene_name "NAC001"; ref_gene_id "AT1G01010"; 

1   StringTie   exon    3996    4276    1000    +   .   gene_id "MSTRG.1"; transcript_id "AT1G01010.1"; exon_number "2"; gene_name "NAC001"; ref_gene_id "AT1G01010"; 

1   StringTie   exon    4486    4605    1000    +   .   gene_id "MSTRG.1"; transcript_id "AT1G01010.1"; exon_number "3"; gene_name "NAC001"; ref_gene_id "AT1G01010"; 

1   StringTie   exon    4706    5095    1000    +   .   gene_id "MSTRG.1"; transcript_id "AT1G01010.1"; exon_number "4"; gene_name "NAC001"; ref_gene_id "AT1G01010"; 

1   StringTie   exon    5174    5326    1000    +   .   gene_id "MSTRG.1"; transcript_id "AT1G01010.1"; exon_number "5"; gene_name "NAC001"; ref_gene_id "AT1G01010"; 

1   StringTie   exon    5439    5899    1000    +   .   gene_id "MSTRG.1"; transcript_id "AT1G01010.1"; exon_number "6"; gene_name "NAC001"; ref_gene_id "AT1G01010"; 

1   StringTie   transcript  3651    5899    1000    +   .   gene_id "MSTRG.1"; transcript_id "MSTRG.1.2"; 

1   StringTie   exon    3651    3913    1000    +   .   gene_id "MSTRG.1"; transcript_id "MSTRG.1.2"; exon_number "1"; 

1   StringTie   exon    3996    4276    1000    +   .   gene_id "MSTRG.1"; transcript_id "MSTRG.1.2"; exon_number "2"; 

1   StringTie   exon    4506    4605    1000    +   .   gene_id "MSTRG.1"; transcript_id "MSTRG.1.2"; exon_number "3"; 

1   StringTie   exon    4706    5095    1000    +   .   gene_id "MSTRG.1"; transcript_id "MSTRG.1.2"; exon_number "4"; 

1   StringTie   exon    5174    5326    1000    +   .   gene_id "MSTRG.1"; transcript_id "MSTRG.1.2"; exon_number "5"; 

1   StringTie   exon    5439    5899    1000    +   .   gene_id "MSTRG.1"; transcript_id "MSTRG.1.2"; exon_number "6"; 

1   StringTie   transcript  3657    5899    1000    +   .   gene_id "MSTRG.1"; transcript_id "MSTRG.1.3"; 
1   StringTie   exon    3657    3913    1000    +   .   gene_id "MSTRG.1"; transcript_id "MSTRG.1.3"; exon_number "1"; 

1   StringTie   exon    3996    4276    1000    +   .   gene_id "MSTRG.1"; transcript_id "MSTRG.1.3"; exon_number "2"; 

1   StringTie   exon    4486    5095    1000    +   .   gene_id "MSTRG.1"; transcript_id "MSTRG.1.3"; exon_number "3"; 

1   StringTie   exon    5174    5326    1000    +   .   gene_id "MSTRG.1"; transcript_id "MSTRG.1.3"; exon_number "4"; 

1   StringTie   exon    5439    5899    1000    +   .   gene_id "MSTRG.1"; transcript_id "MSTRG.1.3"; exon_number "5";
RNA-Seq Assembly stringtie-merge • 1.1k views
23 months ago
husensofteng ▴ 310

According to StringTie, the two extra ones are potentially novel transcripts. However, StringTie tries to assemble all reads at each locus: when a known transcript can explain the reads, it just reports the known transcript, whereas when reads remain unexplained, it tries to assemble other transcript forms and reports them as potentially novel.
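One way to see how the MSTRG transcripts differ from the reference is to compare intron chains. The following is a minimal sketch, not part of StringTie; the function names (intron_chains, compare_to_reference) are hypothetical, and the exon coordinates are copied from the GTF records in the question.

```python
import re
from collections import defaultdict

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')


def intron_chains(gtf_lines):
    """Return {transcript_id: [(intron_start, intron_end), ...]} from GTF exon lines."""
    exons = defaultdict(list)
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        # Split on any whitespace so tab- and space-separated records both work.
        fields = line.split(None, 8)
        if len(fields) < 9 or fields[2] != "exon":
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        exons[attrs["transcript_id"]].append((int(fields[3]), int(fields[4])))
    chains = {}
    for tid, ex in exons.items():
        ex.sort()
        # An intron spans the gap between two consecutive exons.
        chains[tid] = [(left_end + 1, right_start - 1)
                       for (_, left_end), (right_start, _) in zip(ex, ex[1:])]
    return chains


def compare_to_reference(chains, ref_id):
    """List introns each transcript lacks or adds relative to the reference."""
    ref = set(chains[ref_id])
    report = {}
    for tid, chain in chains.items():
        if tid == ref_id:
            continue
        report[tid] = {
            "missing_ref_introns": sorted(ref - set(chain)),
            "new_introns": sorted(set(chain) - ref),
        }
    return report


# Exon coordinates taken from the GTF records posted in the question.
EXONS = {
    "AT1G01010.1": [(3631, 3913), (3996, 4276), (4486, 4605),
                    (4706, 5095), (5174, 5326), (5439, 5899)],
    "MSTRG.1.2": [(3651, 3913), (3996, 4276), (4506, 4605),
                  (4706, 5095), (5174, 5326), (5439, 5899)],
    "MSTRG.1.3": [(3657, 3913), (3996, 4276), (4486, 5095),
                  (5174, 5326), (5439, 5899)],
}
gtf_lines = [
    f'1\tStringTie\texon\t{s}\t{e}\t1000\t+\t.\tgene_id "MSTRG.1"; transcript_id "{tid}";'
    for tid, ex in EXONS.items() for s, e in ex
]

report = compare_to_reference(intron_chains(gtf_lines), "AT1G01010.1")
for tid, diff in report.items():
    print(tid, diff)
```

On these records, MSTRG.1.2 differs only in where its third exon starts (4506 instead of 4486), and MSTRG.1.3 has a single exon spanning 4486-5095 where the reference has an intron. The comparison deliberately ignores the slightly different transcript start positions (3651 and 3657 versus 3631).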

Therefore, you need to investigate the results properly. I suggest that you load the BAM file and the generated GTF into the IGV browser and inspect the reads supporting each of the transcripts.

To limit the results to the canonical transcripts given in your ref_ann.gff, run StringTie with the -e option (together with -G ref_ann.gff), which restricts it to the reference transcripts and prevents it from generating novel ones.
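As a rough illustration of such a run, the sketch below only builds the command line. The helper name stringtie_quant_cmd and the file names sample1.bam and sample1.gtf are placeholders, not names from this thread; -e, -G and -o are the StringTie options for reference-only estimation, the reference annotation and the output GTF.

```python
def stringtie_quant_cmd(bam, annotation, out_gtf):
    """Build a StringTie command restricted to reference transcripts (-e needs -G)."""
    return ["stringtie", "-e", "-G", annotation, "-o", out_gtf, bam]


# File names below are placeholders for illustration only.
cmd = stringtie_quant_cmd("sample1.bam", "ref_ann.gff", "sample1.gtf")
print(" ".join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```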

Many thanks for the explanation. Instead of loading the BAM and GTF files in IGV, can we also investigate novel transcripts based on their TPM/FPKM values across samples, as produced by the second assembly?

Yes, sure, but remember that the expression values are for the whole transcript. Also, StringTie uses a maximum-flow algorithm to account for all sequencing reads, so the alternative transcripts are not necessarily real ones and further validation is needed. You may also want to consider an exon-centric analysis (read up on featureCounts -f for exon-level counts, and DEXSeq) if your goal is to further investigate alternative transcripts.
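A TPM-based screen across samples could look roughly like the sketch below. It assumes each sample has a StringTie output GTF whose transcript lines carry a TPM attribute; the function names, the thresholds (min_tpm, min_samples) and all TPM values in the example are made up for illustration and are not results from this dataset.

```python
import re
from collections import defaultdict

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')


def transcript_tpm(gtf_lines):
    """Return {transcript_id: TPM} from the transcript lines of one sample's GTF."""
    tpm = {}
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        if "TPM" in attrs:
            tpm[attrs["transcript_id"]] = float(attrs["TPM"])
    return tpm


def consistently_expressed(per_sample, min_tpm=1.0, min_samples=2):
    """Transcripts with TPM >= min_tpm in at least min_samples samples."""
    counts = defaultdict(int)
    for tpm in per_sample.values():
        for tid, value in tpm.items():
            if value >= min_tpm:
                counts[tid] += 1
    return sorted(tid for tid, n in counts.items() if n >= min_samples)


def fake_line(tid, tpm):
    """Build one transcript line with an invented TPM value (illustration only)."""
    return ('1\tStringTie\ttranscript\t3631\t5899\t1000\t+\t.\t'
            f'gene_id "MSTRG.1"; transcript_id "{tid}"; TPM "{tpm}";')


# Invented TPM values for three hypothetical samples.
samples = {
    "sampleA": [fake_line("AT1G01010.1", 5.2), fake_line("MSTRG.1.2", 1.4),
                fake_line("MSTRG.1.3", 0.2)],
    "sampleB": [fake_line("AT1G01010.1", 4.8), fake_line("MSTRG.1.2", 2.0),
                fake_line("MSTRG.1.3", 0.0)],
    "sampleC": [fake_line("AT1G01010.1", 6.0), fake_line("MSTRG.1.2", 0.3),
                fake_line("MSTRG.1.3", 0.1)],
}
per_sample = {name: transcript_tpm(lines) for name, lines in samples.items()}
kept = consistently_expressed(per_sample, min_tpm=1.0, min_samples=2)
print(kept)
```

A filter like this only indicates that a transcript model receives expression estimates in several samples; it does not by itself confirm that the isoform is real, so the caveat above about further validation still applies.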
