Question: Missing assembled transcripts after Stringtie merge
4 weeks ago, Ziliang Luo wrote:

Hello, I'm using Stringtie v2.0.4 for my RNAseq data analysis. I have a reference genome and annotation file but I still want to identify novel genes since the annotation is not complete for my species. I used:

for f in *.bam; do
    echo "${f}"
    # assemble each sample, guided by the reference annotation
    stringtie -p 8 -f 0.3 -j 5 -G gene_models_main.gff3 -o "${f%.*}.gtf" "${f}"
done

to assemble each sample individually. Then I used:

stringtie --merge -p 8 -G gene_models_main.gff3 -o stringtie_merged.gtf ./mergelist.txt

to get the nonredundant transcript gtf file.
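
For anyone reproducing this: mergelist.txt is a plain text file with one per-sample GTF path per line. Below is a minimal sketch of how such a list could be built, assuming all per-sample GTFs sit in one directory. The function name make_mergelist is made up for illustration, and skipping stringtie_merged.gtf is a precaution so that a previous merge output in the same directory is not fed back in.

```shell
# Hypothetical helper (not part of StringTie): write one GTF path per line.
# Usage: make_mergelist <gtf_dir> <out_file>
make_mergelist() {
    gtf_dir="$1"
    out_file="$2"
    # find avoids the literal '*.gtf' a bare glob leaves behind when nothing matches;
    # an earlier merged output is excluded so it is not merged into itself
    find "$gtf_dir" -maxdepth 1 -name '*.gtf' ! -name 'stringtie_merged.gtf' \
        | sort > "$out_file"
}

# Example call (assumed layout: GTFs in the current directory):
#   make_mergelist . mergelist.txt
```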

I have a gene of interest and wanted to check the assembly by looking at that gene. I see many reads mapped to that gene region, and it is assembled in some samples (not all of them). However, the gene is missing after I merge all the individual samples (see figure).

The merged assembly is at the top, and the rest are the individual assemblies. My target transcript (its ID starts with STRG) is assembled in many samples but not in the merged file. Even stranger, the nearby reference gene (arahyL5ZR7F) has no coverage and is not assembled in most samples, yet it is present in the merged file. I tried merging without any filtering criteria and without the annotation file, but I still cannot see that transcript in the merged file.
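
The same comparison can be done without a genome browser. As a rough sketch, the awk filter below prints the transcript records of a GTF that overlap a given region, so the per-sample files and the merged file can be checked side by side. The function name transcripts_in_region and the coordinates in the example call are placeholders, not values from my data.

```shell
# Hypothetical helper: print "transcript" records of a GTF that overlap a region.
# Usage: transcripts_in_region <gtf> <seqname> <start> <end>
transcripts_in_region() {
    # GTF columns: 1 = seqname, 3 = feature, 4 = start, 5 = end (1-based, inclusive)
    awk -F '\t' -v chr="$2" -v s="$3" -v e="$4" \
        '$1 == chr && $3 == "transcript" && $4 <= e && $5 >= s' "$1"
}

# Example call (placeholder chromosome name and coordinates):
#   for g in *.gtf; do
#       echo "$g: $(transcripts_in_region "$g" Chr01 100000 102000 | wc -l)"
#   done
```

A count of zero for stringtie_merged.gtf next to nonzero counts for the sample files would confirm that the locus is dropped at the merge step rather than during per-sample assembly.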

I found this issue during downstream analysis and really don't want to redo the analysis with another assembler. Can anyone help?

modified 29 days ago by h.mon • written 4 weeks ago by Ziliang Luo

I'm not sure. For some reason the --merge option does not consider this a proper transcript (which I might also doubt, as it is >1000 nt without any introns and does not appear to be identical between any of the samples). Did you try running stringtie --merge with the "-i" option?

written 27 days ago by kristoffer.vittingseerup

Hi Kristoffer,

The -i option didn't improve the result. I also noticed that some samples assembled two separate fragments of this single-exon gene. Maybe this inconsistency among samples "confuses" Stringtie?

Besides, it seems that Stringtie strictly follows the provided reference. With an updated reference gff file in which this gene is correctly annotated, the assembly of this gene looks good. I'll try using Stringtie2 to redo the assembly.

modified 27 days ago • written 27 days ago by Ziliang Luo
