I have 40 assembled GTF files (1.gtf, 2.gtf, 3.gtf, ..., 40.gtf) from different tissues/samples of one organism, generated by following the new Tuxedo protocol. Stringtie --merge merges the transcripts from all samples into stringtie_merged.gtf and assigns each transcript a unique ID (string_tie_unique_id). Running gffcompare then lets me examine how these transcripts (string_tie_unique_id) compare with the reference annotation. However, in this process the link between each transcript (string_tie_unique_id) and the .gtf file(s) it came from is lost. How can I back-trace every string_tie_unique_id transcript to its source sample(s) while following this protocol?
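One possible workaround, sketched here under assumptions: run gffcompare once per sample, giving stringtie_merged.gtf as the reference (-r) and the sample GTF as the query. Each run writes a tab-separated .tmap file with a header row, whose ref_id column holds the merged (MSTRG) transcript ID and whose qry_id column holds the sample's own transcript ID. The Python below inverts those tables into merged ID → list of (sample file, sample transcript ID), keeping only exact intron-chain matches (class code "="). The helper name merged_to_samples and its dict-of-texts input are hypothetical, and merged transcripts that stringtie --merge built from several partial sample transcripts may have no "=" match in any sample.

```python
import csv
import io
from collections import defaultdict


def merged_to_samples(tmap_texts):
    """Invert per-sample gffcompare .tmap tables (hypothetical helper).

    tmap_texts maps a sample GTF name (e.g. "1.gtf") to the text of the .tmap
    file produced by running gffcompare on that sample with
    stringtie_merged.gtf as the reference annotation.
    """
    origin = defaultdict(list)
    for sample, text in tmap_texts.items():
        # .tmap files are tab-separated with a header row (ref_gene_id, ref_id,
        # class_code, qry_gene_id, qry_id, ...), so read the columns by name.
        reader = csv.DictReader(io.StringIO(text), delimiter="\t")
        for row in reader:
            # "=" means the sample transcript has the same intron chain as the
            # merged transcript; other class codes are only partial overlaps.
            if row["class_code"] == "=":
                origin[row["ref_id"]].append((sample, row["qry_id"]))
    return dict(origin)
```

Run over all 40 .tmap files, this gives, for each string_tie_unique_id, the tissues that contain an exactly matching transcript.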
Alternatively, I could skip Stringtie --merge and run gffcompare directly on all 40 assembled files (1.gtf, 2.gtf, 3.gtf, ..., 40.gtf). This generates a combined assembly (merged.combined.gtf) as well as other files such as .loci and .tracking. These files let me trace every transcript in merged.combined.gtf back to the .gtf file(s) it came from. Is it okay to use this approach instead?
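With the gffcompare-only route, the sample of origin can be read from the .tracking file. Each row starts with the combined transcript ID (TCONS_...), the locus ID, the reference match and the class code, followed by one column per input file, in the order the files were given; each such column is either "-" or qJ:gene_id|transcript_id|num_exons|FPKM|TPM|cov|len. The sketch below assumes that layout and also assumes that several transcripts from one sample in the same row are comma-separated. Because a shell glob such as *.gtf sorts 10.gtf before 2.gtf, passing the files in an explicit order (for example through a list file with gffcompare's -i option) keeps the q1..q40 → file mapping unambiguous. The function name is hypothetical.

```python
import re


def samples_per_transcript(tracking_lines, input_files):
    """Map each TCONS id in a gffcompare .tracking file to its source files.

    input_files must list the query GTFs in the same order they were passed to
    gffcompare, so that column q1 -> input_files[0], q2 -> input_files[1], ...
    Returns {tcons_id: [(source_file, sample_transcript_id), ...]}.
    """
    origin = {}
    for line in tracking_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        sample_cols = fields[4:]
        if len(sample_cols) != len(input_files):
            raise ValueError("column count does not match the number of input files")
        hits = []
        for path, col in zip(input_files, sample_cols):
            if col == "-":
                continue  # transcript absent from this sample
            for entry in col.split(","):
                # Drop the "qJ:" prefix, then take the transcript_id field.
                parts = re.sub(r"^q\d+:", "", entry).split("|")
                hits.append((path, parts[1]))
        origin[fields[0]] = hits
    return origin
```

Assuming the run used -o merged, passing open("merged.tracking") as tracking_lines works, since iterating a file object yields its lines.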
I ask because I need to extract transcript sequences with gffread from the merged.gtf file, apply several filtering steps, and then back-trace the remaining transcript sequences to their original sample .gtf files, i.e., identify which tissue each remaining (filtered) transcript came from.
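After filtering, the surviving transcript IDs can be read from the FASTA headers written by gffread (assumed here to start with the transcript ID, optionally followed by other space-separated fields) and looked up in a transcript → source-files mapping built beforehand, for example from the .tracking file. The function names and the format of the origin dict are hypothetical.

```python
def kept_ids(fasta_text):
    """Return transcript IDs from FASTA header lines, in file order."""
    ids = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.append(fields[0])  # first token is taken as the transcript ID
    return ids


def backtrace(fasta_text, origin):
    """Attach source files to every transcript that survived filtering.

    origin: {transcript_id: [source .gtf file, ...]} built beforehand.
    IDs missing from origin map to an empty list.
    """
    return {tid: origin.get(tid, []) for tid in kept_ids(fasta_text)}


def to_tsv(traced):
    """Format the result as 'transcript_id<TAB>comma-separated files' lines."""
    return "\n".join(f"{tid}\t{','.join(files)}" for tid, files in traced.items())
```

Keeping the lookup separate from the FASTA parsing means the same origin mapping can be reused after each filtering step.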