Question: stringtie merged gtf doesn't give any gene expression columns
Biologist150 wrote (11 months ago):

Hi,

I'm using HISAT2 to align reads to the genome and StringTie for the quantification steps. I used stringtie --merge to merge the GTFs from all samples. The resulting stringtie_merged.gtf doesn't have any of the FPKM, TPM or coverage (cov) attributes that are present in the per-sample GTF files. Is there a way to get all of those values with stringtie --merge?

Also, why are some transcripts that are present in the sample GTF files missing from the stringtie_merged.gtf file?

Thank you!

Kevin Blighe (Guy's Hospital, London) wrote:

Please take a look at my previous answer to the question "transcript count after string tie merge".

My answer there and the answer to your cross-post on SeqAnswers independently corroborate each other.

In addition, some of your transcripts are likely missing from the merged GTF because they fail one or more of the filter criteria applied during the merge. Please refer to the StringTie manual, which I link in that answer.

Kevin

Biologist150 replied:

For stringtie --merge, this is what I see:

Transcript merge usage mode: 
  stringtie --merge [Options] { gtf_list | strg1.gtf ...}
With this option StringTie will assemble transcripts from multiple
input files generating a unified non-redundant set of isoforms. In this mode
the following options are available:
  -G <guide_gff>   reference annotation to include in the merging (GTF/GFF3)
  -o <out_gtf>     output file name for the merged transcripts GTF
                    (default: stdout)
  -m <min_len>     minimum input transcript length to include in the merge
                    (default: 50)
  -c <min_cov>     minimum input transcript coverage to include in the merge
                    (default: 0)
  -F <min_fpkm>    minimum input transcript FPKM to include in the merge
                    (default: 1.0)
  -T <min_tpm>     minimum input transcript TPM to include in the merge
                    (default: 1.0)
  -f <min_iso>     minimum isoform fraction (default: 0.01)
  -g <gap_len>     gap between transcripts to merge together (default: 250)
  -i               keep merged transcripts with retained introns; by default
                   these are not kept unless there is strong evidence for them
  -l <label>       name prefix for output transcripts (default: MSTRG)

Kevin Blighe replied:

Yes, that is also what I saw when I looked. One or more of these parameters (for example, the default minimum FPKM and TPM thresholds of 1.0, set by -F and -T) is causing some of your individual samples' transcripts to be excluded from the merged GTF. I cannot know which one (or which combination) is affecting your particular data. You should investigate this yourself by exploring your own data and modifying these parameters.
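To see which of these thresholds a sample's transcripts fall below, here is a minimal Python sketch that reads one per-sample StringTie GTF and checks each transcript against the default -m, -c, -F and -T values from the usage text above. It assumes transcript length is the sum of exon lengths and that falling below any single threshold is enough for a transcript to be dropped; how StringTie actually combines these filters internally is an assumption, and the -f, -g and -i options are not modelled. A transcript that passes all four checks can still be absent from the merged GTF, since the merge produces a non-redundant set of isoforms. The function names are hypothetical.

```python
import re
import sys
from collections import defaultdict

# Default thresholds copied from the `stringtie --merge` usage text.
# Assumption: a transcript is flagged if it falls below ANY one of them.
DEFAULTS = {"min_len": 50, "min_cov": 0.0, "min_fpkm": 1.0, "min_tpm": 1.0}

ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_attrs(field):
    """Parse the GTF attribute column (column 9) into a key -> value dict."""
    return dict(ATTR_RE.findall(field))


def transcript_stats(gtf_lines):
    """Collect summed exon length and cov/FPKM/TPM per transcript."""
    length = defaultdict(int)
    expr = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        attrs = parse_attrs(cols[8])
        tid = attrs.get("transcript_id")
        if tid is None:
            continue
        if cols[2] == "exon":
            # GTF coordinates are 1-based and inclusive
            length[tid] += int(cols[4]) - int(cols[3]) + 1
        elif cols[2] == "transcript":
            expr[tid] = {k: float(attrs.get(k, 0.0)) for k in ("cov", "FPKM", "TPM")}
    return length, expr


def failed_filters(gtf_lines, thresholds=DEFAULTS):
    """Return {transcript_id: [thresholds it falls below]} for flagged transcripts."""
    length, expr = transcript_stats(gtf_lines)
    report = {}
    for tid, e in expr.items():
        reasons = []
        if length[tid] < thresholds["min_len"]:
            reasons.append("-m min_len")
        if e["cov"] < thresholds["min_cov"]:
            reasons.append("-c min_cov")
        if e["FPKM"] < thresholds["min_fpkm"]:
            reasons.append("-F min_fpkm")
        if e["TPM"] < thresholds["min_tpm"]:
            reasons.append("-T min_tpm")
        if reasons:
            report[tid] = reasons
    return report


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python check_merge_filters.py sample1.gtf
    with open(sys.argv[1]) as fh:
        for tid, reasons in failed_filters(fh).items():
            print(tid, ", ".join(reasons))
```

Lowering a threshold in the DEFAULTS dict (for example min_fpkm) mirrors what relaxing the corresponding stringtie --merge option would allow through, under the assumptions above.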

Also, as per my other answer and the answer on SeqAnswers: once you have obtained your merged GTF, re-run StringTie on each sample with the merged GTF as the reference annotation (-G, typically together with -e) to obtain the abundance estimates (coverage, FPKM and TPM) for each transcript.
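As a sketch of that second pass, the Python below only builds (and by default prints) one StringTie command per sample; nothing is executed unless dry_run=False. The sample names, BAM paths, output layout and thread count are hypothetical. The flags are StringTie options described in its manual: -G supplies the merged GTF as the reference, -e restricts estimation to those reference transcripts, -B writes Ballgown tables, -A writes a gene abundance table, -p sets the thread count and -o names the per-sample output GTF, which carries the cov, FPKM and TPM values.

```python
import shlex
import subprocess
from pathlib import Path


def requant_command(sample, bam, merged_gtf="stringtie_merged.gtf",
                    outdir="requant", threads=4):
    """Build the StringTie call that re-estimates one sample's abundances
    against the merged GTF (all paths here are hypothetical)."""
    out = Path(outdir) / sample
    return [
        "stringtie",
        "-e",                    # only estimate abundances of the -G transcripts
        "-B",                    # also write Ballgown input tables
        "-p", str(threads),      # number of threads
        "-G", merged_gtf,        # merged GTF from `stringtie --merge`
        "-o", str(out / f"{sample}.gtf"),             # per-sample GTF with cov/FPKM/TPM
        "-A", str(out / f"{sample}_gene_abund.tab"),  # gene-level abundances
        bam,
    ]


def run_all(samples, merged_gtf="stringtie_merged.gtf", outdir="requant",
            dry_run=True):
    """samples maps sample name -> sorted BAM path; prints commands by default."""
    for sample, bam in samples.items():
        cmd = requant_command(sample, bam, merged_gtf, outdir)
        if dry_run:
            print(shlex.join(cmd))
        else:
            (Path(outdir) / sample).mkdir(parents=True, exist_ok=True)
            subprocess.run(cmd, check=True)
```

For example, run_all({"S1": "S1.bam", "S2": "S2.bam"}) prints two commands for review before anything is run. If you need read counts rather than FPKM/TPM, the StringTie manual describes a helper script, prepDE.py, that builds count matrices from these -e outputs.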

Biologist150 replied:

Thank you very much.

Kevin Blighe replied:

Okay, best of luck with it. I would start by looking at the transcripts that were excluded and checking which of these exclusion criteria they meet. If a transcript is very lowly expressed, it may simply be transcriptional 'noise' and be virtually functionless.
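One way to list the excluded transcripts is to compare intron chains, because the merge assigns its own transcript IDs (prefix set by -l), so sample and merged transcripts generally cannot be matched by ID. The Python sketch below reports multi-exon transcripts from one sample GTF whose exact intron chain does not appear in the merged GTF. It is a rough approximation under stated assumptions: single-exon transcripts are skipped, and a transcript absorbed into a longer merged isoform with a different intron chain will also be reported. The function names are hypothetical.

```python
import re
from collections import defaultdict

TID_RE = re.compile(r'transcript_id "([^"]+)"')


def exon_chains(gtf_lines):
    """Map transcript_id -> (chrom, strand, exons sorted by start)."""
    exons = defaultdict(list)
    where = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue
        m = TID_RE.search(cols[8])
        if m is None:
            continue
        tid = m.group(1)
        exons[tid].append((int(cols[3]), int(cols[4])))
        where[tid] = (cols[0], cols[6])
    return {tid: (*where[tid], tuple(sorted(ex))) for tid, ex in exons.items()}


def intron_key(chrom, strand, exons):
    """Intron chain of a multi-exon transcript, or None for single-exon ones."""
    if len(exons) < 2:
        return None
    introns = tuple((exons[i][1] + 1, exons[i + 1][0] - 1)
                    for i in range(len(exons) - 1))
    return (chrom, strand, introns)


def missing_from_merged(sample_lines, merged_lines):
    """Sample transcript IDs whose intron chain is absent from the merged GTF."""
    merged = {intron_key(*v) for v in exon_chains(merged_lines).values()}
    missing = []
    for tid, v in exon_chains(sample_lines).items():
        key = intron_key(*v)
        if key is not None and key not in merged:
            missing.append(tid)
    return sorted(missing)
```

The reported IDs can then be looked up in the sample GTF to check their FPKM and TPM against the merge thresholds.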
