Question: Getting gene count tables for DESeq from StringTie using prepDE.py
pixie@bioinfo wrote, 2.1 years ago:

Hello, I followed the StringTie and prepDE.py pipeline exactly as described at http://ccb.jhu.edu/software/stringtie/index.shtml?t=manual#de, running prepDE.py on the ballgown directory created as described there.

However, the majority of the transcript IDs given in the StringTie merged file are not present in the gene count tables; the IDs there are mostly StringTie IDs (MSTRG). Would it be okay to take the IDs from the merged file and use them to replace the StringTie IDs in the count matrix?

This is my merged file:

chr01 StringTie transcript 2983 10815 1000 + 0 gene_id "MSTRG.1" transcript_id "Os01t0100100-01" ref_gene_id "Os01g0100100"
chr01 StringTie exon 2983 3268 1000 + 0 gene_id "MSTRG.1" transcript_id "Os01t0100100-01" exon_number "1"
chr01 StringTie exon 3354 3616 1000 + 0 gene_id "MSTRG.1" transcript_id "Os01t0100100-01" exon_number "2"
chr01 StringTie exon 4357 4455 1000 + 0 gene_id "MSTRG.1" transcript_id "Os01t0100100-01" exon_number "3"

This is my gene count matrix for all the samples:

MSTRG.1 41 86 143 167 304 343 46 51 170 320 44 69 167 102 129 311 310 114 97 301 305 25 62
MSTRG.10 9 6 4 3 6 31 2 4 3 6 3 2 36 2 2 17 11 2 1 5 6 2 6
MSTRG.100 8 13 10 14 14 18 5 4 8 11 0 0 0 0 2 0 6 0 0 4 2 0 0

rna-seq stringtie • 2.1k views

I had the same problem; however, it is solved in version 1.3.3, where the merge step is no longer necessary.

written 21 months ago by Buffo

Thanks for the input. I am planning to re-run with version 1.3.3.

written 21 months ago by pixie@bioinfo

Hi, unfortunately my issue is not resolved with version 1.3.3. I ran StringTie three times: once to create the GTF files, once to merge them, and once to produce the Ballgown outputs. I then ran prepDE.py on the ballgown folder to get the gene count matrix. I still get mostly StringTie IDs. Is there a way to map the IDs back? I would be grateful if I could email you part of my data for you to have a look.

written 21 months ago by pixie@bioinfo
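For readers following along, the three StringTie runs described in the comment above, followed by prepDE.py, can be sketched as the command lines below. This is only an illustration under assumptions: the sample names, `ref.gtf`, `merged.gtf`, and the `ballgown/<sample>/<sample>.gtf` layout are hypothetical placeholders, not the poster's actual files. The helper only builds the argument lists and does not run anything.

```python
def stringtie_commands(samples, ref_gtf, merged_gtf="merged.gtf"):
    """Build (but do not run) the command lines for the three StringTie passes.

    All file names here are hypothetical placeholders.
    """
    # Pass 1: assemble transcripts for each sample, guided by the reference annotation.
    assemble = [
        ["stringtie", f"{s}.bam", "-G", ref_gtf, "-o", f"{s}.gtf"]
        for s in samples
    ]
    # Pass 2: merge the per-sample assemblies into one non-redundant GTF.
    merge = ["stringtie", "--merge", "-G", ref_gtf, "-o", merged_gtf]
    merge += [f"{s}.gtf" for s in samples]
    # Pass 3: re-estimate abundances against the merged GTF only (-e)
    # and write Ballgown table files (-B), one directory per sample.
    estimate = [
        ["stringtie", "-e", "-B", "-G", merged_gtf,
         "-o", f"ballgown/{s}/{s}.gtf", f"{s}.bam"]
        for s in samples
    ]
    # Final step: prepDE.py reads the per-sample GTFs under the ballgown directory.
    prepde = ["prepDE.py", "-i", "ballgown"]
    return assemble, merge, estimate, prepde


assemble, merge, estimate, prepde = stringtie_commands(["s1", "s2"], "ref.gtf")
for cmd in assemble + [merge] + estimate + [prepde]:
    print(" ".join(cmd))
```

Because the third pass quantifies against the merged GTF, the gene IDs in the resulting count matrix are the ones assigned by the merge step, which is why MSTRG identifiers show up there.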

Hello Saeed, how did you do the next step after "6| Estimate transcript abundances and create table counts for Ballgown" and switch to DESeq? Kindly guide me; I am very new to this work. Thanks.

written 8 months ago by kashifahmad750

This is probably more appropriate as a new question.

written 8 months ago by WouterDeCoster
Satyajeet Khare (Pune, India) wrote, 2.1 years ago:

You can change line 25 of prepDE.py from

RE_GENE_ID=re.compile('gene_id "([^"]+)"')

to

RE_GENE_ID=re.compile('transcript_id "([^"]+)"')

and then run prepDE.py. But I am not sure why you would not simply generate a transcript count matrix with prepDE.py as follows:

prepDE.py -i ballgown -g gene_count_matrix.csv -t transcript_count_matrix.csv

transcript_count_matrix.csv should have the transcript IDs.
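To make the effect of the regex change concrete, here is a small sketch that applies the two patterns to attribute text copied from the merged GTF in the question. The helper name `first_group` is made up for this illustration and is not part of prepDE.py.

```python
import re

# The original pattern and the suggested replacement from this answer.
RE_GENE_ID = re.compile(r'gene_id "([^"]+)"')
RE_TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')

# Attribute text copied from the transcript record shown in the question.
attrs = 'gene_id "MSTRG.1" transcript_id "Os01t0100100-01" ref_gene_id "Os01g0100100"'


def first_group(pattern, text):
    """Return the first captured ID, or None if the attribute is absent."""
    match = pattern.search(text)
    return match.group(1) if match else None


print(first_group(RE_GENE_ID, attrs))        # MSTRG.1
print(first_group(RE_TRANSCRIPT_ID, attrs))  # Os01t0100100-01
```

With the original pattern the gene-level rows are keyed by the StringTie gene ID; with the replacement they would be keyed by the transcript ID instead.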


Thank you for the reply. What I meant was that most of my gene IDs are missing from the gene count tables and are replaced with StringTie IDs (MSTRG). When I looked into the merged GTF file, I saw that the reference gene IDs of the transcripts are not used in the count table; the MSTRG IDs are given instead. I am only interested in gene-level analysis for now, so I need the gene counts. Can I just map the IDs from the merged GTF file?

written 2.1 years ago by pixie@bioinfo

I am a little confused. MSTRG.1 is a gene ID, and it is present in your GTF file as well as in the gene count matrix. Do you mean ref_gene_id? In that case, you can replace line 25 with this one:

RE_GENE_ID=re.compile('ref_gene_id "([^"]+)"')

written 2.0 years ago by Satyajeet Khare
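An alternative to editing prepDE.py is to build a lookup from the merged GTF and rename the rows of the count matrix afterwards. The sketch below is one possible way to do that, not the StringTie authors' method. It assumes `ref_gene_id` appears only on transcript records (as in the excerpt in the question, where the exon records lack it), and it keeps the MSTRG ID whenever a locus has no reference gene or more than one. The function names are hypothetical.

```python
import re
from collections import defaultdict

# The lookbehind keeps this pattern from matching inside "ref_gene_id".
RE_GENE_ID = re.compile(r'(?<![A-Za-z_])gene_id "([^"]+)"')
RE_REF_GENE_ID = re.compile(r'ref_gene_id "([^"]+)"')


def build_id_map(gtf_lines):
    """Map each StringTie gene_id to the set of ref_gene_id values seen with it."""
    mapping = defaultdict(set)
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        gene = RE_GENE_ID.search(line)
        ref = RE_REF_GENE_ID.search(line)
        if gene and ref:
            mapping[gene.group(1)].add(ref.group(1))
    return mapping


def rename_gene(gene_id, mapping):
    """Return the reference ID only when the mapping is unambiguous."""
    refs = mapping.get(gene_id, set())
    if len(refs) == 1:
        return next(iter(refs))
    return gene_id  # novel locus or several reference genes: keep the MSTRG ID


# Two records copied from the merged GTF in the question.
merged_gtf = [
    'chr01 StringTie transcript 2983 10815 1000 + 0 gene_id "MSTRG.1" '
    'transcript_id "Os01t0100100-01" ref_gene_id "Os01g0100100"',
    'chr01 StringTie exon 2983 3268 1000 + 0 gene_id "MSTRG.1" '
    'transcript_id "Os01t0100100-01" exon_number "1"',
]
id_map = build_id_map(merged_gtf)
print(rename_gene("MSTRG.1", id_map))   # Os01g0100100
print(rename_gene("MSTRG.10", id_map))  # MSTRG.10 (no ref_gene_id in this excerpt)
```

Keeping ambiguous and unannotated loci under their MSTRG IDs is a deliberate choice here: a merged locus is not guaranteed to correspond to exactly one reference gene, so a blind replacement could mislabel counts.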
Saeed wrote, 20 months ago:

Hi everyone,

I followed the StringTie then DESeq2 pipeline for differential gene expression and it is working well. I was wondering: is it possible to use transcript_count_matrix.csv for differential isoform (alternative splicing) analysis with DESeq2?
