I'm running into an issue where certain transcripts present in my BAM files are not appearing in the GTF output from StringTie. This causes errors when I try to generate transcript count matrices using prepDE.py, as it cannot locate these missing transcripts in some samples.
I’m using StringTie to assemble transcripts and quantify expression from sorted BAM files, with the -G option pointing to a comprehensive GTF annotation file (from Gencode). For transcript quantification, I run StringTie with the parameters:
stringtie -e $SORTED_BAM_FILE -o ${SAMPLE_NAME}.gtf -p $NUM_THREADS -G $GTF_FILE -A abundances.tab -C cov_refs.gtf -B
When I run prepDE.py3 / prepDE.py with the output GTF files, I encounter errors like:
Error: could not locate transcript ENST00000697250.1 entry for sample OPL_B
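To narrow down which transcripts are affected, the transcript IDs in the per-sample GTF files can be compared directly. Below is a minimal sketch of such a check. It is not part of StringTie or prepDE.py, the function names (`transcript_ids`, `missing_per_sample`, `compare_files`) are made up for illustration, and it assumes the output GTFs have tab-separated columns with `transcript` feature lines carrying a `transcript_id "..."` attribute.

```python
import re
from pathlib import Path

# Matches the transcript_id attribute in the ninth (attribute) GTF column.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def transcript_ids(gtf_lines):
    """Collect transcript IDs from the 'transcript' feature lines of a GTF."""
    ids = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue  # ignore exon lines and malformed rows
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            ids.add(match.group(1))
    return ids


def missing_per_sample(samples):
    """Map each sample to the IDs present in some other sample but not in it."""
    all_ids = set().union(*samples.values())
    return {name: sorted(all_ids - ids) for name, ids in samples.items()}


def compare_files(paths):
    """Read each GTF path (hypothetical helper) and report missing IDs per sample."""
    samples = {}
    for path in paths:
        with open(path) as handle:
            samples[Path(path).stem] = transcript_ids(handle)
    return missing_per_sample(samples)
```

For example, `compare_files(["OPL_A.gtf", "OPL_B.gtf"])` (hypothetical file names) would return a dictionary listing, for each sample, the transcript IDs that appear in the other file but not in that one. This only shows where the outputs differ; it does not explain why.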
Are there specific StringTie parameters that would help ensure more consistent detection of transcripts across samples? Is there a recommended approach for cases where transcripts appear in BAM files but are missing in StringTie’s GTF output, especially for downstream differential expression analysis with prepDE.py?
Any insights or suggested settings would be much appreciated, as I’m aiming to achieve a comprehensive transcript/gene count matrix compatible with DESeq2.
Thank you.
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_gnu conda-forge
bzip2 1.0.8 h4bc722e_7 conda-forge
c-ares 1.34.2 heb4867d_0 conda-forge
ca-certificates 2024.8.30 hbcca054_0 conda-forge
htslib 1.21 h5efdd21_0 bioconda
keyutils 1.6.1 h166bdaf_0 conda-forge
krb5 1.21.3 h659f571_0 conda-forge
ld_impl_linux-64 2.43 h712a8e2_1 conda-forge
libcurl 8.10.1 hbbe4b11_0 conda-forge
libdeflate 1.21 h4bc722e_0 conda-forge
libedit 3.1.20191231 he28a2e2_2 conda-forge
libev 4.33 hd590300_2 conda-forge
libexpat 2.6.3 h5888daf_0 conda-forge
libffi 3.4.2 h7f98852_5 conda-forge
libgcc 14.2.0 h77fa898_1 conda-forge
libgcc-ng 14.2.0 h69a702a_1 conda-forge
libgomp 14.2.0 h77fa898_1 conda-forge
libmpdec 4.0.0 h4bc722e_0 conda-forge
libnghttp2 1.58.0 h47da74e_1 conda-forge
libsqlite 3.46.1 hadc24fc_0 conda-forge
libssh2 1.11.0 h0841786_0 conda-forge
libstdcxx 14.2.0 hc0a3c3a_1 conda-forge
libstdcxx-ng 14.2.0 h4852527_1 conda-forge
libuuid 2.38.1 h0b41bf4_0 conda-forge
libzlib 1.3.1 hb9d3cd8_2 conda-forge
ncurses 6.5 he02047a_1 conda-forge
openssl 3.3.2 hb9d3cd8_0 conda-forge
pip 24.2 pyh145f28c_1 conda-forge
python 3.13.0 h9ebbce0_100_cp313 conda-forge
python_abi 3.13 5_cp313 conda-forge
readline 8.2 h8228510_1 conda-forge
stringtie 2.2.3 h43eeafb_0 bioconda
tk 8.6.13 noxft_h4845f30_101 conda-forge
tzdata 2024b hc8b5060_0 conda-forge
xz 5.2.6 h166bdaf_0 conda-forge
zstd 1.5.6 ha6fb4c9_0 conda-forge
Linux seribizon 5.15.0-1052-oracle #58-Ubuntu SMP Tue Feb 13 19:43:43 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Hi upretia and welcome :-)
To keep your posts readable, we advise using the code-formatting option wherever possible; it's the 101010 button at the top of the edit box. I've done it for you this time.