StringTie fails because of a naming convention mismatch between the FASTA and GTF files
6.2 years ago

Hi all, I know this question has been asked before, but I still cannot solve my problem.

I obtained both the GTF file and the corresponding FASTA file from the Pseudomonas aeruginosa database.

However, StringTie does not run and stops with this error message: "WARNING: no reference transcripts were found for the genomic sequences where reads were mapped! Please make sure the -G annotation file uses the same naming convention for the genome sequences."

The FASTA header and the first GTF records are as follows:

gi|116048575|ref|NC_008463|pseudocap|138 [Pseudomonas aeruginosa UCBPP-PA14 chromosome, complete genome.]
TTTAAAGAGACCGGCGATTCTAGTGAAATCGAACGGGCAGGTCAATTTCCAACCAGCGATGACGTAATAGATAGATACAAGGAAGTCATTTTTCTTTTAAAGGATAG

chromosome PseudoCAP CDS 483 2027 . + 0 gene_id "PA14_00010"; transcript_id "1650836"; locus_tag "PA14_00010"; name "dnaA ,chromosomal replication initiation protein"; replicon_xref "NC_008463"
chromosome PseudoCAP CDS 2056 3159 . + 0 gene_id "PA14_00020"; transcript_id "1650838"; locus_tag "PA14_00020"; name "dnaN ,DNA polymerase III subunit beta"; replicon_xref "NC_008463"

Thanks in advance for any help.

Nazanin Hosseinkhan

StringTie Gene naming convention GTF • 2.1k views

If you are experiencing command line issues, you may be interested in DEWE (http://www.sing-group.org/dewe), a GUI to execute differential expression analyses that also allows you to use StringTie separately. Regards.

6.2 years ago
jean.elbers ★ 1.7k

You need to make sure that the GTF and FASTA files come from the same source so that their sequence names are compatible. Here the FASTA file's chromosome header is very different from the seqname used in the GTF file.

FASTA file seqname/chromosome

gi|116048575|ref|NC_008463|pseudocap|138 [Pseudomonas aeruginosa UCBPP-PA14 chromosome, complete genome.]

GTF file seqname/chromosome

chromosome

One thing you could do is manually change the FASTA header to >chromosome (if you don't want to write a regular expression to change the GTF file's contents). Note that this is assuming that the FASTA and GTF files are indeed from the exact same annotation run.
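The mismatch described above can be checked before changing anything. The following is a minimal sketch, assuming a tab-delimited GTF and a FASTA whose sequence name is the header text up to the first whitespace; the function names (`fasta_names`, `gtf_names`, `missing_in_fasta`) and the file names in the usage comment are hypothetical and are not part of StringTie.

```shell
# Print the sequence names declared in a FASTA file: the header text after
# ">" up to the first whitespace (assumption: tools keep only this part).
fasta_names() {
  awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$1" | sort -u
}

# Print the distinct seqname values (column 1) of a GTF, skipping comments.
gtf_names() {
  awk -F'\t' '!/^#/ && NF > 1 { print $1 }' "$1" | sort -u
}

# Print every GTF seqname that has no identically named FASTA sequence.
# Hypothetical usage: missing_in_fasta genome.fasta annotation.gtf
missing_in_fasta() {
  gtf_names "$2" | while IFS= read -r name; do
    fasta_names "$1" | grep -qxF -- "$name" || printf '%s\n' "$name"
  done
}
```

If `missing_in_fasta` prints nothing, every seqname in the GTF has a matching FASTA name; any name it prints (here it would be `chromosome`) has to be renamed on one side or the other.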


Hi,

Thank you so much.

Can I instead change the chromosome name in the GTF file?

I have already carried out the alignment step.

6.2 years ago
jean.elbers ★ 1.7k

I don't have a gtf file to test this one on StringTie, but this is how you would change chromosome to gi|116048575|ref|NC_008463|pseudocap|138 [Pseudomonas aeruginosa UCBPP-PA14 chromosome, complete genome.] in the first column throughout the gtf file.

awk -F'\t' -v OFS='\t' '$1 == "chromosome" {$1 = "gi|116048575|ref|NC_008463|pseudocap|138 [Pseudomonas aeruginosa UCBPP-PA14 chromosome, complete genome.]"} 1' name-of-gtf-file.gtf > name-of-new-gtf-file.gtf

I don't know whether the header stored in the BAM file was truncated at the first space following "138", so here is a second replacement string that StringTie might require instead:

awk -F'\t' -v OFS='\t' '$1 == "chromosome" {$1 = "gi|116048575|ref|NC_008463|pseudocap|138"} 1' name-of-gtf-file.gtf > name-of-new-gtf-file.gtf
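Which of the two replacement strings applies depends on the reference name actually stored in the BAM header. A minimal sketch for reading it, assuming samtools is installed: `samtools view -H` prints the header, and the `SN:` field of each `@SQ` line holds the reference name. The helper `sq_names` and the file name `alignments.bam` are hypothetical.

```shell
# Read SAM header text on stdin and print the reference sequence names,
# i.e. the value of the SN: field on every @SQ line.
sq_names() {
  awk -F'\t' '$1 == "@SQ" {
    for (i = 2; i <= NF; i++)
      if ($i ~ /^SN:/) print substr($i, 4)
  }'
}

# Hypothetical usage (requires samtools; file name is a placeholder):
#   samtools view -H alignments.bam | sq_names
```

The name printed here is the one the first column of the GTF needs to match exactly.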
