Weird error while running rMATS on FASTQ files using the hg38 genome
13 months ago
Sara

I am trying to run rMATS (an alternative splicing tool) with FASTQ files as input, using the following command:

--s1 /files/s1.txt --s2 /files/s2.txt --gtf /files/rmats_analysis/gencode.v39.annotation.gtf --bi /files/STAR/hg38/ -t paired --readLength 50 --nthread 4 --od /files --tmp /files/

The GTF file I am using for the analysis is:


and the genome (FASTA file) I used is:


So I used exactly the same version of the genome and the GTF file for this analysis, but I am getting this error:

Jan 06 00:40:19 ..... started STAR run
Jan 06 00:40:19 ..... loading genome
Jan 06 00:42:02 ..... processing annotations GTF

Fatal INPUT FILE error, no valid exon lines in the GTF file: /files/gencode.v39.annotation.gtf
Solution: check the formatting of the GTF file. One likely cause is the difference in chromosome naming between GTF and FASTA file.

Jan 06 00:42:18 ...... FATAL ERROR, exiting
Traceback (most recent call last):
  File "/usr/local/bin/", line 595, in <module>
  File "/usr/local/bin/", line 558, in main
    args = get_args()
  File "/usr/local/bin/", line 203, in get_args
    args.b1, args.b2 = doSTARMapping(args)
  File "/usr/local/bin/", line 81, in doSTARMapping
    raise Exception()

Since I used the same version of the genome and the GTF file, both from GENCODE, could you please let me know how to fix this issue? I checked both files, and in both of them the chromosome names start with chr.
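Checking that both files use the `chr` prefix is not quite the same as checking that the names actually match. Below is a minimal sketch that counts how many `exon` lines in the GTF sit on a sequence name the reference really contains. The helper and script names are hypothetical, and it assumes uncompressed files. A count of 0 would match what the STAR message describes. rMATS hands STAR a prebuilt index via `--bi`, so as far as I understand the names STAR compares against are the ones stored in `chrName.txt` inside that index directory. For that reason the script accepts either a FASTA or that file.

```python
import sys
from collections import Counter


def fasta_sequence_names(fasta_path):
    """Return the set of sequence IDs (first word after '>') in a FASTA file."""
    names = set()
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                parts = line[1:].split()
                if parts:
                    names.add(parts[0])
    return names


def chrname_file_names(chrname_path):
    """Return the names listed one per line in a STAR index's chrName.txt."""
    with open(chrname_path) as handle:
        return {line.strip() for line in handle if line.strip()}


def gtf_exon_seqnames(gtf_path):
    """Count exon lines per sequence name (GTF column 1), skipping comment lines."""
    counts = Counter()
    with open(gtf_path) as handle:
        for line in handle:
            if line.startswith("#"):
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 9 and fields[2] == "exon":
                counts[fields[0]] += 1
    return counts


def report(gtf_path, reference_names):
    """Print and return (exon lines on known sequences, total exon lines)."""
    exon_counts = gtf_exon_seqnames(gtf_path)
    total = sum(exon_counts.values())
    matched = sum(n for chrom, n in exon_counts.items() if chrom in reference_names)
    print(f"exon lines: {total}, on sequences present in reference: {matched}")
    missing = sorted(set(exon_counts) - set(reference_names))
    if missing:
        print("GTF sequence names not in reference (first 5):", missing[:5])
    return matched, total


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python check_names.py annotation.gtf genome.fa
    #    or: python check_names.py annotation.gtf /path/to/star_index/chrName.txt
    gtf, ref = sys.argv[1], sys.argv[2]
    if ref.endswith("chrName.txt"):
        names = chrname_file_names(ref)
    else:
        names = fasta_sequence_names(ref)
    report(gtf, names)
```

Running it against the index's `chrName.txt` rather than the downloaded FASTA shows directly which names the existing index was built with.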

13 months ago
jv

It looks like you downloaded the transcript sequence FASTA file instead of the genome sequence FASTA file. The transcript FASTA file does not use the same sequence identifiers as the gene annotation GFF3 or GTF files, which refer to chromosome sequence names.
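To confirm which FASTA was downloaded, the header lines are usually enough. Below is a heuristic sketch that assumes GENCODE's usual header styles. Transcript FASTA headers are pipe-delimited records that begin with a versioned `ENST` transcript ID. Genome FASTA headers begin with chromosome names such as `chr1`. The function names are hypothetical.

```python
import sys
from itertools import islice


def header_ids(fasta_path, limit=5):
    """Yield the first word of up to `limit` FASTA header lines."""
    with open(fasta_path) as handle:
        headers = (line for line in handle if line.startswith(">"))
        for line in islice(headers, limit):
            parts = line[1:].split()
            yield parts[0] if parts else ""


def classify_fasta(fasta_path, limit=5):
    """Heuristically label a FASTA as 'transcripts', 'genome' or 'unknown'."""
    ids = list(header_ids(fasta_path, limit))
    if not ids:
        return "unknown"
    # Assumed GENCODE transcript style: ENST...|ENSG...|...|name|length|biotype|
    transcript_like = sum(1 for i in ids if i.startswith("ENST") and "|" in i)
    # Assumed GENCODE genome style: chromosome names such as chr1, chrX, chrM
    chrom_like = sum(1 for i in ids if i.startswith("chr"))
    if transcript_like > len(ids) / 2:
        return "transcripts"
    if chrom_like > len(ids) / 2:
        return "genome"
    return "unknown"


if __name__ == "__main__" and len(sys.argv) == 2:
    print(classify_fasta(sys.argv[1]))
```

On a genome FASTA the function has to read through the first few chromosome sequences before it has seen `limit` headers. It is therefore noticeably slower there than on a transcript file.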

Instead, use either the "Genome sequence, primary assembly" or the "Genome sequence (GRCh38.p13)" FASTA file as your reference sequence.
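Swapping the FASTA only helps once the STAR index given to rMATS via `--bi` is rebuilt from it, because the index stores the sequence names of the FASTA it was built from. Below is a minimal sketch that assembles the standard STAR `genomeGenerate` call. It makes a few assumptions:

- The output directory and the FASTA file name are hypothetical.
- STAR is on `PATH`.
- `--sjdbOverhang` is set to read length minus 1, which is 49 for the `--readLength 50` used in the question. This follows the STAR manual's recommendation.

The script only prints the command, so it can be checked before running.

```python
import shlex


def star_genome_generate_cmd(genome_dir, genome_fasta, gtf, read_length, threads=4):
    """Return the argv list for rebuilding a STAR index with splice-junction annotations."""
    return [
        "STAR",
        "--runMode", "genomeGenerate",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--genomeFastaFiles", genome_fasta,  # plain (decompressed) genome FASTA
        "--sjdbGTFfile", gtf,
        "--sjdbOverhang", str(read_length - 1),  # read length minus 1
    ]


if __name__ == "__main__":
    # Hypothetical paths: a fresh index directory and the decompressed GENCODE genome FASTA.
    cmd = star_genome_generate_cmd(
        genome_dir="/files/STAR/hg38_genome/",
        genome_fasta="/files/GRCh38.primary_assembly.genome.fa",
        gtf="/files/rmats_analysis/gencode.v39.annotation.gtf",
        read_length=50,
    )
    print(shlex.join(cmd))
```

Building into a new directory avoids mixing old and new index files. Create it first if your STAR version requires that, then point `--bi` at it.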


