RNA-SeQC v1.1.8 transcript_id attribute was not found
5.2 years ago
shintzen ▴ 30

Hi, I am running RNA-SeQC on some files with an Ensembl annotation. I know this tool treats the Ensembl GTF format as "malformed", but I have used the tool's arguments to try to supply a different type of ID and I am still not making progress.

java -jar RNA-SeQC_v1.1.8.jar -ttype "gene_biotype"  -r rna-seq/hg38_nochr.fa -t rna-seq/P001/Homo_sapiens.GRCh38.84.gtf -o RNASEQC_out -s rna-seq/P001/RNASeQC_file_P001.txt
RNA-SeQC v1.1.8.1 07/11/14
Creating rRNA Interval List based on given GTF annotations
Retriving contig names from reference
         contig names in reference: 455
Loading GTF for Read Counting
The required transcript_id attribute was not found on line 1    havana  gene    11869   14409   .       +       .       gene_id "ENSG00000223972"; gene_version "5"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; havana_gene "OTTHUMG00000000961"; havana_gene_version "2";

I know all my files use the same chromosome names. BAMs:

samtools view -H rna-seq/P001/Sample.HISAT2-2.1.0.aligned.sorted.bam
@HD     VN:1.0  SO:coordinate
@SQ     SN:1    LN:248956422
@SQ     SN:10   LN:133797422
@SQ     SN:11   LN:135086622
@SQ     SN:12   LN:133275309
@SQ     SN:13   LN:114364328

GTF:

 grep -w "rRNA"  rna-seq/Homo_sapiens.GRCh38.84.gtf | head
1       ensembl gene    9437669 9437778 .       -       .       gene_id "ENSG00000252956"; gene_version "1"; gene_name "RNA5SP40"; gene_source "ensembl"; gene_biotype "rRNA";
1       ensembl transcript      9437669 9437778 .       -       .       gene_id "ENSG00000252956"; gene_version "1"; transcript_id "ENST00000517147"; transcript_version "1"; gene_name "RNA5SP40"; gene_source "ensembl"; gene_biotype "rRNA"; transcript_name "RNA5SP40-201"; transcript_source "ensembl"; transcript_biotype "rRNA"; tag "basic"; transcript_support_level "NA";

FASTA:

 head rna-seq/hg38_nochr.fa
>1
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN

I have checked my text file format as well:

Sample ID       Bam File        Notes
1_Sample_S1   /home/shintzen/rna-seq/P001/1_Sample.HISAT2-2.1.0.aligned.sorted.bam   1
2_Sample2_S2   /home/shintzen/rna-seq/P001/2_Sample2.HISAT2-2.1.0.aligned.sorted.bam   2

Am I missing something?

RNA-Seq QC • 2.1k views

Hello, did you solve the error described above? I am facing the same error with RNA-SeQC.

Any help is appreciated.


I have never used RNA-SeQC, but judging by the reported error ("The required transcript_id attribute was not found on line 1 ..."), it seems the tool is looking for a transcript_id attribute, and the first line of your input .gtf file is a gene record that does not have one (probably the parent gene record).

I think excluding such gene records may let you run the tool. Alternatively, there should be a transcripts.gtf file available that you can use instead of gene.gtf.
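Before removing anything, it may help to confirm which record types actually lack the attribute. The following is a minimal sketch, not part of RNA-SeQC: the function name `count_missing_transcript_id` is made up for illustration, and the sample lines are abridged from the two GTF records quoted in the question.

```python
from collections import Counter

def count_missing_transcript_id(lines):
    """Tally feature types (GTF column 3) of records with no transcript_id attribute."""
    missing = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a 9-column GTF record
        if 'transcript_id "' not in fields[8]:
            missing[fields[2]] += 1
    return missing

# Abridged sample records (assumption: tab-separated, as in a real GTF file).
sample = [
    '1\tensembl\tgene\t9437669\t9437778\t.\t-\t.\tgene_id "ENSG00000252956"; gene_biotype "rRNA";\n',
    '1\tensembl\ttranscript\t9437669\t9437778\t.\t-\t.\tgene_id "ENSG00000252956"; transcript_id "ENST00000517147";\n',
]
print(count_missing_transcript_id(sample))
```

To run it on a whole annotation, pass an open file handle instead of the sample list, for example `count_missing_transcript_id(open("Homo_sapiens.GRCh38.84.gtf"))`. If only `gene` shows up in the tally, dropping those records is the only change the file needs for this particular check.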


The Ensembl GTF files include "gene" records that have no transcript_id. This violates the GTF specification, which requires a transcript_id attribute on every record. You can remove every line that has "gene" in column 3.
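A minimal sketch of that filtering step, assuming a standard tab-separated GTF; the helper names `drop_gene_records` and `filter_gtf` and the output filename in the usage note are illustrative, not part of any tool:

```python
def drop_gene_records(lines):
    """Yield GTF lines unchanged, skipping records whose feature type (column 3) is 'gene'."""
    for line in lines:
        fields = line.split("\t")
        is_record = not line.startswith("#") and len(fields) >= 3
        if is_record and fields[2] == "gene":
            continue  # gene-level record: carries no transcript_id
        yield line  # header comments and all other records pass through

def filter_gtf(src_path, dst_path):
    """Write a copy of src_path to dst_path without the gene-level records."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(drop_gene_records(src))
```

For example, `filter_gtf("Homo_sapiens.GRCh38.84.gtf", "Homo_sapiens.GRCh38.84.no_gene.gtf")` would write the filtered copy, which can then be passed to RNA-SeQC with `-t`. Matching on column 3 exactly avoids accidentally dropping records that merely mention "gene" in their attributes, which a plain text search for the word would do.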
