Question: rnaseqc.v2.3 - Doesn't detect exons in GTF
gravatar for Jon17
11 months ago by
Jon1720 wrote:

RNASeqc doesn't like any of my GTF files. I have no problem running the example dataset. Any idea why? When I run this:

rnaseqc.v2.3.4.linux gencode.v29.compressed-noGene.gtf NuGen_mRNA_10_ERCC2_2_md.bam  OUTPUT

I get this result:

There were either no genes or no exons in the GTF
0 genes parsed
334017 exons parsed

I have a freshly downloaded gencode gtf file which was treated with gtx-pipeline

python3 gtex-pipeline/gene_model/ Homo_sapiens.GRCh38.97.gtf Homo_sapiens.GRCh38.97_compressed.gtf

and to remove genes and just keep exons:

awk -F'\t' '($3 != "gene")' gencode.v29.compressed.gtf > gencode.v29.compressed-noGene.gtf

And 334017 exons were confirmed:

cat gencode.v29.compressed-noGene.gtf | awk -F'\t' '{print $3}' | grep -c exon

So why can't it run?

The bam file processes fine with the older version of RNA seq:

java -jar RNA-SeQC_v1.1.8.jar -n 10000 -s "NuGen_mRNA_10_ERCC1_2|R3e_MAP_READS/NuGen_mRNA_10_ERCC1_2_md.bam|Desc" -t Homo_sapiens.GRCh38.83_filtered_transcript_id.gtf -r Homo_sapiens.GRCh38.dna.primary_assembly.fa -o OUT

But I can't get either the Ensemble GTF or gencode GTF to work with the newer version of RNASeqc. So I'm using gencode GTF and now running into this issue. At least the software doesn't crash. Anyone with RNASeqc experience have advice on how I can get this to run?

When I try to compress an Ensemble format:

python3 gtex-pipeline/gene_model/ Homo_sapiens.GRCh38.83_filtered_transcript_id.gtf Homo_sapiens.GRCh38.83_collapsed.gtf

I get this error:

Traceback (most recent call last):   File "SOFTWARE/gtex-pipeline/gene_model/", line 249, in <module>
    annotation = Annotation(args.transcript_gtf)   File "SOFTWARE/gtex-pipeline/gene_model/", line 76, in __init__
    t = Transcript(attributes.pop('transcript_id'), attributes.pop('transcript_name'), attributes.pop('transcript_type'), g, start_pos, end_pos) KeyError: 'transcript_type'
gencode rna-seq qc mit • 529 views
ADD COMMENTlink modified 11 months ago by RamRS27k • written 11 months ago by Jon1720

Are the chromosomal identifiers matching in all files that you are using?

ADD REPLYlink written 11 months ago by genomax85k

I think so. I've ran the bam file against both the original gencode gtf file and a modified version where I converted the "chr2" to just "2". Basically just removing all of the "chr" from the 1st column. I still have issues.

Legend for the text below:
Column 1 = Bam file (samtools view | awk -F'\t' '{print $3}')
Column 2 = modified GTF file ( awk -F'\t' '{print $1}')
Column 3 = original GTF file ( awk -F'\t' '{print $1}')

Easy to read screenshot

enter image description here

Text version below

 *  1   chr1
1   10  chr10
10  11  chr11
11  12  chr12
12  13  chr13
13  14  chr14
14  15  chr15
15  16  chr16
16  17  chr17
17  18  chr18
18  19  chr19
19  2   chr2
2   20  chr20
20  21  chr21
21  22  chr22
22  3   chr3
3   4   chr4
4   5   chr5
5   6   chr6
6   7   chr7
7   8   chr8
8   9   chr9
9   M   chrM
GL000008.2  X   chrX 
GL000009.2  Y   chrY
ADD REPLYlink modified 11 months ago • written 11 months ago by Jon1720

I think I solved the situation.

Still investigating but at least I'm getting output now. Will post an update later.

ADD REPLYlink modified 11 months ago • written 11 months ago by Jon1720

Hi, can you please explain how you solved the situation?

ADD REPLYlink written 6 months ago by maegsul150
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 815 users visited in the last hour