RSEM: The reference contains no transcripts
3.4 years ago

Hi,

I would be very grateful for help. I have a problem similar to this:

RSEM: The reference contains no transcripts!

I am trying to run rsem-prepare-reference with my own custom GTF file:

SCRATCH_DIR=/camp/project/scratch/hileyc
REFERENCE_DIR="${SCRATCH_DIR}/reference"
ASSETS_DIR="${SCRATCH_DIR}/RNA"
RSEM_DIR="${SCRATCH_DIR}/rsem/RNA_output"

singularity run -B "${SCRATCH_DIR}:${SCRATCH_DIR}" -W "${RSEM_DIR}" \
  docker://slab/rsem \
  rsem-prepare-reference \
  --gtf "${ASSETS_DIR}/RNA.gtf" \
  --star \
  --star-sjdboverhang 74 \
  --num-threads 8 \
  "${REFERENCE_DIR}/hg19.fa" \
  RNA

but I get the following error:

rsem-extract-reference-transcripts eRNA 0 /camp/project/proj-tracerx-lung/txscratch/hileyc/eRNA/eRNA.gtf None 0 /camp/project/proj-tracerx-lung/txscratch/hileyc/reference/hg19.fa
Parsed 200000 lines
Parsed 400000 lines
Parsed 600000 lines
The reference contains no transcripts!
"rsem-extract-reference-transcripts RNA 0 /camp/project/scratch/hileyc/RNA/RNA.gtf None 0 /camp/project/scratch/hileyc/reference/hg19.fa" failed! Plase check if you provide correct parameters/options for the pipeline!

This is what the first few lines of my GTF look like:

chr1    L_etal      CDS     751480  751481  .       +       .       gene_id "RNA751480"; transcript_id "RNA751480+";
chr1    L_etal      CDS     751690  751691  .       +       .       gene_id "RNA751690"; transcript_id "RNA751690+";
chr1    L_etal      CDS     752240  752241  .       +       .       gene_id "RNA752240"; transcript_id "RNA752240+";

I have tried changing CDS to 'transcript' and other terms. The chromosome labels are the same in the GTF and the FASTA.

I'm sure it's an incompatibility between my GTF and the reference FASTA file, but I can't figure it out.

Any ideas appreciated.


I am guessing that the GTF is the issue - it is missing transcript-to-gene information. There is detailed info here: https://deweylab.github.io/RSEM/README.html#built

A complete record could look something like this:

chr1    HAVANA  gene    57598   64116   .       +       .       gene_id "ENSG00000240361.2"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "OR4G11P"; level 2; hgnc_id "HGNC:31276"; havana_gene "OTTHUMG00000001095.3";

chr1    HAVANA  transcript      57598   64116   .       +       .       gene_id "ENSG00000240361.2"; transcript_id "ENST00000642116.1"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "OR4G11P"; transcript_type "lncRNA"; transcript_name "OR4G11P-202"; level 2; hgnc_id "HGNC:31276"; tag "RNA_Seq_supported_partial"; tag "basic"; havana_gene "OTTHUMG00000001095.3"; havana_transcript "OTTHUMT00000492680.1";
chr1    HAVANA  exon    57598   57653   .       +       .       gene_id "ENSG00000240361.2"; transcript_id "ENST00000642116.1"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "OR4G11P"; transcript_type "lncRNA"; transcript_name "OR4G11P-202"; exon_number 1; exon_id "ENSE00003812686.1"; level 2; hgnc_id "HGNC:31276"; tag "RNA_Seq_supported_partial"; tag "basic"; havana_gene "OTTHUMG00000001095.3"; havana_transcript "OTTHUMT00000492680.1";
chr1    HAVANA  exon    58700   58856   .       +       .       gene_id "ENSG00000240361.2"; transcript_id "ENST00000642116.1"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "OR4G11P"; transcript_type "lncRNA"; transcript_name "OR4G11P-202"; exon_number 2; exon_id "ENSE00003812505.1"; level 2; hgnc_id "HGNC:31276"; tag "RNA_Seq_supported_partial"; tag "basic"; havana_gene "OTTHUMG00000001095.3"; havana_transcript "OTTHUMT00000492680.1";
chr1    HAVANA  exon    62916   64116   .       +       .       gene_id "ENSG00000240361.2"; transcript_id "ENST00000642116.1"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "OR4G11P"; transcript_type "lncRNA"; transcript_name "OR4G11P-202"; exon_number 3; exon_id "ENSE00003811818.1"; level 2; hgnc_id "HGNC:31276"; tag "RNA_Seq_supported_partial"; tag "basic"; havana_gene "OTTHUMG00000001095.3"; havana_transcript "OTTHUMT00000492680.1";

chr1    HAVANA  transcript      62949   63887   .       +       .       gene_id "ENSG00000240361.2"; transcript_id "ENST00000492842.2"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "OR4G11P"; transcript_type "transcribed_unprocessed_pseudogene"; transcript_name "OR4G11P-201"; level 2; transcript_support_level "NA"; hgnc_id "HGNC:31276"; ont "PGO:0000005"; ont "PGO:0000019"; tag "basic"; havana_gene "OTTHUMG00000001095.3"; havana_transcript "OTTHUMT00000003224.3";
chr1    HAVANA  exon    62949   63887   .       +       .       gene_id "ENSG00000240361.2"; transcript_id "ENST00000492842.2"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "OR4G11P"; transcript_type "transcribed_unprocessed_pseudogene"; transcript_name "OR4G11P-201"; exon_number 1; exon_id "ENSE00001830178.2"; level 2; transcript_support_level "NA"; hgnc_id "HGNC:31276"; ont "PGO:0000005"; ont "PGO:0000019"; tag "basic"; havana_gene "OTTHUMG00000001095.3"; havana_transcript "OTTHUMT00000003224.3";

I have intentionally added blank lines between the records. Here, the gene is OR4G11P, and it has 2 transcripts, the first with 3 exons and the second with a single exon.

For your eRNAs (I am assuming that is what they are), you may simply need an extra line with 'gene' in place of 'CDS'?
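
To see quickly which feature types a GTF actually contains, a small sketch along these lines may help. The function name and the example records are only for illustration. My understanding (an assumption, please check it against the RSEM documentation) is that RSEM builds transcripts from 'exon' lines, so a file that contains only 'CDS' or 'transcript' records would leave it with no transcripts. Records that are not tab-delimited are also flagged, since GTF requires tabs between the nine columns.

```python
from collections import Counter


def count_feature_types(gtf_lines):
    """Count the values of column 3 (feature type) across GTF records.

    Lines with fewer than 9 tab-separated fields are counted under
    the key "<malformed>" (for example, space-delimited records).
    """
    counts = Counter()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            counts["<malformed>"] += 1
            continue
        counts[fields[2]] += 1
    return counts


# Records modelled on the GTF in the question (tab-delimited here).
example = [
    'chr1\tL_etal\tCDS\t751480\t751481\t.\t+\t.\tgene_id "RNA751480"; transcript_id "RNA751480+";',
    'chr1\tL_etal\tCDS\t751690\t751691\t.\t+\t.\tgene_id "RNA751690"; transcript_id "RNA751690+";',
    'chr1\tL_etal\tCDS\t752240\t752241\t.\t+\t.\tgene_id "RNA752240"; transcript_id "RNA752240+";',
]

counts = count_feature_types(example)
print(dict(counts))
print("exon lines:", counts["exon"])
```

To run it on a real file, pass an open file handle instead of the list, e.g. `count_feature_types(open("RNA.gtf"))`.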


Thanks Kevin, that is very kind of you to reply.

Yes, these are eRNAs.

So for each eRNA (as there can be expression from both strands), something like this:

chr1    L_etal      gene            751480  751481  .       .       .       gene_id "RNA751480";
chr1    L_etal      transcript      751480  751481  .       +       .       gene_id "RNA751480"; transcript_id "RNA751480+";
chr1    L_etal      transcript      751480  751481  .       -       .       gene_id "RNA751480"; transcript_id "RNA751480-";

I don't think this is the solution, unfortunately. Is the problem that I am trying to align to a single base pair?


I had not noticed that. How is it that these are 1 bp? Perhaps they fail some filter for minimum transcript length?
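
To check the feature lengths directly, something like the sketch below could be used. GTF coordinates are 1-based and inclusive, so the length is end - start + 1, which makes a record spanning 751480 to 751481 2 bp rather than 1 bp. The function name and the `min_length` threshold are hypothetical; I do not know what minimum length, if any, RSEM or STAR enforce, so pick a value that suits your own check.

```python
import re


def short_features(gtf_lines, min_length):
    """Return (transcript_id, length) for records shorter than min_length.

    Length is computed as end - start + 1 because GTF coordinates
    are 1-based and inclusive.
    """
    flagged = []
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed, tab-delimited GTF record
        length = int(fields[4]) - int(fields[3]) + 1
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        transcript_id = match.group(1) if match else None
        if length < min_length:
            flagged.append((transcript_id, length))
    return flagged


# Records modelled on the GTF in the question (tab-delimited here).
records = [
    'chr1\tL_etal\tCDS\t751480\t751481\t.\t+\t.\tgene_id "RNA751480"; transcript_id "RNA751480+";',
    'chr1\tL_etal\tCDS\t751690\t751691\t.\t+\t.\tgene_id "RNA751690"; transcript_id "RNA751690+";',
]

# min_length=25 is an arbitrary illustrative threshold, not an RSEM setting.
for transcript_id, length in short_features(records, min_length=25):
    print(transcript_id, length)
```

If every record is flagged, the intervals may be too short to quantify sensibly regardless of how the feature column is labelled.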
