Question

RSEM: The reference contains no transcripts

3.4 years ago

crispin.hiley ▴ 10

Hi,

I would be very grateful for help. I have a problem similar to this:

RSEM: The reference contains no transcripts!

I am trying to run rsem-prepare-reference with my own custom GTF file:

SCRATCH_DIR=/camp/project/scratch/hileyc
REFERENCE_DIR="${SCRATCH_DIR}/reference"
ASSETS_DIR="${SCRATCH_DIR}/RNA"
RSEM_DIR="${SCRATCH_DIR}/rsem/RNA_output"

singularity run -B "${SCRATCH_DIR}:${SCRATCH_DIR}" -W "${RSEM_DIR}" \
    docker://slab/rsem \
    rsem-prepare-reference \
    --gtf "${ASSETS_DIR}/RNA.gtf" \
    --star \
    --star-sjdboverhang 74 \
    --num-threads 8 \
    "${REFERENCE_DIR}/hg19.fa" \
    RNA

but I get the following error:

rsem-extract-reference-transcripts eRNA 0 /camp/project/proj-tracerx-lung/txscratch/hileyc/eRNA/eRNA.gtf None 0 /camp/project/proj-tracerx-lung/txscratch/hileyc/reference/hg19.fa
Parsed 200000 lines
Parsed 400000 lines
Parsed 600000 lines
The reference contains no transcripts!
"rsem-extract-reference-transcripts RNA 0 /camp/project/scratch/hileyc/RNA/RNA.gtf None 0 /camp/project/scratch/hileyc/reference/hg19.fa" failed! Plase check if you provide correct parameters/options for the pipeline!

This is what the first few lines of my GTF look like:

chr1    L_etal      CDS     751480  751481  .       +       .       gene_id "RNA751480"; transcript_id "RNA751480+";
chr1    L_etal      CDS     751690  751691  .       +       .       gene_id "RNA751690"; transcript_id "RNA751690+";
chr1    L_etal      CDS     752240  752241  .       +       .       gene_id "RNA752240"; transcript_id "RNA752240+";

I have tried changing CDS to 'transcript' and other terms. The chromosome labels are the same.

I'm sure it's an incompatibility between my GTF and the reference FASTA file, but I can't figure it out.

Any ideas appreciated.
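
One way to double-check that the chromosome labels really do match is to compare the names in the first GTF column with the FASTA header names. This is only a minimal Python sketch, not part of RSEM; the function names `fasta_names` and `gtf_names` are made up for illustration, and the file names in the usage note are the ones from the command above.

```python
def fasta_names(lines):
    """Collect sequence names from FASTA headers (text after '>' up to the first whitespace)."""
    names = set()
    for line in lines:
        if line.startswith(">") and len(line.strip()) > 1:
            names.add(line[1:].split()[0])
    return names


def gtf_names(lines):
    """Collect the first-column sequence names from GTF lines, skipping comments and blanks."""
    names = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split(None, 1)[0])
    return names


def names_missing_from_fasta(gtf_lines, fasta_lines):
    """Return GTF sequence names that have no matching FASTA header."""
    return gtf_names(gtf_lines) - fasta_names(fasta_lines)
```

With real files this could be called as `names_missing_from_fasta(open("RNA.gtf"), open("hg19.fa"))`; an empty set would suggest the labels are not the cause.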

RNA-Seq • 1.6k views

ADD COMMENT • link 3.4 years ago by crispin.hiley ▴ 10

I am guessing that the GTF is the issue - it is missing transcript-to-gene information. There is detailed info here: https://deweylab.github.io/RSEM/README.html#built

A complete record could look something like this:

chr1    HAVANA  gene    57598   64116   .       +       .       gene_id "ENSG00000240361.2"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "OR4G11P"; level 2; hgnc_id "HGNC:31276"; havana_gene "OTTHUMG00000001095.3";

chr1    HAVANA  transcript      57598   64116   .       +       .       gene_id "ENSG00000240361.2"; transcript_id "ENST00000642116.1"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "OR4G11P"; transcript_type "lncRNA"; transcript_name "OR4G11P-202"; level 2; hgnc_id "HGNC:31276"; tag "RNA_Seq_supported_partial"; tag "basic"; havana_gene "OTTHUMG00000001095.3"; havana_transcript "OTTHUMT00000492680.1";
chr1    HAVANA  exon    57598   57653   .       +       .       gene_id "ENSG00000240361.2"; transcript_id "ENST00000642116.1"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "OR4G11P"; transcript_type "lncRNA"; transcript_name "OR4G11P-202"; exon_number 1; exon_id "ENSE00003812686.1"; level 2; hgnc_id "HGNC:31276"; tag "RNA_Seq_supported_partial"; tag "basic"; havana_gene "OTTHUMG00000001095.3"; havana_transcript "OTTHUMT00000492680.1";
chr1    HAVANA  exon    58700   58856   .       +       .       gene_id "ENSG00000240361.2"; transcript_id "ENST00000642116.1"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "OR4G11P"; transcript_type "lncRNA"; transcript_name "OR4G11P-202"; exon_number 2; exon_id "ENSE00003812505.1"; level 2; hgnc_id "HGNC:31276"; tag "RNA_Seq_supported_partial"; tag "basic"; havana_gene "OTTHUMG00000001095.3"; havana_transcript "OTTHUMT00000492680.1";
chr1    HAVANA  exon    62916   64116   .       +       .       gene_id "ENSG00000240361.2"; transcript_id "ENST00000642116.1"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "OR4G11P"; transcript_type "lncRNA"; transcript_name "OR4G11P-202"; exon_number 3; exon_id "ENSE00003811818.1"; level 2; hgnc_id "HGNC:31276"; tag "RNA_Seq_supported_partial"; tag "basic"; havana_gene "OTTHUMG00000001095.3"; havana_transcript "OTTHUMT00000492680.1";

chr1    HAVANA  transcript      62949   63887   .       +       .       gene_id "ENSG00000240361.2"; transcript_id "ENST00000492842.2"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "OR4G11P"; transcript_type "transcribed_unprocessed_pseudogene"; transcript_name "OR4G11P-201"; level 2; transcript_support_level "NA"; hgnc_id "HGNC:31276"; ont "PGO:0000005"; ont "PGO:0000019"; tag "basic"; havana_gene "OTTHUMG00000001095.3"; havana_transcript "OTTHUMT00000003224.3";
chr1    HAVANA  exon    62949   63887   .       +       .       gene_id "ENSG00000240361.2"; transcript_id "ENST00000492842.2"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "OR4G11P"; transcript_type "transcribed_unprocessed_pseudogene"; transcript_name "OR4G11P-201"; exon_number 1; exon_id "ENSE00001830178.2"; level 2; transcript_support_level "NA"; hgnc_id "HGNC:31276"; ont "PGO:0000005"; ont "PGO:0000019"; tag "basic"; havana_gene "OTTHUMG00000001095.3"; havana_transcript "OTTHUMT00000003224.3";

I have intentionally added blank lines between the records. Here, the gene is OR4G11P, and it has 2 transcripts, the first with 3 exons and the second with a single exon.
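
To see how a custom GTF compares with the gene/transcript/exon structure in these records, a quick tally of feature types can help. The following is a minimal Python sketch, not part of RSEM; the name `summarize_gtf` is made up for illustration, and it assumes a tab-separated file with `transcript_id` in the ninth column. It also assumes that exon records are what define a transcript's sequence, as they do in the GENCODE example.

```python
import re
from collections import Counter

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def summarize_gtf(lines):
    """Count feature types and collect the transcript IDs that have at least one exon record."""
    features = Counter()
    with_exons = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        features[fields[2]] += 1
        match = TRANSCRIPT_ID.search(fields[8])
        if fields[2] == "exon" and match:
            with_exons.add(match.group(1))
    return features, with_exons
```

If `with_exons` comes back empty, the file has no exon records at all, which would be worth fixing before looking any further.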

For your eRNAs (I am assuming that is what these are), you may simply need an extra line with 'gene' in place of 'CDS'?

ADD REPLY • link 3.4 years ago by Kevin Blighe 87k

Thanks Kevin, that is very kind of you to reply.

Yes, eRNA.

So for each eRNA (as there can be expression from both strands), something like this:

chr1    L_etal      gene        751480  751481  .       .       .       gene_id "RNA751480";
chr1    L_etal      transcript  751480  751481  .       +       .       gene_id "RNA751480"; transcript_id "RNA751480+";
chr1    L_etal      transcript  751480  751481  .       -       .       gene_id "RNA751480"; transcript_id "RNA751480-";
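
A layout like this could be generated rather than written by hand. The Python sketch below is hypothetical: the function `erna_records` and its default source label are made up for illustration, and it adds one exon line under each transcript on the assumption that exon records are needed, as in the GENCODE example. Whether RSEM accepts the result has not been tested here.

```python
def erna_records(chrom, start, end, source="L_etal"):
    """Build tab-separated GTF lines for one eRNA locus: a gene, plus a transcript and exon per strand."""
    gene_id = f"RNA{start}"

    def row(feature, strand, attrs):
        return "\t".join([chrom, source, feature, str(start), str(end), ".", strand, ".", attrs])

    rows = [row("gene", ".", f'gene_id "{gene_id}";')]
    for strand in "+-":
        attrs = f'gene_id "{gene_id}"; transcript_id "{gene_id}{strand}";'
        rows.append(row("transcript", strand, attrs))
        rows.append(row("exon", strand, attrs))
    return rows
```

For example, `print("\n".join(erna_records("chr1", 751480, 751481)))` writes five lines for the first locus.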

ADD REPLY • link 3.4 years ago by crispin.hiley ▴ 10

I don't think this is the solution, unfortunately. Is the problem that I am trying to align to a single base pair?

ADD REPLY • link 3.4 years ago by crispin.hiley ▴ 10

I had not noticed that. How is it that these are 1 bp? Perhaps they fail some filter for minimum transcript length?
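
For reference, GTF coordinates are 1-based and inclusive, so a record from 751480 to 751481 covers 2 bases rather than 1; either way it is far shorter than a sequencing read. A small Python sketch for listing such short records follows. The function name `short_features` is made up, and the `min_length` threshold is arbitrary; it is not a documented RSEM setting.

```python
def short_features(lines, min_length=3):
    """Return (chrom, start, end, length) for GTF records shorter than min_length bases."""
    flagged = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not enough columns to hold coordinates
        start, end = int(fields[3]), int(fields[4])
        length = end - start + 1  # GTF intervals are 1-based and inclusive
        if length < min_length:
            flagged.append((fields[0], start, end, length))
    return flagged
```

Running this over the custom GTF with a larger `min_length`, such as the read length, would show how many records are too short to hold a read.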

ADD REPLY • link 3.4 years ago by Kevin Blighe 87k