nf-core/rnaseq, problem with ensembl reference
5 weeks ago
Filip • 0

Dear community, I am running the nf-core/rnaseq pipeline with the following command:

sudo nextflow run nf-core/rnaseq \
--input microsheet.csv \
--outdir rnaseq \
--skip_alignment \
--pseudo_aligner salmon \
--fasta references/ensembl/Homo_sapiens.GRCh38.dna.primary_assembly.fa \
--gtf references/ensembl/Homo_sapiens.GRCh38.111.gtf \
--transcript_fasta references/ensembl/Homo_sapiens.GRCh38.cdna.all.fa \
--max_memory 50GB \
--max_cpus 18 \
-profile docker

And I got an error:

ERROR ~ Error executing process > 'NFCORE_RNASEQ:RNASEQ:QUANTIFY_PSEUDO_ALIGNMENT:TX2GENE

Caused by:
Missing output file(s) *.tsv expected by process NFCORE_RNASEQ:RNASEQ:QUANTIFY_PSEUDO_ALIGNMENT:TX2GENE (Homo_sapiens.GRCh38.dna.primary_assembly.filtered.gtf)

Command error:
__main__ - 2024-03-18 11:10:51,695 WARNING: No attribute in GTF matching transcripts
__main__ - 2024-03-18 11:10:51,695 ERROR: Failed to map transcripts to genes.

My reference comes from Ensembl. On checking the files, I found that the .gtf file contains transcript IDs without a version, like this: transcript_id "ENST00000511072", while the transcripts in my counts, which come from the transcriptome reference, carry a version suffix, like this: ENST00000390469.2

I can't find a GTF file from Ensembl that contains the version information (.1, .2, etc.). Could the version be causing the error? It is surprising that the pipeline doesn't check for this.
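
A quick way to test this suspicion is to compare the transcript names on both sides directly. The sketch below is only an illustration: the helper names (gtf_transcript_ids, fasta_transcript_ids, count_shared_ids) and the toy files are made up for the example, and it assumes that Salmon names each transcript after the first word of its FASTA header. As far as I know, Ensembl GTFs keep the version in a separate transcript_version attribute rather than inside transcript_id, which is what the toy GTF line mimics.

```shell
# Print the unique transcript_id values found in a GTF file.
gtf_transcript_ids() {
    grep -o 'transcript_id "[^"]*"' "$1" | cut -d'"' -f2 | sort -u
}

# Print the unique transcript names in a FASTA file.
# Assumption: the name is the first whitespace-delimited word of each header.
fasta_transcript_ids() {
    grep '^>' "$1" | sed 's/^>//' | cut -d' ' -f1 | sort -u
}

# Count FASTA transcript names that exactly match a GTF transcript_id.
count_shared_ids() {
    ids_file=$(mktemp)
    gtf_transcript_ids "$1" > "$ids_file"
    # grep -c exits non-zero when the count is 0, hence the "|| true".
    fasta_transcript_ids "$2" | grep -Fxc -f "$ids_file" || true
    rm -f "$ids_file"
}

# Toy input files (hypothetical content, for illustration only).
demo_dir=$(mktemp -d)
printf '1\ttoy\ttranscript\t1\t100\t.\t+\t.\tgene_id "GENE_A"; transcript_id "ENST00000390469"; transcript_version "2";\n' > "$demo_dir/toy.gtf"
printf '>ENST00000390469.2 cdna toy header\nACGT\n' > "$demo_dir/toy.fa"

shared=$(count_shared_ids "$demo_dir/toy.gtf" "$demo_dir/toy.fa")
echo "Transcript names shared between GTF and FASTA: $shared"
```

On the real files, you would pass the Ensembl GTF and the cDNA FASTA instead of the toy paths. A count of zero, or one far below the number of transcripts, would support the idea that the version suffix is what breaks the transcript-to-gene mapping.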

Any advice is much appreciated. Thank you.

nfcore nextflow ensembl rnaseq • 288 views

Since you are using salmon you should not need the GTF file. Can you try taking that out?


The nf-core docs specify that I need it:

However, you can provide the --skip_alignment parameter if you would like to run Salmon or Kallisto in isolation. By default, the pipeline will use the genome fasta and gtf file to generate the transcripts fasta file, and then to build the Salmon index.

I tried running it to confirm and got:

No GTF or GFF3 annotation specified! The pipeline requires at least one of these files.

Having said that, I did obtain the quant.sf files from Salmon; it is the TX2GENE step that fails.
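
If the version suffix does turn out to be the culprit, one possible workaround (not confirmed anywhere in this thread) is to strip the suffix from the transcript FASTA headers, so that the names Salmon reports match the transcript_id values in the GTF. The sketch below is a minimal illustration on a toy file; the function name strip_transcript_versions and the file names are hypothetical, and the Salmon index would have to be rebuilt from the edited FASTA.

```shell
# Remove a trailing ".<digits>" version from the first word of each FASTA header.
# Sequence lines and headers without a version suffix are left unchanged.
strip_transcript_versions() {
    sed -E '/^>/ s/^>([^ .]+)\.[0-9]+/>\1/' "$1"
}

# Toy FASTA (hypothetical content, for illustration only).
work_dir=$(mktemp -d)
printf '>ENST00000390469.2 cdna toy header\nACGT\n>ENST00000511072 cdna toy header\nTTGA\n' > "$work_dir/toy.cdna.fa"

# Write the edited copy next to the original instead of overwriting it.
strip_transcript_versions "$work_dir/toy.cdna.fa" > "$work_dir/toy.cdna.noversion.fa"
grep '^>' "$work_dir/toy.cdna.noversion.fa"
```

Another option, going by the documentation quoted above, would be to leave out --transcript_fasta and let the pipeline generate the transcript FASTA from the genome FASTA and GTF, which should keep the names consistent with the annotation.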
