Cuffmerge IOError: [Errno 2] No such file or directory
5.4 years ago
Yuka Takemon

Hello,

The following is my script:

#!/bin/bash -l
#PBS -l nodes=1:ppn=20,walltime=24:00:00

## define directories and other variables

# dir to reference annotation
dir_ref_annotation=/pathto/genome_index/Mus_musculus/UCSC/mm10/Annotation/Genes

# dir to DNA seq for reference
dir_ref_genome_seq=/pathto/genome_index/Mus_musculus/UCSC/mm10/Sequence/Chromosomes

cuffmerge -o ${dir_cufflinks}/cuffmerge -g ${dir_ref_annotation}/genes.gtf -s ${dir_ref_genome_seq}/*.fa -p 20 ${dir_cufflinks}/all_transcripts.txt

I want to note that:

- all_transcripts.txt lists the .gtf files that came out of cufflinks
- the -g genes.gtf file is the same reference annotation I previously used with cufflinks
- the -s directory contains individual chr*.fa files
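For context, cuffmerge expects this list file to contain one GTF path per line. A minimal sketch of how such a file could be generated is below. It assumes a hypothetical layout with one cufflinks output directory per sample under a common parent, each holding the transcripts.gtf that cufflinks writes; the function name make_transcript_list is illustrative and not part of cufflinks.

```bash
# Hypothetical helper (not part of cufflinks): write one absolute
# transcripts.gtf path per line, the format cuffmerge expects for its list.
make_transcript_list() {
    local parent_dir=$1   # assumed: one cufflinks -o directory per sample
    local out_file=$2     # e.g. all_transcripts.txt
    local gtf
    : > "$out_file"       # start from an empty list
    for gtf in "$parent_dir"/*/transcripts.gtf; do
        # If the glob matched nothing, skip the literal pattern
        [ -e "$gtf" ] || continue
        # Absolute paths keep the list valid from any working directory
        printf '%s\n' "$(cd "$(dirname "$gtf")" && pwd)/transcripts.gtf" >> "$out_file"
    done
}
```

For example, make_transcript_list /path/to/cufflinks_runs all_transcripts.txt (placeholder paths). Writing absolute paths means the list still works if cuffmerge is started from a different directory.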

However, I am getting the following error:

[Mon Dec 14 17:33:45 2015] Beginning transcriptome assembly merge
-------------------------------------------

[Mon Dec 14 17:33:45 2015] Preparing output location /pathto/Annotation/cufflinks_preqc/cuffmerge/
Traceback (most recent call last):
  File "/opt/compsci/cufflinks/2.2.1/cuffmerge", line 580, in <module>
    sys.exit(main())
  File "/opt/compsci/cufflinks/2.2.1/cuffmerge", line 538, in main
    gtf_input_files = test_input_files(transfrag_list_file)
  File "/opt/compsci/cufflinks/2.2.1/cuffmerge", line 268, in test_input_files
    g = open(line,"r")
IOError: [Errno 2] No such file or directory: '>chr11'
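The traceback shows cuffmerge's test_input_files calling open() on each line of the file it received as the transcript list, so the error means that file contained the line '>chr11', which looks like a FASTA header rather than a GTF path. A quick pre-flight check along the same lines is sketched below as a hypothetical check_transcript_list function (not part of cufflinks). It reports any line that is not a readable regular file before cuffmerge is launched.

```bash
# Hypothetical pre-flight check mirroring what the traceback shows cuffmerge
# doing: every line of the list must name a readable regular file.
check_transcript_list() {
    local list_file=$1
    local line bad=0
    # "|| [ -n "$line" ]" also catches a last line without a trailing newline
    while IFS= read -r line || [ -n "$line" ]; do
        if ! { [ -f "$line" ] && [ -r "$line" ]; }; then
            printf 'missing or unreadable: %s\n' "$line" >&2
            bad=1
        fi
    done < "$list_file"
    return "$bad"
}
```

Used as check_transcript_list "${dir_cufflinks}/all_transcripts.txt" && cuffmerge ..., the merge only starts when every listed path can be opened.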

Is there something obvious I am missing here? Any input/help is appreciated.

5.4 years ago
DG

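I believe that with cuffmerge, when the reference sequence (-s) is a directory of per-chromosome FASTA files, you just pass the directory name. You are passing /path/to/ref_dir/*.fa instead. Because the pattern is unquoted, the shell expands it into one argument per .fa file before cuffmerge even starts, so -s only gets the first file and the remaining FASTA files end up as extra positional arguments; that is apparently how a FASTA header ('>chr11') ended up being opened as if it were a GTF path. Just pass -s /path/to/ref_dir/ as the parameter.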
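To see the expansion in action, the sketch below uses a throwaway function, count_args (purely illustrative), that prints how many arguments it receives. It then builds the corrected call as a bash array with -s pointing at the directory. It assumes dir_cufflinks, dir_ref_annotation and dir_ref_genome_seq are set as in the original script; the DRY_RUN variable is a hypothetical switch that prints the command instead of running it.

```bash
# Illustrative only: report how many arguments a command actually receives.
count_args() { echo "$#"; }

# Corrected cuffmerge call: -s gets the directory itself, not *.fa.
# Assumes dir_cufflinks, dir_ref_annotation and dir_ref_genome_seq are set.
cuffmerge_cmd() {
    local -a cmd=(
        cuffmerge
        -o "${dir_cufflinks}/cuffmerge"
        -g "${dir_ref_annotation}/genes.gtf"
        -s "${dir_ref_genome_seq}/"    # directory of per-chromosome FASTA files
        -p 20
        "${dir_cufflinks}/all_transcripts.txt"
    )
    if [ -n "${DRY_RUN:-}" ]; then
        # Print the exact argument list, shell-quoted, without running it
        printf '%q ' "${cmd[@]}"; echo
    else
        "${cmd[@]}"
    fi
}
```

With a directory holding three .fa files, count_args -s "$dir"/*.fa prints 4 (the -s plus one argument per file), while count_args -s "$dir/" prints 2. Running DRY_RUN=1 cuffmerge_cmd first is a cheap way to confirm the argument list before submitting the job.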


Thanks Dan! It looks like it ran and produced a new merged.gtf file. But I'm curious whether you can help me understand the warning I got:

Warning: cannot find genomic sequence file /pathto/Mus_musculus/UCSC/mm10/Sequence/Chromosomes/chr1_GL456221_random{.fa,.fasta}


I get one of these for each chromosome, each with a /chrX_GLXXXX_random{.fa,.fasta} suffix. Is this something I can ignore?


Do you have those contigs in the directory as FASTA files? My guess would be that they are in your transcriptome GTF file but that you don't have FASTA sequences for them in your directory. It's probably best to have them. In my experience there isn't much in the way of known genes or protein-coding transcripts on these contigs, so it probably won't have a huge impact on your analysis, but it is always better to be more complete.
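One way to check this guess is to compare the sequence names in column 1 of the GTF against the files in the sequence directory. Judging from the warning text, cuffmerge looks for <name>.fa or <name>.fasta, so the sketch below applies the same rule. The function name missing_fasta_for_gtf is hypothetical, and which GTF to feed it (merged.gtf or a cufflinks transcripts.gtf) is up to you.

```bash
# Hypothetical helper: print sequence names (GTF column 1) that have no
# <name>.fa or <name>.fasta file in the genome sequence directory.
missing_fasta_for_gtf() {
    local gtf=$1 seq_dir=$2
    local contig
    # Skip '#' comment lines and blank lines, keep the unique sequence names
    awk -F'\t' '!/^#/ && NF { print $1 }' "$gtf" | sort -u |
    while IFS= read -r contig; do
        if [ ! -e "$seq_dir/$contig.fa" ] && [ ! -e "$seq_dir/$contig.fasta" ]; then
            printf '%s\n' "$contig"
        fi
    done
}
```

For example, missing_fasta_for_gtf merged.gtf "${dir_ref_genome_seq}" should list the same chrX_GLXXXX_random names that appear in the warnings, if the FASTA files are the only thing missing.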


I dug around some more and noticed that the chrX_GLXX_random contigs appear in the .gtf files that came out of cufflinks, but not in the reference .gtf. So this is most likely the source of the warnings.
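A comparison like this one can be scripted by taking the unique sequence names from column 1 of each GTF and keeping those that occur only in the first file. The sketch below does that with comm; the function name contigs_only_in_first is hypothetical, and the file names in the usage line are placeholders.

```bash
# Hypothetical helper: print sequence names (GTF column 1) that occur in the
# first GTF but not in the second, e.g. a cufflinks transcripts.gtf vs genes.gtf.
contigs_only_in_first() {
    local gtf_a=$1 gtf_b=$2
    # comm needs both inputs sorted in the same collation, so force the C locale;
    # -23 suppresses lines unique to the second list and lines common to both
    LC_ALL=C comm -23 \
        <(awk -F'\t' '!/^#/ && NF { print $1 }' "$gtf_a" | LC_ALL=C sort -u) \
        <(awk -F'\t' '!/^#/ && NF { print $1 }' "$gtf_b" | LC_ALL=C sort -u)
}
```

For example, contigs_only_in_first sample1/transcripts.gtf "${dir_ref_annotation}/genes.gtf" would print the *_random contigs that cufflinks reported but the reference annotation does not contain.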

Thanks for your help!