R in python for dummies: Error in file(file, "rt") : cannot open the connection
8.0 years ago
lien ▴ 90

Dear all,

Another newbie question about running R from within a Python script. I googled the error message and found it many times, and I think I have to change something in the path, but I can't figure out what.

Here is the script: The first steps are going fine, but the problem occurs during step 5.

Below is the command line used to invoke the script:

  lien@lien:~/Documents/Plasma_Seq_Heitzer_Ulz$ pwd
  /home/lien/Documents/Plasma_Seq_Heitzer_Ulz
  lien@lien:~/Documents/Plasma_Seq_Heitzer_Ulz$ ./cnv_pipeline_totest.py \
      -fq /media/lien/Seagate\ Backup\ Plus\ Drive/Lien-NIPT-Original-Fastq/GC029428-AR007.HiSeq2000.FCA.R1.fastq.gz \
      -s GC029428-AR007 \
      -g f \
      -o /home/lien/Documents/Plasma_Seq_output \
      -m miseq \
      -k

A temporary R file (GC029428-AR007.tmp.r) is created and shown below, and the directories all seem right.

 cbs.segment01(indir=".", 
 outdir="/home/lien/Documents/Plasma_Seq_output/GC029428-AR007/CGHResults", 
 bad.bins="/home/lien/Documents/Plasma_Seq_Heitzer_Ulz/ref/hg19.50k.k50.bad.bins.txt",
 varbin.gc="/home/lien/Documents/Plasma_Seq_Heitzer_Ulz/ref/hg19.new_sorted.gc_count.txt", 
 varbin.data="/home/lien/Documents/Plasma_Seq_output/GC029428-AR007/GC029428-AR007.bincounts", 
 sample.name="GC029428-AR007", alt.sample.name="", 
 alpha=0.05, nperm=1000, undo.SD=1.0, min.width=5,
 controls_file="/home/lien/Documents/Plasma_Seq_Heitzer_Ulz/ref/Kontrollen_female.bincount.txt",
 sample.dir="/home/lien/Documents/Plasma_Seq_output/GC029428-AR007")

However, the log file shows that files are not found. All libraries in R are loaded correctly, but this is the error I'm getting:

cbs.segment01(indir=".", 
outdir="/home/lien/Documents/Plasma_Seq_output/GC029428-AR007/CGHResults", 
bad.bins="/home/lien/Documents/Plasma_Seq_Heitzer_Ulz/ref/hg19.50k.k50.bad.bins.txt",
varbin.gc="/home/lien/Documents/Plasma_Seq_Heitzer_Ulz/ref/hg19.new_sorted.gc_count.txt", 
varbin.data="/home/lien/Documents/Plasma_Seq_output/GC029428-AR007/GC029428-AR007.bincounts", 
sample.name="GC029428-AR007", alt.sample.name="", 
alpha=0.05, nperm=1000, undo.SD=1.0, min.width=5,
controls_file="/home/lien/Documents/Plasma_Seq_Heitzer_Ulz/ref/Kontrollen_female.bincount.txt",
sample.dir="/home/lien/Documents/Plasma_Seq_output/GC029428-AR007")
Error in file(file, "rt") : cannot open the connection
Calls: cbs.segment01 -> read.table -> file
In addition: Warning message:
In file(file, "rt") :
  cannot open file './/home/lien/Documents/Plasma_Seq_output/GC029428-AR007/GC029428-AR007.bincounts': No such file or directory
Execution halted

I think the error is caused by the leading './/' in the path, which means the Plasma_Seq_output folder is not found. However, when I try to change the Python script to remove one '/', it doesn't work. Changing the command-line arguments also results in an error.

Am I missing something really obvious here?

Thanks!

R python • 5.3k views
8.0 years ago

In which directory do you execute the script? My understanding is that the script expects to be in "Documents".

The script is located in /Documents/Plasma_Seq_Heitzer_Ulz/ and is executed from this directory. The input files in the reference subfolder are in /Documents/Plasma_Seq_Heitzer_Ulz/ref/. The generated output files are in /Documents/Plasma_Seq_output/. Are these too many subfolders?

To solve your problem, try typing the full path of all your files. Your error message just says that it cannot find the file. I think it is searching for /Documents/Plasma_Seq_Heitzer_Ulz///Plasma_Seq_output/GC029428-AR007/GC029428-AR007.bincounts
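One way to follow this advice is to have the Python script check that every input file exists before it hands control to R, so a bad path is reported early. The sketch below is only an illustration: `missing_inputs` is a hypothetical helper, not part of the pipeline, and the paths are copied from the temporary R file in the question.

```python
import os


def missing_inputs(paths):
    """Return the paths that do not exist as regular files (hypothetical helper)."""
    return [p for p in paths if not os.path.isfile(p)]


# Paths taken from the generated R call shown in the question.
inputs = [
    "/home/lien/Documents/Plasma_Seq_Heitzer_Ulz/ref/hg19.50k.k50.bad.bins.txt",
    "/home/lien/Documents/Plasma_Seq_Heitzer_Ulz/ref/hg19.new_sorted.gc_count.txt",
    "/home/lien/Documents/Plasma_Seq_output/GC029428-AR007/GC029428-AR007.bincounts",
]

for path in missing_inputs(inputs):
    print("missing input file:", path)
```

If nothing is printed, the files exist where the script says they are, and the problem is more likely in how the R function combines these paths with its own directory arguments.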

I've copied all the outputs without changing the paths. And I included the entire script, where the first steps are ok, but problems arise in step 5.

How about changing indir=\".\" to indir=\"\"? Hopefully the file path will then become //home/lien/Documents/Plasma_Seq_output/GC029428-AR007/GC029428-AR007.bincounts instead of .//home/lien/Documents/Plasma_Seq_output/GC029428-AR007/GC029428-AR007.bincounts. If that doesn't work, maybe even remove the indir parameter.
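The reasoning in this reply can be sketched in a few lines. The code assumes that cbs.segment01 builds the file name by pasting indir, a "/" and varbin.data together; the error message suggests this, but the function's source is not shown in the thread. `naive_join` is a hypothetical stand-in for that behaviour, and `posixpath.join` is shown for contrast because it discards earlier components when a later one is absolute.

```python
import posixpath


def naive_join(indir, path):
    """Hypothetical stand-in: plain string concatenation with a '/' separator."""
    return indir + "/" + path


varbin_data = "/home/lien/Documents/Plasma_Seq_output/GC029428-AR007/GC029428-AR007.bincounts"

# indir="." reproduces the path from the warning message.
print(naive_join(".", varbin_data))
# -> .//home/lien/Documents/Plasma_Seq_output/GC029428-AR007/GC029428-AR007.bincounts

# indir="" gives the path this reply hopes for.
print(naive_join("", varbin_data))
# -> //home/lien/Documents/Plasma_Seq_output/GC029428-AR007/GC029428-AR007.bincounts

# A join that understands absolute paths keeps the absolute path unchanged.
print(posixpath.join(".", varbin_data))
# -> /home/lien/Documents/Plasma_Seq_output/GC029428-AR007/GC029428-AR007.bincounts
```

Under that assumption the doubled slash is not the real problem; the leading "." is, because it turns an absolute path into one that is looked up relative to the working directory.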

I tried your suggestions:

*. When I change indir from indir=\".\" to indir=\"\" this is the error message:

Error in runCGHAnalysis(CGHweb_ratios, BioHMM = FALSE, UseCloneDists = FALSE, :
  Unable to create the output directory /home/lien/Documents/Plasma_Seq_Heitzer_Ulz//home/lien/Documents/Plasma_Seq_output/GC029428-AR007/CGHResults
Calls: cbs.segment01 -> runCGHAnalysis
In addition: Warning message:
In dir.create(tDir) :
  cannot create dir '/home/lien/Documents/Plasma_Seq_Heitzer_Ulz//home/lien/Documents/Plasma_Seq_output/GC029428-AR007/CGHResults', reason 'No such file or directory'
Execution halted

*. When I completely remove indir this is the error message:

Error in runCGHAnalysis(CGHweb_ratios, BioHMM = FALSE, UseCloneDists = FALSE, :
  Unable to create the output directory /home/lien/Documents/Plasma_Seq_Heitzer_Ulz//home/lien/Documents/Plasma_Seq_output/GC029428-AR007/CGHResults
Calls: cbs.segment01 -> runCGHAnalysis
In addition: Warning message:
In dir.create(tDir) :
  cannot create dir '/home/lien/Documents/Plasma_Seq_Heitzer_Ulz//home/lien/Documents/Plasma_Seq_output/GC029428-AR007/CGHResults', reason 'No such file or directory'
Execution halted

*. When I change the output folder from /home/lien/Documents/Plasma_Seq_output to /home/lien/Documents/Plasma_Seq_Heitzer_Ulz, this is the error message (which is the same as the original one):

Error in file(file, "rt") : cannot open the connection
Calls: cbs.segment01 -> read.table -> file
In addition: Warning message:
In file(file, "rt") :
  cannot open file './/home/lien/Documents/Plasma_Seq_Heitzer_Ulz/GC029428-AR007/GC029428-AR007.bincounts': No such file or directory
Execution halted

From these error messages, we can be certain that the problem is due to the directory location. To point to the correct location, try setting indir to proj_dir and removing proj_dir from outdir. Hopefully the output directory will then resolve to /home/lien/Documents/Plasma_Seq_output/GC029428-AR007/CGHResults, where I think proj_dir is /home/lien/Documents/Plasma_Seq_output/GC029428-AR007/
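A related workaround, if the R code cannot be changed, is to pass it paths that are relative to the directory it prefixes, so that the prepended "./" still lands in the right place. The following is a minimal sketch under that assumption; `relative_to_workdir` is a hypothetical helper, and whether the pipeline accepts relative paths everywhere is not confirmed in this thread.

```python
import posixpath


def relative_to_workdir(path, workdir):
    """Express an absolute path relative to workdir (hypothetical helper)."""
    return posixpath.relpath(path, start=workdir)


workdir = "/home/lien/Documents/Plasma_Seq_Heitzer_Ulz"
bincounts = "/home/lien/Documents/Plasma_Seq_output/GC029428-AR007/GC029428-AR007.bincounts"

rel = relative_to_workdir(bincounts, workdir)
print(rel)
# -> ../Plasma_Seq_output/GC029428-AR007/GC029428-AR007.bincounts

# Prefixing "./" (as the R code appears to do) now resolves to the intended file.
resolved = posixpath.normpath(posixpath.join(workdir, "./" + rel))
print(resolved == bincounts)
# -> True
```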

Thanks for your help, Sam. It was indeed a problem with the directory locations. When I don't specify the output directory on the command line, the script finds all the locations it needs and runs to completion.
