Question: Execute commands in Bash Chunk in Rmarkdown
M.O.L.S wrote:

Hi,

I am using RMarkdown in RStudio and I want to run a program's commands from a bash chunk:

```{bash}
```

I have a program called samtools on my computer so when I execute it in the chunk, it works.

```{bash}
samtools 
```

When I type:

```{bash}
which samtools 
```

The output tells me that samtools is located in the /usr/local/bin directory.

However, when I try to run vcftools, I get an error because RStudio does not know where the program is: I have it in another directory on my computer.

How do I get Rstudio or Rmarkdown to execute vcftools from the bash chunk? Is there a way that I can tell RMarkdown which directory to look in to find the program?

For example (something like) :

```{bash}
$vcftools = /Users/m.o.l.s/Programs_For_Bioinformatics/vcftools

```

Or would I have to move all of the programs to /usr/local/bin?

Outside of RStudio, I have made aliases to the programs, so they work fine in the terminal.

I made the alias by writing this in my .bash_profile:

alias bcftools=/Users/paths/to/where/the/program/is/installed

For vcftools, I instead added its complete path to the export PATH line in the same .bash_profile.

modified 5 months ago • written 6 months ago by M.O.L.S

My guess is that samtools and vcftools were not installed using the same mechanism. If I were forced to say, I'd guess that samtools is installed in a machine default location, and vcftools in a non-standard location using something like conda or a module file.

written 6 months ago by i.sudbery

Yes. This is true. I have installed programs using Conda, Homebrew, pip install, and downloaded binaries and source files from online and compiled them from github and websites.

written 5 months ago by M.O.L.S

How did you make the aliases? Which configuration file are they stored in?

written 6 months ago by i.sudbery

I made the alias by opening my hidden bash profile:

open .bash_profile

and then writing in it:

alias bcftools=/Users/paths/to/where/the/program/is/installed

written 5 months ago by M.O.L.S

Use the full path to samtools?

$ which samtools
written 6 months ago by Pierre Lindenbaum

i.sudbery (Sheffield, UK) wrote:

.bash_profile is executed when you start a login shell. My suspicion is that the bash R Markdown chunks are executed in a separate shell instance that is not a login shell, and so your .bash_profile is not executed when it starts.

It might be different with .bashrc, which might get executed when you start a bash shell in R Markdown (or it might not; for example, I'm pretty sure neither is run in an SGE submission script).

You could try explicitly adding source ~/.bash_profile to the start of your code chunk.
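
To test the suspicion about login shells, a minimal sketch like the one below could be run inside a bash chunk. It relies on bash's read-only login_shell option; the variable name shell_kind is only illustrative.

```bash
# Report whether this bash instance was started as a login shell.
# login_shell is a read-only bash option; errors are silenced in case
# the code is run by a shell that has no shopt builtin.
if shopt -q login_shell 2>/dev/null; then
  shell_kind="login"
else
  shell_kind="non-login"
fi
echo "this chunk runs in a $shell_kind shell"

# Show the search path this shell actually uses.
echo "PATH is: $PATH"
```

If the chunk reports a non-login shell, that would be consistent with .bash_profile not having been read, and a source ~/.bash_profile line at the top of the chunk loads it by hand.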

written 5 months ago by i.sudbery

Yes. This 100% works. Thank you so much =).

written 5 months ago by M.O.L.S

h.mon (Brazil) wrote:

You can use the full path to the tool:

```{bash}
~/bin/bioinfotools/vcftools
```

Returns:

## 
## VCFtools (0.1.16)
## © Adam Auton and Anthony Marcketta 2009
## 
## Process Variant Call Format files
## 
## For a list of options, please go to:
##  https://vcftools.github.io/man_latest.html
## 
## Alternatively, a man page is available, type:
##  man vcftools
## 
## Questions, comments, and suggestions should be emailed to:
##  vcftools-help@lists.sourceforge.net

On a side note, it seems the RStudio / Rmarkdown bash engine is somewhat broken. This code block hangs at compilation:

```{bash in_path}
samtools view
```

Whereas at the command-line, it instantly returns the help:

Usage: samtools view [options] <in.bam>|<in.sam>|<in.cram> [region ...]

Options:
  -b       output BAM
  -C       output CRAM (requires -T)
  -1       use fast BAM compression (implies -b)
  -u       uncompressed BAM output (implies -b)
  -h       include header in SAM output
  -H       print SAM header only (no alignments)
  -c       print only the count of matching records
  -o FILE  output file name [stdout]
  -U FILE  output reads not selected by filters to FILE [null]
  -t FILE  FILE listing reference names and lengths (see long help) [null]
  -L FILE  only include reads overlapping this BED FILE [null]
  -r STR   only include reads in read group STR [null]
  -R FILE  only include reads with read group listed in FILE [null]
  -q INT   only include reads with mapping quality >= INT [0]
  -l STR   only include reads in library STR [null]
  -m INT   only include reads with number of CIGAR operations consuming
           query sequence >= INT [0]
  -f INT   only include reads with all  of the FLAGs in INT present [0]
  -F INT   only include reads with none of the FLAGS in INT present [0]
  -G INT   only EXCLUDE reads with all  of the FLAGs in INT present [0]
  -s FLOAT subsample reads (given INT.FRAC option value, 0.FRAC is the
           fraction of templates/read pairs to keep; INT part sets seed)
  -M       use the multi-region iterator (increases the speed, removes
           duplicates and outputs the reads as they are ordered in the file)
  -x STR   read tag to strip (repeatable) [null]
  -B       collapse the backward CIGAR operation
  -?       print long help, including note about region specification
  -S       ignored (input format is auto-detected)
      --input-fmt-option OPT[=VAL]
               Specify a single input file format option in the form
               of OPTION or OPTION=VALUE
  -O, --output-fmt FORMAT[,OPT[=VAL]]...
               Specify output format (SAM, BAM, CRAM)
      --output-fmt-option OPT[=VAL]
               Specify a single output file format option in the form
               of OPTION or OPTION=VALUE
  -T, --reference FILE
               Reference sequence FASTA FILE [null]
  -@, --threads INT
               Number of additional threads to use [0]
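
One possible explanation for the hang, offered as an assumption and not confirmed anywhere in this thread, is that the chunk's standard input is not a terminal, so a tool that falls back to reading standard input waits for data instead of printing its help. The sketch below only checks what standard input is attached to; stdin_kind is an illustrative name.

```bash
# [ -t 0 ] is true only when file descriptor 0 (stdin) is attached to a terminal.
if [ -t 0 ]; then
  stdin_kind="a terminal"
else
  stdin_kind="not a terminal"
fi
echo "stdin is $stdin_kind"
```

Running the same lines at the command line and inside a chunk shows whether the two environments differ in this respect.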
written 6 months ago by h.mon

This seems to work, but I don't want to use the full path to the tool during execution because it clutters the code. And if I move the program out of that directory, the commands in the file will no longer run. If the program is in a bin directory, as in your example above, then I should be able to use a one-word command.

modified 5 months ago • written 5 months ago by M.O.L.S

M.O.L.S wrote:

In an RMarkdown file in RStudio, the following can be applied:

For jar files, I haven't figured out how to do it with a short command, but as h.mon says, I can use the full path to the program and it will work. For example:

```{bash}
# The beagle program 
 java -jar /Users/m.o.l.s/Programs_For_Bioinformatics/beagle.19.jar
```
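
One way to get a short command for a jar file is a shell function that wraps the long java call. This is a sketch under assumptions: the jar location below and the function name beagle are hypothetical and need to match your own setup.

```bash
# Hypothetical location of the jar; replace it with the real path on your machine.
BEAGLE_JAR="$HOME/Programs_For_Bioinformatics/beagle.19.jar"

# Wrap the long java call in a function so it can be invoked with one word.
# "$@" forwards whatever arguments are given to the function.
beagle() {
  java -jar "$BEAGLE_JAR" "$@"
}

# Confirm that the shell now knows the name.
command -v beagle
```

If these lines are kept in ~/.bash_profile, a chunk that starts with source ~/.bash_profile could then call beagle followed by its arguments.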

The aliases above work only when the command is run in the Terminal, not in the bash chunk, so they are useful only for working in the Terminal.

For example on Mac inside the Terminal:

open .bash_profile
alias bcftools=/Users/paths/to/where/the/program/is/installed
File  > Save >  click the red x 
hold down command and N together (to open a New terminal)
bcftools
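
A likely reason the aliases are ignored in a chunk is that bash expands aliases only in interactive shells unless the expand_aliases option is switched on. The sketch below shows that option with a made-up alias, say_hi, which stands in for a real one such as bcftools.

```bash
# Non-interactive bash does not expand aliases unless this option is switched on.
# The error is silenced in case the code is run by a shell without shopt.
shopt -s expand_aliases 2>/dev/null

# say_hi is a hypothetical alias used only for illustration.
alias say_hi='echo "hello from an alias"'

# The alias has to be used on a later line than the one that defines it.
say_hi
```

In a chunk, the shopt line would go before source ~/.bash_profile so that the aliases defined there are expanded on the lines that follow.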

These commands work to show the program's usage when the RMarkdown file is inside the directory where the programs are installed. They run the program without the ./ prefix.

```{bash}
# The BCFtools program
$BCFTOOLS bcftools/bcftools
```

```{bash}
$PSIBLAST =blast_folder/bin/psiblast -h
```

... but they don't do anything after that. I thought the $ held some great significance, but it doesn't seem to.
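
For reference, in bash the $ only reads a variable back; an assignment is written NAME=value with no $ and no spaces around the equals sign. In the chunks above, $BCFTOOLS was presumably unset and expanded to nothing, which would leave just the relative path to run. A minimal sketch, assuming a hypothetical install location:

```bash
# Assignment: no leading $ and no spaces around =.
# The path is hypothetical; substitute the real location of the binary.
BCFTOOLS="$HOME/Programs_For_Bioinformatics/bcftools/bcftools"

# Expansion: $ reads the value back.
echo "BCFTOOLS is set to: $BCFTOOLS"

# Run the tool through the variable only if the file exists and is executable.
if [ -x "$BCFTOOLS" ]; then
  "$BCFTOOLS"
else
  echo "not found or not executable: $BCFTOOLS"
fi
```

A variable set this way lasts only for the chunk it is defined in, so it would have to be repeated or kept in a file that each chunk sources.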

What does work, however, as i.sudbery mentioned, is putting all of the paths in .bashrc or .bash_profile.

The first step is to go to the Terminal and find the home directory:

echo $HOME

The home directory is the start of the paths where these programs can be found. Then open the file:

open .bashrc

Paste this into .bashrc (but change the home directory to the actual name of your home directory):

export PATH=$PATH:/bin:/usr/bin:/usr/local/bin:/usr/sbin:/sbin:/Users/home/Programs_For_Bioinformatics/vcftools/bin:/Users/home/Programs_For_Bioinformatics/bowtie:/Users/home/Programs_For_Bioinformatics/Kalign

The pattern above is to list the full path of each directory where a program is found, starting from the home directory, and to separate the paths with colons.

File > save > click the x

Then it will be possible to run the program in RMarkdown with a one-word command:

```{bash}
source ~/.bash_profile
cufflinks
```

or

```{bash}
source ~/.bashrc
blastp -h
```

In both cases the source line goes at the start of the code chunk.
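
The PATH change can also be made inside a chunk, or checked there after sourcing the file. The sketch below is one way to do it, assuming a hypothetical install directory held in tool_dir; it appends the directory only if it is missing and then asks the shell where it finds the tool.

```bash
# Hypothetical install directory; replace it with the real location on your machine.
tool_dir="$HOME/Programs_For_Bioinformatics/vcftools/bin"

# Append the directory only if it is not already on PATH.
case ":$PATH:" in
  *":$tool_dir:"*) ;;              # already present, nothing to do
  *) PATH="$PATH:$tool_dir" ;;
esac
export PATH

# command -v prints where the shell finds a command; it prints nothing if it is not found.
command -v vcftools || echo "vcftools not found on PATH"
```

Wrapping PATH in colons before matching avoids a false match on a directory whose name merely starts with the same text.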

Of course, commands like this work, but as the path gets longer the code gets longer:

./angsd_folder/angsd

It also works if the programs are located in /usr/local/bin.

modified 5 months ago • written 5 months ago by M.O.L.S

Please consider validating your past questions (including this one) that have received answers. Accepting answers and upvoting posts is the appropriate way to acknowledge help you receive on Biostars.

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one if they work.

written 5 months ago by genomax

I can't see how I can accept someone else's comment as an answer to my question. =( I can add a reply, add a comment, or moderate.

modified 5 months ago • written 5 months ago by M.O.L.S

Which comment are you referring to? If you can point out the comment I will move it to an answer so you can accept it.

Edit: You were referring to @i.sudbery's comment. I have moved that to an answer so you can accept it.

modified 5 months ago • written 5 months ago by genomax