Error message when implementing megaptera::stepC(x)
7.1 years ago
ecolonaut ▴ 100

I am working through the megaptera vignette walkthrough. The initial steps, including database setup, feeding the database a taxonomic backbone [stepA(x)], and fetching the sequences for the desired loci of the given taxa [stepB(x)], work fine. Judging by the printout in R, most of stepC(x) runs, but I keep getting an error message from the mafft() call that aligns conspecific sequences (see the last lines of the code below). When I call ips::mafft() directly, the function works fine. My question is: what can I try to resolve this error?

>library(devtools); install_github("heibl/megaptera")
>library("megaptera")

># set up the PostgreSQL database connection
>drv <- dbDriver("PostgreSQL")
>conn <- dbPars(dbname = "Cetacea", host = "localhost", port = 5432, user = "openpg", password = "new_user_password")
>show(conn)
###PostgreSQL connection parameters: 
###     host = localhost 
###     port = 5432 
###   dbname = cetacea 
###     user = openpg 
### password = new_user_password

# create taxonomic backbone
>tax <- taxon(ingroup = "Cetacea",
             outgroup = c("Sus scrofa"),
             kingdom = "Metazoa")
>tax

###--- megaptera taxon class ---
###ingroup     : Cetacea
###is extended : no
###outgroup    : Sus scrofa
###is extended : no
###in kingdom  : Metazoa
###hybrids     : excluded
###guide tree  : taxonomy-based

# set the gene loci of interest
>loci <- locus("cox1")
>loci
###Locus definition for cox1 
###kind                :  gene 
###search strings      :  cox1, COI, COX1, coi, Cox1 
###search fields       :  gene, title 
###use genomes         :  TRUE 
###SQL tables          :  acc_cox1, spec_cox1 
###alignment method    :  auto 
###minimum identity    :  0.75 
###minimum coverage    :  0.5

# set the function parameters
>pars <- megapteraPars()
>pars
###MEGAPTERA pipeline parameters: 
###             parallel = FALSE 
###                 cpus = 0 
###         cluster.type = none 
###          update.seqs = all 
###               retmax = 500 
###      max.gi.per.spec = 100 
###               max.bp = 5000 
###   reference.max.dist = 0.25 
###   min.seqs.reference = 10 
###           fract.miss = 0.25 
###              filter1 = 0.5 
###              filter2 = 0.25 
###              filter3 = 0.05 
###              filter4 = 0.2 
###       block.max.dist = 0.5 
###            min.n.seq = 5 
###                  gb1 = 0.5 
###                  gb2 = 0.5 
###                  gb3 = 9999 
###                  gb4 = 2 
###                  gb5 = a

# define x to pass to step() functions
>x <- megapteraProj(db = conn,
                   taxon = tax,
                   locus = loci,
                   align.exe = "C:/Users/Gregory/Programs/MAFFT/mafft",
                   mask.exe = "C:/Users/Gregory/Programs/GBlocks/Gblocks")

# begin the pipeline with stepA
> stepA(x)
###megaptera 1.0-52 
###2017-03-21 18:29:15 
###STEP A: searching and downloading taxonomy from GenBank
###taxonomy already downloaded
###STEP A finished after 0.06 secs

# move on to stepB
>stepB(x)
###megaptera 1.0-52 
###2017-03-21 18:30:22 
###STEP B: searching and downloading sequences from GenBank
###...

# then stepC
>stepC(x)
###megaptera 1.0-52 
###2017-03-21 18:31:43 
###STEP C: alignment of conspecific sequences
### 72 species in table acc_cox1 
### 8 species have 1 accession 
### 64 species have > 1 accession
### 63 species are already aligned
### 1 species need to be aligned

###-- 9 seqs. of Delphinapterus_leucas
###Error in mafft(seqs, method = "auto", path = megProj@align.exe) : 
###  unused argument (path = megProj@align.exe)
R megaptera alignment MAFFT • 1.8k views

I believe this may be an issue with the alignSpecies() function that is called towards the end of stepC(x). This function makes an external call to the ips::mafft() alignment wrapper, but doesn't seem to call it correctly. Could this be because I'm using a Windows computer?

7.1 years ago

I don't know the megaptera package, but the error message is very clear.

###Error in mafft(seqs, method = "auto", path = megProj@align.exe) : 
###  unused argument (path = megProj@align.exe)

This implies that the mafft() you have installed does not expect a path argument. Try updating the ips package (or check whether megaptera recommends a particular version of the ips package).
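
One way to confirm this is to look at the argument names of the function itself. Below is a minimal sketch: has_arg(), old_style() and new_style() are names I made up for illustration (the two stand-ins only mimic a wrapper with and without a path argument), and the last part inspects ips::mafft only if ips happens to be installed.

```r
# Return TRUE if `fun` declares a formal argument called `arg`.
has_arg <- function(fun, arg) {
  arg %in% names(formals(fun))
}

# Hypothetical stand-ins for two generations of a wrapper (illustration only,
# not the real ips code).
old_style <- function(x, method = "auto", path) NULL
new_style <- function(x, method = "auto", exec) NULL

has_arg(old_style, "path")  # TRUE
has_arg(new_style, "path")  # FALSE

# Inspect the copy of ips that is actually installed, if there is one.
if (requireNamespace("ips", quietly = TRUE)) {
  print(packageVersion("ips"))
  print(names(formals(ips::mafft)))
}
```

If "path" is missing from the printed argument names, the installed ips and the installed megaptera disagree about the interface, which would explain the "unused argument" error.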

PS: I also suspect that the path you are passing is wrong. From the ips reference manual (p. 15), https://cran.r-project.org/web/packages/ips/ips.pdf:

path: A character string indicating the path to the MAFFT executable.


Thank you for your suggestions. I had forgotten to include the part of my code where x is defined; x is what the source code later refers to as megProj, so megProj@align.exe is x@align.exe. I've updated both packages from the heibl GitHub repositories, and in the newest version of ips the path argument has been replaced by exec to define the location of the executable. I think that is the problem: the path argument is supplied automatically by stepC(x) (source) and alignSpecies() (source) as part of the automated pipeline, so I can't omit it without changing the source code. I've opened an issue in the megaptera repository about this.

Here is the source code referring to megProj@align.exe, which is the same as x@align.exe (defined above), as well as my session information:

> x@align.exe
[1] "C:/Users/Gregory/Programs/MAFFT/mafft"


# the source code:  
  ## aligning -- either sequential or parallel
  ## -----------------------------------------
  if ( length(spec) > 0 ) {
    cpus <- x@params@cpus
    if ( length(spec) < cpus | !x@params@parallel ){
      lapply(spec, alignSpecies, megProj = x)
    } else {
      sfInit(parallel = TRUE, cpus = cpus, 
             type = x@params@cluster.type)
      sfLibrary("megaptera", character.only = TRUE)
      megProj <- x
      sfExport("spec", "megProj", "acc.tab", 
               "max.bp", "align.exe", "logfile")
      sfLapply(x = spec, fun = alignSpecies, megProj = megProj)
      sfStop()
    }
  } 
  dbDisconnect(conn)

# and my session info
> session_info()
Session info ---------------------------------------------------------
 setting  value                       
 version  R version 3.3.2 (2016-10-31)
 system   x86_64, mingw32             
 ui       RStudio (1.0.136)           
 language (EN)                        
 collate  English_United States.1252  
 tz       America/Chicago             
 date     2017-03-22 
Packages -------------------------------------------------------------
 package       * version  date       source  
ips           * 0.0-10   2017-03-21 Github (heibl/ips@165b251)
megaptera     * 1.0-52   2017-03-21 Github (heibl/megaptera@8beac0d)
...
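
For illustration, the path/exec rename described above is the kind of mismatch a small adapter can bridge. This is only a sketch under assumptions: rename_arg() and new_style() are hypothetical names, new_style() is a stand-in rather than the real ips::mafft(), and I have not checked whether such a shim can be placed where stepC(x) would actually pick it up.

```r
# Wrap `fun` so that a call using the argument name `old` is forwarded as `new`.
rename_arg <- function(fun, old, new) {
  function(...) {
    args <- list(...)
    # Only touch names when at least one argument was passed by name.
    if (!is.null(names(args))) {
      names(args)[names(args) == old] <- new
    }
    do.call(fun, args)
  }
}

# Hypothetical stand-in for a wrapper that expects `exec` (not the real ips::mafft).
new_style <- function(x, method = "auto", exec) {
  paste("aligning", length(x), "seqs with", exec)
}

# The adapter accepts the old-style call and forwards it under the new name.
compat <- rename_arg(new_style, old = "path", new = "exec")
compat(c("seq1", "seq2"), method = "auto", path = "C:/tools/mafft")
```

The cleaner fix is still a change in megaptera itself, which is why I opened the issue.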

The easiest workaround for your problem is to remove the GitHub version of the ips package and install ips from CRAN directly (using install.packages()).
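
A minimal sketch of that swap is below. reinstall_from_cran() is a hypothetical helper written for this reply; it defaults to a dry run so that nothing is uninstalled by accident. It also assumes that the CRAN release of ips still accepts path in mafft(), which you should verify after installing.

```r
# Swap an installed package for its CRAN release.
# With dry_run = TRUE (the default) the steps are only returned, not executed.
reinstall_from_cran <- function(pkg, dry_run = TRUE) {
  steps <- c(sprintf('remove.packages("%s")', pkg),
             sprintf('install.packages("%s")', pkg))
  if (dry_run) {
    return(steps)
  }
  remove.packages(pkg)   # drop the currently installed (GitHub) build
  install.packages(pkg)  # fetch the current CRAN release
  invisible(steps)
}

# Show what would be run for ips.
reinstall_from_cran("ips")
```

Call reinstall_from_cran("ips", dry_run = FALSE) to actually perform the swap, then restart R and reload megaptera before running stepC(x) again.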
