HiCUP pipeline, hicup_digester script, ApoI genome digestion
5.9 years ago
Faylasoof • 0

I am trying the HiCUP pipeline (Babraham, Cambridge, UK). It works very well for enzymes whose recognition sites contain only the bases A, G, C and T, but it fails at the genome digestion stage for an enzyme like ApoI, whose site contains degenerate bases. The recognition sequence for ApoI is R^AATTY, and when I use the standard command:

perl hicup_digester mm9_merge.fa -re1 R^AATTY,ApoI *.fa

I get the following error message in the terminal:

Restriction enzyme: 'R^AATTY' should only contain the characters: 'A','G','C','T' or '^' Please change configuration file and/or command-line parameters and/or installation accordingly

This happens regardless of the version used. I am using HiCUP v0.5.9, but the same error appears in the latest version. BTW, I am running Ubuntu 14.04 on a Dell Optiplex 7050.

A solution to this problem would be highly appreciated, since the HiCUP manual gives no information on enzymes like ApoI that have 'R' and 'Y' in the recognition sequence. I am unfamiliar with Perl scripting, so I am unlikely to be able to modify the hicup_digester script myself to make it work for genome digestion with ApoI.

5.9 years ago
ATpoint 82k

Wouldn't it be easiest to run the program four times, once with each of these sequences as -re1:

AAATTC

AAATTT

GAATTC

GAATTT

R means either A or G and Y is C or T. Merge the output and you're done.
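The four sequences above can also be generated rather than typed by hand. The following is a minimal Python sketch, not part of HiCUP: the helper name `expand_site` is hypothetical, and only the IUPAC codes mentioned in this thread (R and Y) are handled. It keeps the `^` cut marker in place, since the error message in the question lists `^` among the allowed characters.

```python
from itertools import product

# Only the codes discussed in this thread; other IUPAC codes are not handled.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT"}

def expand_site(site):
    """Expand a degenerate site such as 'R^AATTY' into concrete sequences.

    The '^' cut marker is passed through unchanged. A KeyError is raised
    for any character that is neither '^' nor a key of IUPAC.
    """
    choices = ["^" if ch == "^" else IUPAC[ch] for ch in site]
    return ["".join(combo) for combo in product(*choices)]

for variant in expand_site("R^AATTY"):
    print(variant)
```

For R^AATTY this yields the same four sequences as listed above, each with the cut position marked after the first base.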

5.9 years ago
Faylasoof • 0

Yes, I've considered that, but this will be whole-genome data, so it would be rather tedious. I had hoped that some clever solution might exist that uses the R and Y symbols directly (I know what these mean, of course). A "smarter" solution would be something like modifying the Perl script, except that I do not know Perl beyond the very basics! Thanks for your reply. Much appreciated!
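To illustrate what a solution that uses the R and Y symbols directly could look like, here is a small Python sketch that turns the degenerate site into a regular expression and finds every cut position in a single pass. It is an independent illustration, not a patch to hicup_digester, and it does not write HiCUP's digest file format. The function names and the 0-based, half-open coordinates are assumptions made for the example. Because RAATTY is its own reverse complement, scanning one strand is enough for ApoI.

```python
import re

# Regex character classes for the codes discussed in this thread only.
IUPAC_CLASSES = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]"}

def cut_positions(sequence, site):
    """Return 0-based positions at which `site` (e.g. 'R^AATTY') cuts `sequence`."""
    offset = site.index("^")          # number of bases before the cut
    motif = site.replace("^", "")
    # A lookahead is used so that overlapping site occurrences are all reported.
    pattern = re.compile("(?=" + "".join(IUPAC_CLASSES[b] for b in motif) + ")")
    return [m.start() + offset for m in pattern.finditer(sequence.upper())]

def fragments(sequence, site):
    """Return (start, end) pairs, 0-based and half-open, between consecutive cuts."""
    bounds = [0] + cut_positions(sequence, site) + [len(sequence)]
    return [(s, e) for s, e in zip(bounds, bounds[1:]) if e > s]

toy = "CCGAATTCGGAAATTTCC"  # made-up sequence containing GAATTC and AAATTT
print(cut_positions(toy, "R^AATTY"))
print(fragments(toy, "R^AATTY"))
```

All four ApoI variants are matched by the one pattern, so the fragment boundaries come out of a single scan instead of four separate digests.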


Depends on the definition of 'smarter'. In my experience, the smartest solution is the one that produces the intended result while consuming as little effort and time as possible, so that you can focus on your actual analysis. But you are of course free to decide what you do ;-D

5.9 years ago
Faylasoof • 0

I understand the argument! The analysis is indeed more important, but I will have to do this for many genomes, meaning 4 x the number of genomes! Hence my query for a quicker way, if possible!
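If the four-run approach is used, the 4 x number-of-genomes bookkeeping can at least be scripted. The Python sketch below only prints command lines and does not run them. The `-re1 SEQUENCE,NAME` form is copied from the command in the question and has not been checked against the HiCUP documentation, the second genome file name is hypothetical, and the cut position is taken from R^AATTY. Each run would describe the fragments for only one variant, so merging the outputs means combining cut positions rather than concatenating files.

```python
import shlex

# The four concrete ApoI variants, with the cut position taken from R^AATTY.
VARIANTS = ["A^AATTC", "A^AATTT", "G^AATTC", "G^AATTT"]

def digester_commands(genome_files, enzyme_name="ApoI"):
    """Build one command line per (genome, variant) pair, without running anything."""
    commands = []
    for genome in genome_files:
        for variant in VARIANTS:
            # Argument layout mirrors the question's command; treat it as an assumption.
            argv = ["perl", "hicup_digester", "-re1", f"{variant},{enzyme_name}", genome]
            commands.append(shlex.join(argv))  # quotes '^' safely for the shell
    return commands

genomes = ["mm9_merge.fa", "other_genome.fa"]  # second name is hypothetical
for command in digester_commands(genomes):
    print(command)
```

Printing the commands first makes it easy to inspect them before pasting them into a shell or a job script.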
