How can I download sequences from the CAZy database?
3.9 years ago
claudia.d • 0

I'm trying to download all GH29 sequences from the CAZy database. Doing it manually was easy for the archaea (just 41 sequences), but there are more than 4,000 bacterial sequences. How can I download them in bulk? My goal is to get all the sequences, build a tree, and study the gene annotation. I have also read about dbCAN2, but I'm not sure I fully understand how it works. Can anyone help me?

3.9 years ago
GenoMax 141k

Download the CAZyDB FASTA file from dbCAN2 (CAZyDB.07312019.fa in the commands below). The link was provided in an answer to an earlier question: "Download CAZy database".

Once you have downloaded the file, extract the sequences for the GH29 family with the following command (FASTA linearization code by @Pierre):

awk '/^>/ {printf("%s%s\t",(N>0?"\n":""),$0);N++;next;} {printf("%s",$0);} END {printf("\n");}' < CAZyDB.07312019.fa | grep -A 1 GH29  --no-group-separator | tr "\t" "\n" > GH29_seq.fa

If you want them nicely folded every 60 characters:

awk '/^>/ {printf("%s%s\t",(N>0?"\n":""),$0);N++;next;} {printf("%s",$0);} END {printf("\n");}' < CAZyDB.07312019.fa | grep -A 1 GH29  --no-group-separator | tr "\t" "\n" | fold -w 60 > GH29_seq.fa
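Two caveats about the commands above: grep matches "GH29" anywhere on a line rather than only in the family field of the header, and fold wraps any header longer than 60 characters as well as the sequence lines. The following is a minimal awk-only sketch that avoids both. It assumes the headers look like ">ACCESSION|FAMILY|FAMILY...", with subfamilies written as a "_N" suffix (for example "GH29_1"); check a few headers in your copy of the file before relying on it. The function name extract_family and its arguments are hypothetical, not part of dbCAN2.

```shell
# Hypothetical helper: extract_family FAMILY INPUT.fa [WIDTH]
# Prints records whose header lists FAMILY (or one of its subfamilies)
# and wraps only the sequence lines at WIDTH characters (default 60).
# Assumption: header fields are separated by "|" and the first field
# is the accession; subfamilies are written as FAMILY_N.
extract_family() {
    awk -v fam="$1" -v width="${3:-60}" '
        # Return 1 if any field after the accession equals the family
        function has_family(h,   n, parts, i, label) {
            n = split(h, parts, "[|]")
            for (i = 2; i <= n; i++) {
                label = parts[i]
                sub(/_.*$/, "", label)   # drop a subfamily suffix such as "_1"
                if (label == fam) return 1
            }
            return 0
        }
        # Print the stored record if it was selected, folding the sequence
        function emit(   i) {
            if (!keep) return
            print header
            for (i = 1; i <= length(seq); i += width)
                print substr(seq, i, width)
        }
        /^>/ { emit(); header = $0; seq = ""; keep = has_family(header); next }
             { seq = seq $0 }            # join multi-line sequences
        END  { emit() }
    ' "$2"
}
```

Usage, with the same file name as above: extract_family GH29 CAZyDB.07312019.fa > GH29_seq.fa. Running grep -c '^>' GH29_seq.fa afterwards gives the number of sequences, which can be compared with the family size listed on the CAZy website.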