4 months ago

What do I need to change in my looped Unix code to run the Perl batch-download script for reference proteomes from https://www.uniprot.org/help/api_downloading? I am getting "zsh: parse error near `done'".

I am also not sure whether anything needs to be changed in the Perl code, or whether using the TextEdit application to save the Perl code as .pl was the right approach.

I will be downloading from a list with thousands of taxids in the future. This is a small example I have tried with no success:

cat > taxids.txt
226186
345219

FILE=taxids.txt
while read line: do
perl apidownload.pl $line
done <$FILE

4 months ago

Minimally invasive fix: replace : with ;

while read line; do
perl apidownload.pl taxonomy:$line
done <$FILE
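To see the loop shape on its own, here is a minimal sketch that runs without network access. `process_taxid` is a hypothetical stand-in for the Perl call, and `loop_output.txt` is just an illustrative file name; neither comes from the UniProt example. It should run in bash or zsh, ideally in an empty scratch directory.

```shell
# Hypothetical stand-in for `perl apidownload.pl ...`; it only prints its argument.
process_taxid() {
  printf 'would process %s\n' "$1"
}

# Small example input, one taxid per line.
printf '%s\n' 226186 345219 > taxids.txt
FILE=taxids.txt

# A `;` (not a `:`) must separate the `read` command from `do`.
while IFS= read -r line; do
  [ -n "$line" ] || continue   # skip blank lines
  process_taxid "$line"
done < "$FILE" > loop_output.txt

cat loop_output.txt
```

`IFS= read -r` keeps each line exactly as written, which matters once the list contains names with spaces rather than bare taxids.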


Edit: of course, that only works if the apidownload script handles the argument correctly, and simply passing the bare taxid as the query won't do the right thing. You can imagine that the Perl script does essentially the same as the search field on that page.

So the query should look somewhat like taxonomy:226186

Try the query line perl apidownload.pl taxonomy:$line and see if it does the right thing.

Not quite sure what this response means. Were there 500 results? There should only have been 1 reference. When I search Proteomes on UniProt and type 226186 or 345219, it finds the species. I used the "Download the UniProt reference proteomes for all organisms below a given taxonomy node in compressed FASTA format" Perl example. My output after processing the above code:

Failed, got 500 Can't verify SSL peers without knowing which Certificate Authorities to trust for https://www.uniprot.org/proteomes/?query=reference:yes+taxonomy:taxonomy:226186&format=list
Failed, got 500 Can't verify SSL peers without knowing which Certificate Authorities to trust for https://www.uniprot.org/proteomes/?query=reference:yes+taxonomy:taxonomy:345219&format=list

Looks like I need to install Mozilla::CA certificates on the command line: 500 Can't verify SSL peers without knowing which Certificate Authorities to trust. Are these trusted certificates? Does the UniProt Perl example use edirect? If so, it looks like I will need an API key, and I will need to request very high download volumes in the future, close to 5,000 rather than 10. This is where an API key is set up to gain certificates: https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/

Sorry, yes, you need to provide the full query URL. Try:

perl apidownload.pl "https://www.uniprot.org/proteomes/?query=reference:yes+taxonomy:226186"

and install whatever Perl modules are required.

After downloading certificates with sudo cpan Mozilla::CA, the following script was successful:

FILE=taxids.txt
while read line; do
mkdir ./${line}
perl apidownload.pl $line > ./${line}
done <$FILE

However, the output from running the .pl script does not end up in the ./${line} folder. What would I need to change?

Modifying the Perl script like so did not do the trick:

my $OutputDir = './ARGV[1]';
for my $proteome (split(/\n/, $response_list->content)) {
my $file = ./$ARGV[1]$proteome . '.fasta.gz';
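A note on the redirect: `>` sends standard output to a file, so pointing it at a path that is already a directory fails, and it has no effect on files the script itself creates. Assuming the script writes its .fasta.gz files into the current working directory (which the snippet above suggests, but is not confirmed here), one option is to run it from inside the target directory. The sketch below uses a hypothetical stand-in, `fake_download`, instead of the real Perl script so it runs offline; try it in an empty scratch directory.

```shell
# Hypothetical stand-in for the Perl script: writes one file into the
# current working directory, named after its argument.
fake_download() {
  printf 'sequence data for %s\n' "$1" > "$1.fasta"
}

printf '%s\n' 226186 345219 > taxids.txt

while IFS= read -r line; do
  [ -n "$line" ] || continue   # skip blank lines
  mkdir -p "$line"             # -p: no error if the directory already exists
  # The subshell changes directory for this one command only, so the
  # loop stays where it started and no `cd ..` is needed afterwards.
  ( cd "$line" && fake_download "$line" )
done < taxids.txt

ls 226186 345219
```

Using a subshell rather than `cd` followed by `cd ..` means a failed `cd` cannot leave the loop stranded in the wrong directory.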


How about changing the shell script like this:

while read line; do
mkdir -p $line # the directory could already exist
cd $line
cd ..
done <$FILE

Thank you! This helped me arrive at a solution. The full query, "https://www.uniprot.org/proteomes/?query=reference:yes+taxonomy:${line}", did not work for the search and returned "Failed, got 400 Bad Request".

However, using $line as the argument was sufficient to obtain the reference proteomes and only returned "Redundant argument in sprintf", referring to the Perl code.

For future reference, the following bash code can be used to create individual folders named after the corresponding taxonomy name (which may contain spaces) or taxid, and to download the reference proteome(s) from UniProt directly into those folders.

cat > taxonomyandtaxids.txt
226186
345219
Dubosiella newyorkensis
Bacteroides thetaiotaomicron

FILE=taxonomyandtaxids.txt
while read line; do
FNAM="${line/ /_}"
mkdir -p "$FNAM"
cd "$FNAM"
perl /pathtoyourperlcode/apidownload.pl $line
cd ..
done <$FILE
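One detail worth knowing about the folder names: in bash and zsh, `${line/ /_}` replaces only the first space, so a name with three or more words would still contain spaces. A minimal sketch of a variant that replaces every space is shown below. `safe_name` is a hypothetical helper, and "Escherichia coli K-12" is an illustrative input that is not part of the list above.

```shell
# Hypothetical helper: make an input line usable as a directory name.
safe_name() {
  # tr swaps every space for an underscore, not just the first one.
  printf '%s\n' "$1" | tr ' ' '_'
}

safe_name "Bacteroides thetaiotaomicron"   # prints Bacteroides_thetaiotaomicron
safe_name "Escherichia coli K-12"          # prints Escherichia_coli_K-12
safe_name 226186                           # prints 226186
```

Inside the loop this could be used as `FNAM=$(safe_name "$line")`, with `"$FNAM"` kept in quotes wherever it is passed to `mkdir` or `cd`.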

For a full tutorial on how to perform this task, please visit https://github.com/kostrouc/Bioinformatics_Tutorials/