I need to BLAST 75k sequences against refseq_genomic. I set up an Amazon EC2 instance (c3.8xlarge). The instance is running, and I uploaded all files of the refseq_genomic database to the /blast/blastdb_custom folder and my sequences as TXT files to the folder /home.
I then ran the following code on my local machine:
    import paramiko
    from openpyxl import Workbook
    from openpyxl import load_workbook

    workbook = load_workbook("Sequences.xlsx")
    worksheet = workbook.get_sheet_by_name(name = "Sequences")
    num_rows = worksheet.max_row
    i = 2
    k = 1

    # Connect to the AWS and BLAST
    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    ssh.connect(hostname = "XXX.amazonaws.com", port = 22, username = "ubuntu", key_filename = "AWS_KEY.pem", compress = False)

    while i < num_rows + 1:
        SequenceID = worksheet.cell(row = i, column = k).value
        Command = "blastn -query " + str(SequenceID) + ".txt -db refseq_genomic -evalue 1 -word_size 11 -gapopen 5 -gapextend 2 -penalty -3 -reward 2 -max_target_seqs 100 -num_threads 32 -outfmt 5 -out RefSeq-" + str(SequenceID) + ".xml"
        print(Command)

        # Execute command on AWS
        ssh_stdin, ssh_stdout, ssh_stderr = ssh.exec_command(str(Command))
        ssh_stdout.readlines()
        i = i+1

    print("Finished")
Unfortunately, only the first command in the loop is started, and the computation then runs for hours. The output file is created, but no results are written to it, and there is no error message either. Given the available computing power, I would expect BLASTing a single sequence to take no more than a few seconds. Something has probably gone wrong, but I don't know what.
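The loop above only calls `ssh_stdout.readlines()`, so anything blastn writes to stderr and its exit status are never looked at. A minimal sketch of how each call could report both is shown here. The helper names `build_blast_command` and `run_remote` are hypothetical, and `ssh` is assumed to be the already connected `paramiko.SSHClient` from the script. It only makes a failure visible; it does not by itself explain the long runtime.

```python
import shlex


def build_blast_command(sequence_id, db="refseq_genomic", threads=32):
    """Build the same blastn call as in the loop, with shell-quoted file names."""
    query = shlex.quote(f"{sequence_id}.txt")
    out = shlex.quote(f"RefSeq-{sequence_id}.xml")
    return (
        f"blastn -query {query} -db {db} -evalue 1 -word_size 11 "
        f"-gapopen 5 -gapextend 2 -penalty -3 -reward 2 "
        f"-max_target_seqs 100 -num_threads {threads} -outfmt 5 -out {out}"
    )


def run_remote(ssh, command, timeout=None):
    """Run `command` on an already connected paramiko.SSHClient (assumed).

    Returns (exit_status, stdout_text, stderr_text). Reading stdout before
    stderr is fine here because blastn writes its results to the -out file,
    so both streams are expected to stay small.
    """
    _stdin, stdout, stderr = ssh.exec_command(command, timeout=timeout)
    out_text = stdout.read().decode()
    err_text = stderr.read().decode()
    # Blocks until the remote process has finished.
    exit_status = stdout.channel.recv_exit_status()
    return exit_status, out_text, err_text
```

Inside the loop this could be used as `status, out, err = run_remote(ssh, build_blast_command(SequenceID))`, followed by printing `status` and `err` whenever `status` is not 0.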
I would appreciate any hints on what I need to change to get BLAST running.