21 months ago
FadyNabil ▴ 10
I want to make a ClustalW alignment using Biopython on a FASTA file that has many reads converted from FASTQ format to FASTA format (the reads have different lengths). When I run ClustalW on this file, I get this error:
Traceback (most recent call last):
  File "F:\CIT656\pythonProjects\CIT656_Spring21\Project_Script.py", line 33, in <module>
    stdout, stderr = clustalw_cline()
  File "F:\CIT656\pythonProjects\CIT656_Spring21\venv\lib\site-packages\Bio\Application\__init__.py", line 574, in __call__
    raise ApplicationError(return_code, str(self), stdout_str, stderr_str)
Bio.Application.ApplicationError: Non-zero return code 1 from '"C:/Program Files (x86)/ClustalW2/Clustalw2.exe" -infile="E:\\Courses\\Bioinformatics Diploma\\Programming to bioinformatics\\Project\\CIT656-project\\Balkans\\Balkans_reads_extracts\\merged_raeds\\merged_fasta_file\\merged_reads.fasta"', message 'There was an exception in the PearsonFileParser::getSeqRange function.'
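For reference, here is a minimal sketch of the FASTQ-to-FASTA conversion step the question describes, using only the Python standard library. It assumes plain four-line FASTQ records (no wrapped sequence lines), and the function names are hypothetical. Biopython's `SeqIO.convert(in_path, "fastq", out_path, "fasta")` is the library route for the same job. The sketch fails loudly on malformed records instead of silently writing them out, and it drops zero-length reads. That last choice is an assumption, not something the thread specifies.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (read_id, sequence) pairs from 4-line FASTQ records."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header.strip()!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header.strip()!r}")
        yield header[1:].strip(), seq


def fastq_to_fasta(src: TextIO, dst: TextIO) -> int:
    """Write each read as an unwrapped FASTA record; return how many were written."""
    written = 0
    for read_id, seq in read_fastq(src):
        if not seq:
            continue  # assumption: drop zero-length reads rather than emit empty records
        dst.write(f">{read_id}\n{seq}\n")
        written += 1
    return written


# Tiny in-memory demo; for real data, pass handles from open(path) instead.
fastq = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nGGC\n+\nIII\n")
out = io.StringIO()
n = fastq_to_fasta(fastq, out)
print(n, out.getvalue(), sep="\n")
```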
Could you share the code that calls ClustalW, and the converted FASTA file? From the error message, it seems there might be an issue with the FASTA file format. You can troubleshoot this by creating a small FASTA file (e.g. 10 sequences) and running ClustalW on it on the web, from the command line, and from your code to verify that it works. Then go through the actual data to see what is different there.
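A minimal sketch of that troubleshooting step in plain Python copies the first few records (10 by default) into a small test file and flags records that commonly upset FASTA parsers. The thread does not say which problem actually triggers ClustalW's `PearsonFileParser::getSeqRange` exception. The checks here are therefore assumptions: empty headers, duplicate IDs, empty sequences, and characters outside ACGTUN and `-`. The function names are hypothetical.

```python
import io
import itertools
import re
from typing import Iterator, List, Optional, TextIO, Tuple

# Assumption: reads contain only A/C/G/T/U/N (either case) or '-' gaps.
# Widen this if your data legitimately uses other IUPAC codes.
ALLOWED = re.compile(r"[ACGTUNacgtun-]+")


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header, chunks = None, []
    for line_no, raw in enumerate(handle, start=1):
        line = raw.rstrip("\r\n")
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError(f"line {line_no}: sequence before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def subset_and_check(src: TextIO, dst: TextIO, limit: Optional[int] = 10) -> List[str]:
    """Copy the first `limit` records to dst (None = all) and list suspicious ones."""
    problems: List[str] = []
    seen = set()
    for header, seq in itertools.islice(read_fasta(src), limit):
        parts = header.split()
        name = parts[0] if parts else ""
        if not name:
            problems.append("record with an empty header")
        elif name in seen:
            problems.append(f"{name}: duplicate ID")
        seen.add(name)
        if not seq:
            problems.append(f"{name or '?'}: empty sequence")
        elif not ALLOWED.fullmatch(seq):
            bad = sorted(set(seq.upper()) - set("ACGTUN-"))
            problems.append(f"{name or '?'}: unexpected characters {bad}")
        dst.write(f">{header}\n{seq}\n")
    return problems


demo = io.StringIO(">r1\nACGT\n>r2\n\n>r1 dup\nAC@T\n>r3\nGG\n")
small = io.StringIO()
issues = subset_and_check(demo, small, limit=3)
for issue in issues:
    print(issue)
```

With real data, write the subset to a small file and run ClustalW on it in each of the three ways. If that works, run the same check with `limit=None` over the full file to find the records that differ.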
Btw, how big is the file?
It's 8.36 GB
It does not make sense to run ClustalW on an 8.36 GB file of FASTQ reads converted to FASTA format. You were offered this advice in a prior thread (how to align many single end fastq files from the same study on each other without using any reference genome using python? ). It is still applicable.
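The rough arithmetic below shows why that advice holds. ClustalW starts by aligning every pair of input sequences to build its guide tree, so that stage alone grows as n(n-1)/2 in the number of reads n. The read length and header size are assumptions, not values taken from the poster's file, and 8.36 GB is read here as 8.36 × 10^9 bytes.

```python
# Assumed, not measured: typical short-read length and FASTA header size.
FILE_BYTES = 8_360_000_000  # the 8.36 GB from the thread, read as 10**9 bytes per GB
READ_LEN = 150              # assumption
HEADER_LEN = 40             # assumption: characters after '>' on each header line


def estimate_reads(file_bytes: int, read_len: int, header_len: int) -> int:
    """Approximate record count for an unwrapped FASTA file."""
    # One record = '>' + header + '\n' + sequence + '\n'
    bytes_per_record = 1 + header_len + 1 + read_len + 1
    return file_bytes // bytes_per_record


def pairwise_comparisons(n: int) -> int:
    """Number of distinct sequence pairs, n(n-1)/2."""
    return n * (n - 1) // 2


n_reads = estimate_reads(FILE_BYTES, READ_LEN, HEADER_LEN)
n_pairs = pairwise_comparisons(n_reads)
print(f"~{n_reads:,} reads -> ~{n_pairs:.2e} pairwise alignments")
```

Changing the assumed read length shifts the read count only proportionally. The pair count, however, scales with the square of the read count, so it stays enormous across any plausible choice of read length.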