7.1 years ago
ajsakla
I have a fasta file of primer sequences and I have a csv file with the sequence ids of primer sequence pairs that I need to parse out into their own fasta files. What I need to do is extract each pair of sequences from the fasta and put them into their own fasta files.
So my files look something like this:
My fasta file of primer sequences:
>01
ACGTACGT
>02
ACGTACGT
>03
ACGTACGT
>04
ACGTACGT
My CSV list of primer pairs:
column1,column2
01,03
02,04
Based on this example data set, what I need to end up with is 2 fasta files of the pairs of fasta sequences i.e., File1.fa will contain sequences 01 and 03 and File2.fa will contain sequences 02 and 04.
Thanks in advance!
Thanks. I'm having some trouble getting this to run.
But I get this error:
Traceback (most recent call last):
  File "./parse.sh", line 11, in <module>
    for line in f:
ValueError: I/O operation on closed file
This is Python code, not bash. Rename the script with mv extract.sh extract.py, then run it as python extract.py
I still get the same error:
Traceback (most recent call last):
  File "parse.py", line 11, in <module>
    for line in f:
ValueError: I/O operation on closed file
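One common cause of this ValueError is a loop over the file handle that sits outside the with block that opened it, so the handle is already closed when the loop runs. The sketch below reproduces that situation with an in-memory stand-in for the pairs file; the data and variable names are assumptions for illustration and are not taken from the script discussed in this thread.

```python
import io

# In-memory stand-in for the CSV of primer pairs (illustrative data only).
f = io.StringIO("column1,column2\n01,03\n02,04\n")

with f:
    # Inside the with block the handle is open, so iterating works.
    inside = [line.strip() for line in f]

try:
    # Outside the with block the handle has been closed,
    # so iterating raises ValueError.
    for line in f:
        pass
except ValueError as err:
    message = str(err)

print(inside)
print(message)
```

Moving the for loop inside the with block (indenting it one level) is usually the fix.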
I edited the error.
So now I get this error:
Traceback (most recent call last):
  File "extract.py", line 18, in <module>
    with open('file' + str(pairs.index(p)) + '.fasta') as out:
IOError: [Errno 2] No such file or directory: 'file0.fasta'
It looks like it's looking for the file or directory file0.fasta so when I make an empty file0.fasta I get this error:
Traceback (most recent call last):
  File "extract.py", line 19, in <module>
    out.write('>' + p[0] + '\n' + primer[p[0]])
KeyError: 'ccRepeat-46449'
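The earlier "No such file or directory" error is what open() gives when a file that does not exist yet is opened in the default read mode. Opening with mode 'w' creates the file, so there is no need to make an empty one by hand. A minimal sketch, written for Python 3 (where the exception is named FileNotFoundError) and using a temporary directory as an assumed scratch location:

```python
import os
import tempfile

out_dir = tempfile.mkdtemp()                 # scratch directory for the demo
path = os.path.join(out_dir, "file0.fasta")  # does not exist yet

try:
    open(path)                               # default mode is 'r' (read)
except OSError as err:
    read_error = type(err).__name__

# Mode 'w' creates the file (or truncates an existing one).
with open(path, "w") as out:
    out.write(">01\nACGTACGT\n")

with open(path) as check:
    written = check.read()

print(read_error)
```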
Should be fixed now.
I'm getting a similar error as before. Key error is the id of the first sequence in the first pair:
Traceback (most recent call last):
  File "extract.py", line 18, in <module>
    file.write('>' + p[0] + '\n' + primer[p[0]])
KeyError: 'ccRepeat-46449'
If it helps, here's a sampling of my primer pairs list.
It's just occurred to me that python may not be reading this file as csv formatted. Could that be an issue?
This line is separating the ids by ',' into a tuple:
The error you're getting means that the keys in the dictionary (the headers from your fasta) do not match the primer ids. Double check that the headers match the primers, and if there is anything else in the header, it needs to be parsed out. The script was based on your OP example, so there may be further things to consider where the script needs to be modified. If you're unable to troubleshoot on your own, it's best to find another method, ideally by your own testing, and come back with errors. Biostars is not a coding service.
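One way to do that double check is to list every id in the pairs file that has no matching fasta header. The sketch below is an illustration, not the script from this thread: the function names fasta_ids and missing_ids are made up, and it assumes the sequence id is the first whitespace-delimited token after '>' and that the CSV has a single header row.

```python
import csv
import io

def fasta_ids(handle):
    """Collect sequence ids: the first whitespace-delimited token after '>'."""
    ids = set()
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids

def missing_ids(fasta_handle, pairs_handle):
    """Return ids from the pairs CSV that have no header in the fasta."""
    known = fasta_ids(fasta_handle)
    reader = csv.reader(pairs_handle)
    next(reader, None)                 # skip the header row
    missing = []
    for row in reader:
        for cell in row:
            seq_id = cell.strip()      # tolerate stray spaces around ids
            if seq_id and seq_id not in known:
                missing.append(seq_id)
    return missing

# Illustrative data: header 01 carries extra text, and 04 is absent.
fasta = io.StringIO(">01 forward primer\nACGTACGT\n>02\nACGTACGT\n>03\nACGTACGT\n")
pairs = io.StringIO("column1,column2\n01,03\n02,04\n")
print(missing_ids(fasta, pairs))
```

With real files, pass open file handles instead of the io.StringIO objects. An empty list means every id in the pairs file has a header to match.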
I realize now I should provide exact examples of the data I'm working with. I was able to get your script to work with my data which saved me many hours of parsing thousands of pairs manually. Thank you for your time.
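For reference, the task in the original question can be sketched roughly as follows. This is not the script discussed above, only a minimal Python 3 illustration under stated assumptions: the function names read_fasta and write_pairs are hypothetical, the id is taken to be the first token of each header, the CSV has one header row, and output files are named File1.fa, File2.fa and so on by row number, as in the question's example.

```python
import csv
import os

def read_fasta(path):
    """Return {id: sequence}; the id is the first token of each header line."""
    records, current = {}, None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                fields = line[1:].split()
                current = fields[0] if fields else ""
                records[current] = ""
            elif current is not None:
                records[current] += line   # join multi-line sequences
    return records

def write_pairs(fasta_path, pairs_path, out_dir):
    """Write one fasta file per CSV row and return the paths written."""
    records = read_fasta(fasta_path)
    written = []
    with open(pairs_path, newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)                 # skip the column1,column2 header
        for number, row in enumerate(reader, start=1):
            ids = [cell.strip() for cell in row if cell.strip()]
            if not ids:
                continue                   # ignore blank rows
            out_path = os.path.join(out_dir, "File%d.fa" % number)
            with open(out_path, "w") as out:
                for seq_id in ids:
                    # A KeyError here means the id has no fasta header.
                    out.write(">%s\n%s\n" % (seq_id, records[seq_id]))
            written.append(out_path)
    return written
```

A call such as write_pairs("primers.fa", "pairs.csv", "."), with those file names standing in for your own, would write the pair files into the current directory.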