7.6 years ago
dmitri.ivanovsky
Hi all! Please help. I parsed sequences from GenBank, renamed them, and saved them as a FASTA file:
>KP821216.1_Bluetongue v_Cameroon_Jan-1982
ATGGCTGCTCAGAATGAGCAACGTCCGGAGCGAATAAAAACGACACCGTATTTAGAGGGA
GATGTGCTTTCGAGTGATTCAGGACCGCTGCTTTCCGTGTTCGCGCTGCAAGAAATAATG
The last 4 characters are the year when the virus was isolated. Now I need to select only the records that fall within some range (for example 1958-1990):
from Bio import SeqIO

output_file = open("range_date_select.txt", "w")
date_from = 1958
date_to = 1990
count = 0
for i, record in enumerate(SeqIO.parse("Bluetong_batch_cds.txt", "fasta")):
    a = record.description[-4:]
    if date_from <= int(a) <= date_to:
        SeqIO.write(record, output_file, "fasta")
        count = count + 1
print(count)
output_file.close()
Next, the task becomes more complicated: I need no more than 4 records per year. If a year has more than 4 records, 4 of them should be chosen randomly.
Can anybody help me with this? Thanks in advance.
Thanks a lot! It works very well!
But could you please explain why this works:
If there are more than 4 records in the list "d[year]", shouldn't they be skipped because the condition "if i < 4" is not met? But they are written to the file. I'm a newbie in Python, so I know this is probably a very basic question.
Yes, you are right. In the first for loop I iterate over each year. Then I shuffle the list d[year] so that the sequences for that year are in random order. At this point, d[year] contains all sequences for a given year (there may be more than 4). In the second for loop I iterate over each sequence record in the d[year] list, counting them as the iteration goes, from 0 up to one less than the number of sequences in d[year] (so the variable i is just a counter). For the first sequence i is 0, for the second i is 1, and so on. So the if i < 4 statement means that only the first four sequences in d[year] are saved to the output file. Nothing is done with the fifth (i = 4), sixth (i = 5), or any later sequence in the list. If you are satisfied with my answer, please mark it as accepted.
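For readers following along, here is a minimal sketch of the two loops described above. It is a reconstruction from this explanation, not necessarily the exact code of the answer. The function name select_per_year, the max_per_year and seed parameters, and the stand-in Record type are assumptions made for illustration; the only thing the sketch relies on is that each record has a description whose last 4 characters are the year, as in the question.

```python
import random
from collections import defaultdict, namedtuple


def select_per_year(records, date_from, date_to, max_per_year=4, seed=None):
    """Keep at most max_per_year randomly chosen records for each year in range.

    Hypothetical helper: assumes the last 4 characters of record.description
    are the isolation year, as in the question.
    """
    rng = random.Random(seed)  # a seed makes the random choice reproducible
    d = defaultdict(list)      # year -> all records for that year

    for record in records:
        year = int(record.description[-4:])
        if date_from <= year <= date_to:
            d[year].append(record)

    selected = []
    for year in sorted(d):                    # first loop: one pass per year
        rng.shuffle(d[year])                  # random order within the year
        for i, record in enumerate(d[year]):  # second loop: i is just a counter
            if i < max_per_year:              # only the first 4 after shuffling
                selected.append(record)
    return selected


# Stand-in for a sequence record; only the description attribute is used here.
Record = namedtuple("Record", ["description"])

demo_records = (
    [Record("seq%d_Cameroon_Jan-1982" % n) for n in range(6)]
    + [Record("seq%d_India_Mar-1990" % n) for n in range(2)]
    + [Record("seq0_Italy_Jul-2001")]
)
demo_selected = select_per_year(demo_records, 1958, 1990, seed=1)
print(len(demo_selected))
```

With Biopython, the same function should accept the iterator returned by SeqIO.parse("Bluetong_batch_cds.txt", "fasta"), and the returned list can be passed to SeqIO.write(selected, "range_date_select.txt", "fasta"). Shuffling and then keeping the first four is equivalent to drawing four records at random without replacement.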