Hi all! Please help. I parsed sequences from GenBank, renamed them, and saved them as a FASTA file. Here is an example record:
>KP821216.1_Bluetongue v_Cameroon_Jan-1982
ATGGCTGCTCAGAATGAGCAACGTCCGGAGCGAATAAAAACGACACCGTATTTAGAGGGA
GATGTGCTTTCGAGTGATTCAGGACCGCTGCTTTCCGTGTTCGCGCTGCAAGAAATAATG
The last 4 characters of the header are the year when the virus was isolated. Now I need to select only the records that fall within some range (for example 1958-1990):
from Bio import SeqIO

output_file = open("range_date_select.txt", "w")
date_from = 1958
date_to = 1990
count = 0
for i, record in enumerate(SeqIO.parse("Bluetong_batch_cds.txt", "fasta")):
    a = record.description[-4:]
    if date_from <= int(a) <= date_to:
        SeqIO.write(record, output_file, "fasta")
        count = count + 1
print(count)
output_file.close()
The next step is more complicated: I need at most 4 records per year. If a year has more than 4 records, 4 of them should be chosen at random.
Can anybody help me with this? Thanks in advance.
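One possible approach, as a minimal sketch rather than a definitive solution: group the in-range records by year, then use `random.sample` to pick 4 from any year that has more than 4. The function names `select_by_year` and `filter_fasta` and the `rng` parameter are hypothetical, introduced only for illustration. The sketch assumes every usable header ends in a 4-digit year, as in the example record, and skips headers that do not.

```python
import random
from collections import defaultdict


def select_by_year(records, date_from, date_to, max_per_year=4, rng=None):
    """Return records whose year lies in [date_from, date_to],
    keeping at most max_per_year records for each year."""
    if rng is None:
        rng = random.Random()

    # Group the in-range records by the year at the end of the header.
    by_year = defaultdict(list)
    for record in records:
        tail = record.description[-4:]
        if not tail.isdigit():
            continue  # assumption: headers without a trailing year are skipped
        year = int(tail)
        if date_from <= year <= date_to:
            by_year[year].append(record)

    # For crowded years, draw max_per_year records at random without replacement.
    selected = []
    for year in sorted(by_year):
        group = by_year[year]
        if len(group) > max_per_year:
            group = rng.sample(group, max_per_year)
        selected.extend(group)
    return selected


def filter_fasta(in_path, out_path, date_from, date_to, max_per_year=4):
    """Hypothetical wrapper: read a FASTA file, filter it, write the result."""
    # Biopython is imported here so select_by_year can be used without it.
    from Bio import SeqIO

    records = SeqIO.parse(in_path, "fasta")
    chosen = select_by_year(records, date_from, date_to, max_per_year)
    # SeqIO.write returns the number of records written.
    return SeqIO.write(chosen, out_path, "fasta")
```

With the file names from the question, this could be called as `print(filter_fasta("Bluetong_batch_cds.txt", "range_date_select.txt", 1958, 1990))`. Passing `rng=random.Random(0)` to `select_by_year` makes the random choice repeatable. Note that the output is ordered by year, and sampled years may not keep the original file order.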