Hi! I'm an master thesis student that tries to learn this:) I've searched this forum but could really find anything that answers my question.
I want to extract specific chromosome sequences from the human genome to a new fasta file and I'm using this python script:
sequences = SeqIO.parse(GRCh38.p13.genome.fa, 'fasta') with open(new_fasta.fa', 'w+') as fa: for line in sequences: if line.id == chr: fa.write(">" + strline.id) + "\n") fa.write(str(line.seq)+ "\n\n") break # I use break here, otherwise it will continue with chr10,12,13 etc. if i choose chr1
It works, but chr1 will take like 1 sec to extract and chr20 will take 20. How can I speed this up?
truly greatful for help!