4.5 years ago
Sillpositive
Hello everyone,
I have a FASTA file containing all the sequences of my genome.
I want to extract only the sequences of chromosomes 1 to 33, plus the MT, W and Z chromosomes. I would also like to rename the headers when saving them to another file, as shown below (only the headers are shown, but of course I want the sequence after each header):
INPUT:
>MT dna:chromosome chromosome:GRCg6a:MT:1:16775:1 REF
ATCGTTTTTTT...
>W dna:chromosome chromosome:GRCg6a:W:1:6813114:1 REF
>Z dna:chromosome chromosome:GRCg6a:Z:1:82529921:1 REF
>1 dna:chromosome chromosome:GRCg6a:1:1:197608386:1 REF
>2 dna:chromosome chromosome:GRCg6a:2:1:149682049:1 REF
>3 dna:chromosome chromosome:GRCg6a:3:1:110838418:1 REF
>4 dna:chromosome chromosome:GRCg6a:4:1:91315245:1 REF
....
And the OUTPUT that I want:
>chr MT
>chr W
>chr Z
>chr 1
>chr 2
>chr 3
>chr 4
Thank you for your answer
This might help
Thank you for your help, man! I wrote a script in Biopython and it solved the problem!
Sorry for the generic answer, but the question sounds like a "write-a-script-for-me" request XD. Of course, it is only about a 5-line script.
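For anyone landing here later, a minimal sketch of such a script in plain Python follows. It is not the Biopython script mentioned in the thread, which was never posted. It assumes the chromosome name is the first whitespace-delimited token of each header, as in the INPUT example. The function name `extract_chromosomes`, the `WANTED` set, and the `scaffold_x` record in the demo are made up for illustration.

```python
import io

# Chromosomes to keep: 1 to 33 plus MT, W and Z (taken from the question).
WANTED = {str(n) for n in range(1, 34)} | {"MT", "W", "Z"}


def extract_chromosomes(src, dst, wanted=WANTED):
    """Copy wanted FASTA records from src to dst, renaming headers to '>chr <name>'.

    src and dst are open text file objects. Returns the kept names in input order.
    """
    keep = False
    kept = []
    for line in src:
        if line.startswith(">"):
            # Assumption: the record name is the first token after '>'.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            keep = name in wanted
            if keep:
                dst.write(f">chr {name}\n")
                kept.append(name)
        elif keep:
            # Sequence line belonging to a record we are keeping.
            dst.write(line)
    return kept


# Small in-memory demo; 'scaffold_x' is a hypothetical record that should be dropped.
demo = io.StringIO(
    ">MT dna:chromosome chromosome:GRCg6a:MT:1:16775:1 REF\n"
    "ATCGTTTTTTT\n"
    ">scaffold_x hypothetical unplaced record\n"
    "GGGG\n"
    ">1 dna:chromosome chromosome:GRCg6a:1:1:197608386:1 REF\n"
    "ACGT\n"
)
out = io.StringIO()
kept = extract_chromosomes(demo, out)
print(out.getvalue(), end="")
```

For real files, pass open file handles instead of the `StringIO` objects, for example `open("genome.fa")` and `open("chromosomes.fa", "w")` (both file names are placeholders). Because the file is read line by line, memory use stays low even for large genomes. One caveat: many tools treat the text up to the first whitespace as the sequence ID, so `>chr MT` would be read as ID `chr`. If that matters downstream, writing `>chrMT` instead may be safer.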