Split one EMBL file into several
6.3 years ago
thom_otis • 0

Hello!

I understand that my problem is simple, but I can't solve it. Can anyone help me write a script that splits one EMBL file into several, so that each sequence is kept in a separate file? Each record ends with a line containing only "//".

Part of the EMBL file:

ID   comp0_c0_seq1; SV 1; linear; unassigned DNA; STD; UNC; 205 BP.
XX
DE   len=205 path=[1:0-135 1445:136-204]
XX
SQ   Sequence 205 BP; 64 A; 54 C; 31 G; 56 T; 0 other;
     GTATTGAACT GCAGAGCATT AAATGCTGCA ACTCAGTGCT TAGAATTCAT TAGATTCAGA        60
     GCAACGAACC CTAAATACTG AGCTGTCCCA TTAAATACTC TGCAGTTCAA TACTTAGCAT       120
     TCACCATTAA ACATAACACT TCCCGAGTTT CCACCATCCA TAAACAGCAG GCATTGTAAC       180
     CTGTAGGCTC TCTCCACGGT TACCT                                             205
//
ID   comp0_c0_seq2; SV 1; linear; unassigned DNA; STD; UNC; 205 BP.
XX
DE   len=205 path=[4094:0-135 1445:136-204]
XX
SQ   Sequence 205 BP; 59 A; 50 C; 35 G; 61 T; 0 other;
     AGAGTATTAA ATGTTGCAGT TCAGTGCTTA AAATTTATTG GATTCAGAGA ATCTTCAAAT        60
     TCAACGGACC CTAAACACTG AGCTGTCGCA TTAAATGCTC TGCAGTTCAA TGCTTAGCTT       120
     TCACCATTAA GCATAGCACT TCCCGAGTTT CCACCATCCA TAAACAGCAG GCATTGTAAC       180
     CTGTAGGCTC TCTCCACGGT TACCT                                             205
//
ID   comp1_c0_seq1; SV 1; linear; unassigned DNA; STD; UNC; 244 BP.
XX
DE   len=244 path=[3:0-88 875:89-243]
XX
SQ   Sequence 244 BP; 71 A; 51 C; 63 G; 59 T; 0 other;
     GCAGAATTTA AGGCTATGAA TCAGGAGGTT CATAATTCCT TAAGGAGGGG AGTATGATGC        60
     GGAGCATCCA CGCTCACCTC CACTCCACCG CATTGTCTTC GAGCTGTGAC AGCCAGCGCA       120
     TAATATTCAA GAGCTATTGA CAGGTGTTGA AACGCGGCAG CCTTGCATAC TATTGAAGGA       180
     CCACGTTTCA TTATTGTGAT CTATAAGAAG ACAGCTGATG CGATCATGAG GAAGGAAGAA       240
     GGCT                                                                    244
//
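A minimal sketch in plain Python (no Biopython needed), assuming every record has an "ID" line and ends with a line containing only "//", as in the sample above. The function name split_embl and the <ID>.embl file naming are illustrative choices, not part of any library. Because it copies the text lines unchanged, each output file is an exact copy of its record.

```python
from pathlib import Path


def split_embl(in_path, out_dir="."):
    """Split a multi-record EMBL flat file into one file per record.

    Assumes each record ends with a line containing only "//" and has an
    "ID" line whose first ";"-separated field is used as the file name.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    record = []  # lines of the record currently being collected
    with open(in_path) as handle:
        for line in handle:
            record.append(line)
            if line.rstrip() != "//":
                continue
            # End of record: find its ID line, e.g. "ID   comp0_c0_seq1; SV 1; ..."
            id_line = next((ln for ln in record if ln.startswith("ID ")), None)
            if id_line is None:
                raise ValueError("record without an ID line")
            name = id_line[2:].split(";")[0].strip()
            out_path = out_dir / f"{name}.embl"
            out_path.write_text("".join(record))
            written.append(out_path)
            record = []
    # Anything left over (other than blank lines) lacks a closing "//"
    if any(ln.strip() for ln in record):
        raise ValueError("last record is missing its '//' terminator")
    return written
```

For the sample above, split_embl("input.embl", "split") would create split/comp0_c0_seq1.embl, split/comp0_c0_seq2.embl and split/comp1_c0_seq1.embl.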
script python perl
6.3 years ago
thom_otis • 0

Thanks, I have solved this problem with SEQRETSPLIT (from the EMBOSS suite).


Please do not close a post unless it was posted by mistake (and is irrelevant to the site). To mark a question solved, accept relevant answer(s).


Accepting as answer to stop question being bumped by Biostars bot.

6.3 years ago
Ram 34k

Workflow to implement (in Biopython/BioPerl):

  1. Use Bio.SeqIO (Bio::SeqIO in BioPerl) to read the input in EMBL format
  2. Loop over the input file record by record (in Biopython each record is a SeqRecord; in BioPerl, a Bio::Seq object)
  3. In the loop, use Bio.SeqIO to write each record out in EMBL format, naming the output file after the record's ID

You're all set.


Sorry, but I don't know how to program.


Well, you'll have to start somewhere. This is a relatively simple task and will serve as a great starting point.
