Question: DNA sequences trimming methods
Asked 20 months ago by l.souza (Brasilia, Brazil):

Hello,

This is my situation:

I have about 2000 DNA sequences to process, but I only want to work with their coding regions. I have the coordinates of all the CDSs (predicted with Prodigal) in a file with this format:

DEFINITION  seqnum=1;seqlen=8075;seqhdr="KU821590.1 Foot-and-mouth disease virus - type SAT 1 isolate SAT1/NAM01/2010, complete 
genome";version=Prodigal.v2.6.3;run_type=Single;model="Ab initio";gc_cont=53.37;transl_table=11;uses_sd=0
FEATURES             Location/Qualifiers
     CDS             1026..8045
                 /note="ID=1_1;partial=00;start_type=ATG;rbs_motif=None;rbs_spacer=None;gc_cont=0.537;conf=99.99;score=1639.83;cscore=1612.89;sscore=26.93;rscore=-13.40;uscore=34.66;tscore=5.68;"

How can I extract the subsequences that correspond to these coordinates and write them to a FASTA file?


You'll need to parse out the header and coordinate information from your file, then match to the headers in your fasta, and use the coordinates per header to cut each sequence.
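
The three steps described above (parse the headers and coordinates, match them to the FASTA headers, cut each sequence) could be sketched in Python roughly as follows. This is only a minimal sketch under several assumptions: the coordinate file looks like the Prodigal GenBank-style excerpt posted in this thread, the first word of `seqhdr` (the accession) is also the first word of the FASTA header, and coordinates are 1-based and inclusive. The function names are made up for illustration and are not part of Prodigal or any library.

```python
import re

# Matches e.g. "CDS  1026..8045", "CDS  1011..>8009", "CDS  complement(<5..300)".
CDS_RE = re.compile(r"^\s+CDS\s+(complement\()?<?(\d+)\.\.>?(\d+)\)?")
HDR_RE = re.compile(r'seqhdr="([^"]*)"')
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def parse_prodigal_gbk(text):
    """Map sequence ID -> list of (start, end, strand), 1-based inclusive."""
    coords = {}
    for record in re.split(r"(?m)^DEFINITION", text):
        header = HDR_RE.search(record)
        if header is None or not header.group(1).split():
            continue
        seq_id = header.group(1).split()[0]  # accession, e.g. KU821590.1
        for line in record.splitlines():
            cds = CDS_RE.match(line)
            if cds:
                strand = "-" if cds.group(1) else "+"
                coords.setdefault(seq_id, []).append(
                    (int(cds.group(2)), int(cds.group(3)), strand))
    return coords


def read_fasta(lines):
    """Map the first word of each FASTA header -> full sequence string."""
    chunks, name = {}, None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            chunks[name] = []
        elif name is not None and line:
            chunks[name].append(line)
    return {key: "".join(parts) for key, parts in chunks.items()}


def extract_cds(sequences, coords):
    """Yield (name, subsequence) for every CDS whose sequence is present."""
    for seq_id, regions in coords.items():
        seq = sequences.get(seq_id)
        if seq is None:
            continue  # no matching FASTA record for this header
        for start, end, strand in regions:
            sub = seq[start - 1:end]  # 1-based inclusive -> Python slice
            if strand == "-":
                sub = sub.translate(COMPLEMENT)[::-1]  # reverse complement
            yield f"{seq_id}_{start}_{end}_{strand}", sub
```

To use it, one would read the coordinate file into a string, pass an open FASTA file to `read_fasta`, and write each pair from `extract_cds` as `>name` followed by the sequence. Partial genes (marked with `<` or `>`) are cut at the reported coordinate without further checks.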

Can you post a few more lines of your file from Prodigal?

Reply written 20 months ago by st.ph.n2.4k
DEFINITION  seqnum=1;seqlen=8075;seqhdr="KU821590.1 Foot-and-mouth disease virus - type SAT 1 isolate SAT1/NAM01/2010, complete 
genome";version=Prodigal.v2.6.3;run_type=Single;model="Ab initio";gc_cont=53.37;transl_table=11;uses_sd=0
FEATURES             Location/Qualifiers
     CDS             1026..8045
                 /note="ID=1_1;partial=00;start_type=ATG;rbs_motif=None;rbs_spacer=None;gc_cont=0.537;conf=99.99;score=1639.83;cscore=1612.89;sscore=26.93;rscore=-13.40;uscore=34.66;tscore=5.68;"

DEFINITION  seqnum=2;seqlen=8010;seqhdr="KR108948.1 Foot-and-mouth disease virus - type SAT 1 isolate KNP/196/91/1 polyprotein gene, partial 
cds";version=Prodigal.v2.6.3;run_type=Single;model="Ab initio";gc_cont=53.37;transl_table=11;uses_sd=0
FEATURES             Location/Qualifiers
     CDS             1011..>8009
                 /note="ID=2_1;partial=01;start_type=ATG;rbs_motif=TTTA;rbs_spacer=14bp;gc_cont=0.537;conf=99.99;score=1624.62;cscore=1579.25;sscore=45.37;rscore=16.14;uscore=23.55;tscore=5.68;"

DEFINITION  seqnum=3;seqlen=8144;seqhdr="JF749860.1 Foot-and-mouth disease virus - type SAT 1 isolate KEN_004/2002, complete 
genome";version=Prodigal.v2.6.3;run_type=Single;model="Ab initio";gc_cont=53.37;transl_table=11;uses_sd=0
FEATURES             Location/Qualifiers
     CDS             1018..8037
                 /note="ID=3_1;partial=00;start_type=ATG;rbs_motif=AAA;rbs_spacer=14bp;gc_cont=0.540;conf=99.99;score=1468.42;cscore=1472.82;sscore=-4.40;rscore=0.64;uscore=-10.72;tscore=5.68;"

DEFINITION  seqnum=4;seqlen=8156;seqhdr="KM268899.1 Foot-and-mouth disease virus - type SAT 1 isolate TAN/22/2012, complete 
genome";version=Prodigal.v2.6.3;run_type=Single;model="Ab initio";gc_cont=53.37;transl_table=11;uses_sd=0
FEATURES             Location/Qualifiers
     CDS             1006..8025
                 /note="ID=4_1;partial=00;start_type=ATG;rbs_motif=TTTTA;rbs_spacer=14bp;gc_cont=0.537;conf=99.99;score=1462.32;cscore=1401.95;sscore=60.38;rscore=17.40;uscore=37.30;tscore=5.68;"

The rest of the file repeats this pattern for every sequence.

Reply written 20 months ago (modified 20 months ago) by l.souza

Is this GenBank format? You can convert it to BED (see some discussion here) and extract the regions of interest with bedtools or BEDOPS.
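
As a rough illustration of the BED route, the sketch below turns the CDS coordinates from the Prodigal excerpt above into BED6 lines. It assumes the coordinates are 1-based and inclusive (so the start is shifted by one to get BED's 0-based, half-open intervals) and that the first word of `seqhdr` matches the sequence name in the FASTA file. The function name and the `_cdsN` naming scheme are invented for this example.

```python
import re


def prodigal_gbk_to_bed(text):
    """Return BED6 lines (0-based, half-open) for each CDS in the input text."""
    cds_pattern = r"(?m)^\s+CDS\s+(complement\()?<?(\d+)\.\.>?(\d+)"
    bed = []
    for record in re.split(r"(?m)^DEFINITION", text):
        header = re.search(r'seqhdr="([^"]*)"', record)
        if header is None or not header.group(1).split():
            continue
        chrom = header.group(1).split()[0]  # accession, e.g. KU821590.1
        for n, cds in enumerate(re.finditer(cds_pattern, record), start=1):
            start = int(cds.group(2)) - 1  # 1-based inclusive -> 0-based
            end = int(cds.group(3))        # BED end is exclusive, so unchanged
            strand = "-" if cds.group(1) else "+"
            bed.append(f"{chrom}\t{start}\t{end}\t{chrom}_cds{n}\t0\t{strand}")
    return bed
```

With the lines written to a file (say `cds.bed`, a hypothetical name), something like `bedtools getfasta -s -fi genomes.fasta -bed cds.bed -fo cds.fasta` should then extract the regions, with `-s` reverse-complementing features on the minus strand.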

Reply written 20 months ago by h.mon23k

Not all of my sequences are in GenBank format!

Reply written 20 months ago by l.souza

Which output format did you choose for Prodigal? Do you have a mix of formats?

Reply written 20 months ago by h.mon23k
Answer written 20 months ago by l.souza (Brasilia, Brazil):

I solved my problem by passing the '-d' option to Prodigal, which writes the nucleotide sequences of the predicted genes to a FASTA file.
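
For anyone scripting this, the call might be wrapped in Python as sketched below. The flags used (`-i` for the input FASTA, `-d` for the nucleotide sequences of the predicted genes, `-o` and `-f gbk` for the coordinate output) are standard Prodigal options, but the file names and helper functions are placeholders invented for this example.

```python
import shutil
import subprocess


def build_prodigal_command(genomes_fasta, genes_fasta, coords_file=None):
    """Build the argument list for a Prodigal run that also writes gene sequences."""
    cmd = ["prodigal", "-i", genomes_fasta, "-d", genes_fasta]
    if coords_file is not None:
        # Also keep the GenBank-style coordinate file, as in the question.
        cmd += ["-o", coords_file, "-f", "gbk"]
    return cmd


def run_prodigal(genomes_fasta, genes_fasta, coords_file=None):
    """Run Prodigal, raising an error if it is missing or exits with failure."""
    if shutil.which("prodigal") is None:
        raise FileNotFoundError("prodigal executable not found on PATH")
    subprocess.run(
        build_prodigal_command(genomes_fasta, genes_fasta, coords_file),
        check=True,
    )
```

Calling `run_prodigal("genomes.fasta", "cds.fasta")` would then be equivalent to running `prodigal -i genomes.fasta -d cds.fasta` on the command line.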
