Question: DNA sequences trimming methods
l.souza70 wrote:

Hello,

This is my situation:

I have about 2000 DNA sequences to process, but I only want to work with their coding regions. I obtained the coordinates of all CDSs with Prodigal, in a file with this format:

DEFINITION  seqnum=1;seqlen=8075;seqhdr="KU821590.1 Foot-and-mouth disease virus - type SAT 1 isolate SAT1/NAM01/2010, complete 
genome";version=Prodigal.v2.6.3;run_type=Single;model="Ab initio";gc_cont=53.37;transl_table=11;uses_sd=0
FEATURES             Location/Qualifiers
     CDS             1026..8045
                 /note="ID=1_1;partial=00;start_type=ATG;rbs_motif=None;rbs_spacer=None;gc_cont=0.537;conf=99.99;score=1639.83;cscore=1612.89;sscore=26.93;rscore=-13.40;uscore=34.66;tscore=5.68;"

How can I extract the subsequences at these coordinates from my sequences and write them to a FASTA file?

dna sequence trimming cds • 778 views
modified 2.1 years ago • written 2.1 years ago by l.souza70

You'll need to parse out the header and coordinate information from your file, then match to the headers in your fasta, and use the coordinates per header to cut each sequence.

Can you post a few more lines of your file from Prodigal?
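
The parse, match, and cut steps described above could be sketched in Python roughly as follows. This is a minimal illustration, not a tested pipeline: it assumes the coordinate file is Prodigal's GenBank-style output, that the first word inside seqhdr="..." matches the first word of the FASTA header, and that reverse-strand genes appear as complement(start..end). The function names and the output header layout are made up for this example.

```python
import re

# First word inside seqhdr="..." (e.g. the accession) identifies the record.
DEF_RE = re.compile(r'^DEFINITION.*?seqhdr="([^\s"]+)')
# CDS location; tolerates partial markers (< and >) and complement(...).
CDS_RE = re.compile(r'^\s+CDS\s+(complement\()?<?(\d+)\.\.>?(\d+)')
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def parse_prodigal_gbk(lines):
    """Yield (seq_id, start, end, strand); coordinates are 1-based, inclusive."""
    seq_id = None
    for line in lines:
        m = DEF_RE.match(line)
        if m:
            seq_id = m.group(1)
            continue
        m = CDS_RE.match(line)
        if m and seq_id is not None:
            strand = "-" if m.group(1) else "+"
            yield seq_id, int(m.group(2)), int(m.group(3)), strand


def read_fasta(lines):
    """Return {first word of header: sequence} from FASTA-formatted lines."""
    chunks, name = {}, None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            chunks[name] = []
        elif name is not None and line:
            chunks[name].append(line)
    return {key: "".join(parts) for key, parts in chunks.items()}


def extract_cds(seqs, features):
    """Yield (fasta_header, subsequence) for each CDS feature."""
    for seq_id, start, end, strand in features:
        sub = seqs[seq_id][start - 1:end]  # 1-based inclusive -> Python slice
        if strand == "-":
            sub = sub.translate(COMPLEMENT)[::-1]  # reverse complement
        yield f">{seq_id}_{start}_{end}_{strand}", sub
```

To use it on real files, pass open file handles to parse_prodigal_gbk and read_fasta, then write each header and sequence pair returned by extract_cds to the output file. Ambiguity codes other than A, C, G, T are left unchanged by the complement step, which may or may not be what you want.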

written 2.1 years ago by st.ph.n2.5k
DEFINITION  seqnum=1;seqlen=8075;seqhdr="KU821590.1 Foot-and-mouth disease virus - type SAT 1 isolate SAT1/NAM01/2010, complete 
genome";version=Prodigal.v2.6.3;run_type=Single;model="Ab initio";gc_cont=53.37;transl_table=11;uses_sd=0
FEATURES             Location/Qualifiers
     CDS             1026..8045
                 /note="ID=1_1;partial=00;start_type=ATG;rbs_motif=None;rbs_spacer=None;gc_cont=0.537;conf=99.99;score=1639.83;cscore=1612.89;sscore=26.93;rscore=-13.40;uscore=34.66;tscore=5.68;"

DEFINITION  seqnum=2;seqlen=8010;seqhdr="KR108948.1 Foot-and-mouth disease virus - type SAT 1 isolate KNP/196/91/1 polyprotein gene, partial 
cds";version=Prodigal.v2.6.3;run_type=Single;model="Ab initio";gc_cont=53.37;transl_table=11;uses_sd=0
FEATURES             Location/Qualifiers
     CDS             1011..>8009
                 /note="ID=2_1;partial=01;start_type=ATG;rbs_motif=TTTA;rbs_spacer=14bp;gc_cont=0.537;conf=99.99;score=1624.62;cscore=1579.25;sscore=45.37;rscore=16.14;uscore=23.55;tscore=5.68;"

DEFINITION  seqnum=3;seqlen=8144;seqhdr="JF749860.1 Foot-and-mouth disease virus - type SAT 1 isolate KEN_004/2002, complete 
genome";version=Prodigal.v2.6.3;run_type=Single;model="Ab initio";gc_cont=53.37;transl_table=11;uses_sd=0
FEATURES             Location/Qualifiers
     CDS             1018..8037
                 /note="ID=3_1;partial=00;start_type=ATG;rbs_motif=AAA;rbs_spacer=14bp;gc_cont=0.540;conf=99.99;score=1468.42;cscore=1472.82;sscore=-4.40;rscore=0.64;uscore=-10.72;tscore=5.68;"

DEFINITION  seqnum=4;seqlen=8156;seqhdr="KM268899.1 Foot-and-mouth disease virus - type SAT 1 isolate TAN/22/2012, complete 
genome";version=Prodigal.v2.6.3;run_type=Single;model="Ab initio";gc_cont=53.37;transl_table=11;uses_sd=0
FEATURES             Location/Qualifiers
     CDS             1006..8025
                 /note="ID=4_1;partial=00;start_type=ATG;rbs_motif=TTTTA;rbs_spacer=14bp;gc_cont=0.537;conf=99.99;score=1462.32;cscore=1401.95;sscore=60.38;rscore=17.40;uscore=37.30;tscore=5.68;"

The rest of the file repeats this pattern.

modified 2.1 years ago • written 2.1 years ago by l.souza70

Is this GenBank format? You can convert it to BED (see some discussion here) and extract the regions of interest with bedtools or bedops.
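
One detail to watch in such a conversion is the coordinate system: GenBank locations are 1-based and inclusive, while BED is 0-based and half-open, so only the start shifts by one. A small sketch of that step, assuming the coordinates have already been parsed (the function name and the default column values are hypothetical):

```python
def to_bed_line(seq_id, start, end, strand="+", name=".", score=0):
    """Convert a 1-based inclusive GenBank-style range to one BED6 line."""
    if start < 1 or end < start:
        raise ValueError(f"invalid range {start}..{end}")
    # BED start is 0-based; BED end is exclusive, so it equals the 1-based end.
    fields = (seq_id, start - 1, end, name, score, strand)
    return "\t".join(str(field) for field in fields)
```

For the first record in the question, 1026..8045 becomes start 1025 and end 8045. The resulting file can then be given to something like bedtools getfasta (-fi for the FASTA, -bed for the regions, -s to respect strand), provided the first BED column matches the sequence names in the FASTA.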

written 2.1 years ago by h.mon26k

Not all of my sequences are in GenBank format!

written 2.1 years ago by l.souza70

Which output format did you choose for Prodigal? Do you have a mix of formats?

written 2.1 years ago by h.mon26k
l.souza70 wrote:

I solved my problem by using the -d option of Prodigal, which writes the nucleotide sequences of the predicted genes to a FASTA file.
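
For anyone scripting this, a call with -d could be wrapped in Python along these lines. This is only a sketch: the file names are placeholders, the helper names are made up, and it assumes a prodigal executable is on the PATH. The -i (input), -o (output) and -d (gene nucleotide sequences) options are the ones listed in Prodigal's help text.

```python
import shutil
import subprocess


def build_prodigal_cmd(fasta_in, coords_out, cds_out, prodigal="prodigal"):
    """Return the argument list for a Prodigal run that also writes CDS sequences."""
    return [prodigal, "-i", fasta_in, "-o", coords_out, "-d", cds_out]


def run_prodigal(fasta_in, coords_out, cds_out):
    """Run Prodigal; raises if the executable is missing or exits with an error."""
    if shutil.which("prodigal") is None:
        raise RuntimeError("prodigal executable not found on PATH")
    subprocess.run(build_prodigal_cmd(fasta_in, coords_out, cds_out), check=True)
```

For example, run_prodigal("genomes.fasta", "genes.out", "cds.fna") would leave the predicted coding sequences in cds.fna, so no separate coordinate parsing is needed.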

written 2.1 years ago by l.souza70