Question: How to split many nucleotide sequences in a FASTA file into complete CDS and partial CDS?
3.5 years ago by
seta1.0k
Sweden
seta1.0k wrote:

Hi all,

I've downloaded many nucleotide sequences from NCBI and would now like to split them into two separate files: partial CDS and complete CDS. Some of the sequences are not labeled as either complete CDS or partial sequence. Please share any commands or scripts for doing this. Sorry if the question is too basic. Thanks.

blast alignment sequence • 1.1k views
modified 3.5 years ago by arnstrm1.7k • written 3.5 years ago by seta1.0k

Do you mean splitting the sequences based on the information in the FASTA header lines? Be aware that not all sequences downloaded from NCBI carry that information in the header. If that is not what you want, there is no way to tell whether a sequence is complete or partial unless you align it to reference sequences.
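
Before splitting, it may help to see how many headers carry a label at all. The following is a minimal sketch in plain awk, not a definitive classifier: the function name count_cds_labels and the patterns "partial" and "complete cds" are assumptions about how your NCBI headers are worded, and "partial" is checked first on the assumption that a header mentioning both should count as partial.

```shell
# Tally FASTA headers by CDS label.
# count_cds_labels is a hypothetical helper; the patterns are assumptions
# about header wording and may need adjusting for your data.
count_cds_labels() {
    awk '
        /^>/ {
            h = tolower($0)                      # case-insensitive matching
            if (h ~ /partial/)           partial++
            else if (h ~ /complete cds/) complete++
            else                         unlabeled++
        }
        END {
            # Unset counters print as 0 with %d
            printf "complete\t%d\npartial\t%d\nunlabeled\t%d\n", complete, partial, unlabeled
        }
    ' "$1"
}

# Usage: count_cds_labels INPUT_FILE.FASTA
```

A large "unlabeled" count means a header-based split will leave many sequences unclassified.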

modified 3.5 years ago • written 3.5 years ago by arnstrm1.7k

Yes, that's right. I plan to split them based on the FASTA header; could you tell me how you would do it? As I mentioned in my post, some sequences unfortunately lack that information in the header, and I have no reference sequences to align against.

written 3.5 years ago by seta1.0k
3.5 years ago by
arnstrm1.7k
Ames, IA
arnstrm1.7k wrote:

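Use bioawk. With -c fastx, $name holds only the sequence ID (the text before the first space) and $comment holds the rest of the header, which is where NCBI puts labels such as "complete cds" or "partial cds", so match on $comment:

bioawk -c fastx '$comment ~ /YOUR_SEARCH_TERM/ {print ">"$name" "$comment; print $seq}' INPUT_FILE.FASTA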
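
If bioawk is not installed, the same header-based split can be sketched with plain awk in a single pass. This is only an illustration under stated assumptions: split_cds and the three output file names are hypothetical, the patterns "partial" and "complete cds" are guesses at typical NCBI header wording, and headers matching neither go to a third file so that nothing is silently dropped.

```shell
# Split a FASTA file into complete, partial, and unlabeled records by header text.
# split_cds and the output file names are hypothetical; adjust the patterns
# to match the wording of your own headers.
split_cds() {
    fasta=$1
    outdir=${2:-.}                               # output directory, default: current
    awk -v dir="$outdir" '
        /^>/ {
            h = tolower($0)                      # case-insensitive matching
            if (h ~ /partial/)           out = dir "/partial.fasta"
            else if (h ~ /complete cds/) out = dir "/complete.fasta"
            else                         out = dir "/unlabeled.fasta"
        }
        # Header and following sequence lines go to the file chosen at the header
        out != "" { print > out }
    ' "$fasta"
}

# Usage: split_cds INPUT_FILE.FASTA output_dir
```

Because the destination is chosen at each header line and reused until the next one, multi-line sequences are kept intact. "partial" is tested first so that a header mentioning both terms lands in the partial file; swap the order if you prefer the opposite.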

modified 3.5 years ago • written 3.5 years ago by arnstrm1.7k

Thanks so much for your prompt feedback. I'll try it.

written 3.5 years ago by seta1.0k