How to split the nucleotide sequences in a FASTA file into complete CDS and partial CDS?
8.9 years ago
seta ★ 1.9k

Hi all,

I've downloaded a large number of nucleotide sequences from NCBI and would like to split them into two separate files: partial CDS and complete CDS. Some of the sequences are not labeled as either complete CDS or partial. Please share any commands or scripts that can do this. Sorry if the question is too basic. Thanks.

alignment blast sequence • 2.3k views

Do you mean splitting the sequences based on the information in the FASTA header lines? Be aware that not all sequences downloaded from NCBI have that information in the header. If that is not what you want, there is no way to tell whether a sequence is complete or partial unless you align it to reference sequences.


Yes, that's right. I plan to split them based on the FASTA header; could you please tell me how you would do it? You're right: as I mentioned in my post, some sequences unfortunately have no such information in the header, and I have no reference sequences to align against.

8.9 years ago
arnstrm ★ 1.8k

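Use bioawk. In fastx mode, $name holds only the sequence ID (the header text up to the first whitespace) and the rest of the header line is in $comment, so match against both. Set the search term to, for example, "complete cds" or "partial cds":

bioawk -c fastx '($name" "$comment) ~ /YOUR_SEARCH_TERM/ {print ">"$name" "$comment; print $seq}' INPUT_FILE.FASTA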
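
If bioawk is not installed, the same header-based split can be sketched in plain Python. This is only a rough illustration, not a tested pipeline: the function names (`read_fasta`, `classify`, `split_records`, `split_fasta_file`) and the output file naming are made up for this example, and it assumes the headers contain the literal phrases "complete cds" or "partial" as NCBI deflines usually do. Headers with neither phrase go to a third "undetermined" group, and a header that mentions both is treated as partial, which is an arbitrary choice you may want to change.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def classify(header):
    """Assumed rule: 'partial' wins over 'complete cds'; anything else is undetermined."""
    text = header.lower()
    if "partial" in text:
        return "partial"
    if "complete cds" in text:
        return "complete"
    return "undetermined"


def split_records(lines):
    """Group FASTA records by the label found in their header."""
    groups = {"complete": [], "partial": [], "undetermined": []}
    for header, seq in read_fasta(lines):
        groups[classify(header)].append((header, seq))
    return groups


def split_fasta_file(in_path, out_prefix):
    """Write <out_prefix>.complete.fasta, .partial.fasta and .undetermined.fasta."""
    with open(in_path) as handle:
        groups = split_records(handle)
    for label, records in groups.items():
        with open(f"{out_prefix}.{label}.fasta", "w") as out:
            for header, seq in records:
                out.write(f">{header}\n{seq}\n")
    return {label: len(records) for label, records in groups.items()}
```

Calling `split_fasta_file("INPUT_FILE.FASTA", "split")` would write three files and return the number of records in each group. Sequences are written on a single line, so any line wrapping in the input is not preserved.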

Thanks so much for your prompt feedback. I'll try it.
