I have a scaffold with N's inside, and I want to split it into separate contigs. The first reason is the N's themselves; the other is that my sequence contains non-IUPAC characters. As a first attempt, I split on the N's and removed the sequences shorter than 1 base (i.e., the empty ones).
Any suggestions using SeqIO or any other tool?
This is my temporary output.fasta, produced with the following command:
sed -i.bak 's/N/\n>N\n/g' myfile.fasta

>N
agtagatgatgatagatgatgatga
>N
>N
>N
>N
>N
tgttgcatgctagctagctagtcgatcgatcgatcgtagctagca
>N
>N
>N
tcgatcgatgtagctagctgacaNctagtcgatgca
Next, I remove the empty sequences, or keep only those longer than 500 bases, to obtain a reasonable set of sequences:
>N
agtagatgatgatagatgatgatga
>N
tgttgcatgctagctagctagtcgatcgatcgatcgtagctagca
>N
tcgatcgatgtagctagctgacaNctagtcgatgca
The problem, as you can imagine, is that all the sequences end up with the same name.
How can I number them in the order in which they appear?
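
To make the goal concrete, here is a minimal sketch in plain Python of the kind of result I am after. It is only an illustration under some assumptions: the function name `split_scaffold`, the `contig_` name prefix, and the `min_len` and `valid` parameters are hypothetical, and the scaffold is assumed to be already loaded as a single string with its line breaks removed. It treats every run of characters other than A, C, G, T (upper or lower case) as a gap, so N's and non-IUPAC characters are handled in the same way.

```python
import re

def split_scaffold(seq, min_len=1, valid="ACGT"):
    """Split seq on runs of characters not in `valid` and number the pieces.

    `min_len` and `valid` are illustrative parameters: pieces shorter than
    `min_len` are dropped, and any character outside `valid` acts as a gap.
    """
    # One or more consecutive invalid characters count as a single gap,
    # so a run such as NNNNN does not produce empty records in between.
    gap = "[^" + re.escape(valid) + "]+"
    pieces = re.split(gap, seq, flags=re.IGNORECASE)
    # Drop empty or too-short pieces before numbering, so names stay consecutive.
    kept = [p for p in pieces if len(p) >= min_len]
    return [(f"contig_{i}", p) for i, p in enumerate(kept, start=1)]

# Made-up example sequence containing N runs and one non-IUPAC character (*).
scaffold = "agtagaNNNNNtgttgc*NNtcgaNctag"
for name, contig in split_scaffold(scaffold):
    print(f">{name}\n{contig}")
```

Passing `min_len=501` would keep only the pieces longer than 500 bases. Note that with the default `valid="ACGT"` the IUPAC ambiguity codes (R, Y, and so on) are also treated as gaps; they would need to be added to `valid` to be kept. The input string could be obtained with Biopython, for example `str(SeqIO.read("myfile.fasta", "fasta").seq)` for a single-record file.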