I have a FASTA file (seq.fasta) containing multiple sequences:
>seq1
ATGCGTCTCCCCTTTAGAGAGTTCTCTCTAGCTACGTA
ATTTTTATCGCGCGGGGTGCGACGTTTTTAGGGGGGGG
>seq2
ATCTCTNNNNNNNNNNATATCCCCTTTNNNNNCTCTCT
ATTTTTTTTTCCCCCCGCGCGCGATCGACGCCCCCCCC
>seq3
ATCTCTNNNNNNNNNNATATCCCCTTCTCGGGGCCCCT
NNNNNTTTTTCTCTCTCGCGCTCGTCGAAAAATGCCCC
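Each record above is wrapped over two lines, and in seq3 the second run of N's starts right at the line break. Counting line by line would therefore miss that the run is attached to the preceding base. A minimal Python sketch that joins each record's lines before any counting, assuming the sequence ID is the first word after `>` (the function name `parse_fasta` is hypothetical):

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {id: sequence}, joining wrapped sequence lines."""
    records: dict[str, str] = {}
    current_id = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if current_id is not None:
                records[current_id] = "".join(chunks)
            # Assumption: the ID is the first word after '>' (e.g. "seq1").
            header = line[1:].split()
            current_id = header[0] if header else ""
            chunks = []
        else:
            chunks.append(line)
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


if __name__ == "__main__":
    example = (">seq3\n"
               "ATCTCTNNNNNNNNNNATATCCCCTTCTCGGGGCCCCT\n"
               "NNNNNTTTTTCTCTCTCGCGCTCGTCGAAAAATGCCCC\n")
    # True: the N run that opens line 2 is now contiguous with line 1.
    print("TNNNNNT" in parse_fasta(example)["seq3"])
```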
How can I count the number of 'N' characters in each sequence, and the number of positions at which a run of N's occurs? For example, the start of seq2 reads ATCTCT "NNNNNNNNNN" ATATCCCCTTT "NNNNN" CTCTCT, which contains two runs of N.
For each sequence, the result should give the total number of N's and the number of positions where this pattern (a run of consecutive N's) is seen.
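Both numbers can be read off a single joined sequence string: the total is a plain character count, and each maximal match of the regular expression `N+` is one run. A minimal Python sketch (the name `count_n` is hypothetical; treating lowercase `n` as N is an assumption, drop the `.upper()` if soft-masked bases should be ignored):

```python
import re

# One or more consecutive N's; each match is one position where a gap occurs.
N_RUN = re.compile(r"N+")


def count_n(seq: str) -> tuple[int, int]:
    """Return (total number of N's, number of contiguous N runs) for seq."""
    seq = seq.upper()  # assumption: lowercase 'n' should also be counted
    total = seq.count("N")
    runs = len(N_RUN.findall(seq))
    return total, runs


if __name__ == "__main__":
    print(count_n("ATCTCTNNNNNNNNNNATATCCCCTTTNNNNNCTCTCT"))  # (15, 2)
```

Because `N+` is greedy, a stretch of ten N's is one match rather than ten, which is what makes `len(findall(...))` a run count.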
Expected output, one line per sequence, with the columns $id, No_of_N's and frequency_pattern:

seq1,0,0
seq2,15,2
seq3,15,2
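Putting the two steps together, a minimal end-to-end Python sketch that reads one or more FASTA files named on the command line and prints `id,No_of_N's,frequency_pattern` rows. The script name `count_n.py` and the helpers `read_fasta`/`summarize` are hypothetical, and headers are assumed to be non-empty:

```python
import re
import sys

N_RUN = re.compile(r"N+")  # a maximal stretch of consecutive N's


def read_fasta(lines):
    """Yield (id, sequence) pairs, joining wrapped sequence lines."""
    seq_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            # Assumption: the ID is the first word of a non-empty header.
            seq_id, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def summarize(lines):
    """Yield one 'id,No_of_Ns,No_of_N_runs' row per record."""
    for seq_id, seq in read_fasta(lines):
        seq = seq.upper()  # assumption: lowercase 'n' counts as N
        yield f"{seq_id},{seq.count('N')},{len(N_RUN.findall(seq))}"


if __name__ == "__main__":
    # Usage: python count_n.py seq.fasta
    for path in sys.argv[1:]:
        with open(path) as handle:
            for row in summarize(handle):
                print(row)
```

Running it on the three records above should give the expected rows; seq3 reports 2 runs only because the wrapped lines are joined before matching.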