Calculating average contig length, excluding Ns
20 months ago
diversitree ▴ 10

I have a FASTA file of assembled contigs that contain Ns as well as IUPAC ambiguity codes. I want to calculate the average contig length in the file, excluding Ns.

Here is a snippet of the FASTA file:

>uce-4216_species1 |uce-4216
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGTGGTTCTGAGATGCCTGGCATTCAGGATGATTTGTAATGTAAATTATATAATTGTACTTTCACATATTTTAACATCAAATAGAATGATTGACTACAGAATTTGAGCTGTCTACAGGTGGGGGTCAATTATCATCTGAATAATCACACTGCCACACAAGAATAGCATGGCCATGGAGTGTGACATATTTTTATCTCTATGCATTTCAATGAAGTCAGCCTGGTACATAAAAGGTTATCACCTAGGAAACATATTTTCCTAAGCACAAGTTAAACATGCAAGCAAGATCAGCATAGATATTCAATTTAGCCAGTCAACCCTAACCTATTAATATTTTAACAAAATCCAGTGAGGATAATTTTTTTCTTTGATCCCATCTCATTTGAGCAGCCTGGAAAGGGAAGAAAAATTAAAAACAAAATAGTCAAGCATACAGAATGAGGTTATGTATTAAGTGGGCTATTTAATGTTTTTGGCATATTATAGCCCTAGGGAAAGTGTGGATGGATTTAACAATCAAGATCTGTGTTCCCTGGGCCCACAAAGTTCGAGAAACATAAAATAATCTATACTTCCGAGCTGACAAATCTTACCTGACACACTGCTTCATCTCACTGGGACTCTCTAGCTCAGCATTAATCATATGTTACAGGGAGTAAAAAGAAAAGTAAATCACACTAAGCTGGAATG
>uce-4175_species1 |uce-4175
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGGTGAAATTAAGCTTGCTATTTTTCTGTCACACATATAGATACAGTCAAAGCTTGTTTGATGATAAGAACTCAGTTCAGGATCTCATCTTTTGCCCTTGGCTTTAACTTATGTATGCCTTTGTCTGTATTGTCTGTACTTGTCTGTACTGCAAACACTTATGCATGTTTCTGCTATTATATATAGACTAAATATGTCATAACACATGAATGCAAAAGGATCAAAAATGCCTTCCTACTTTATAATCTGCTCAGCCAGAAACAGACTCTGTTTCTACCCTGCCTTTTCCTACATGTCATATTATCATCAGCTGCTCTTATATCCCAAAAGAATACTAACTACTGATCGATTGCCYGGACATGTCTGGCCGTGGCCTACATGTGCCCCGGGTAGTTCATTTATTTGCCACGGTGGATTTGCTAGAGTGGAATTTAATCAATAGCTAATTCATTAATTCTGGTCCTCGAGTATATAGGGATTGTGCAGTATAAAAATGACTGGCTGGCTTCAGCTTTGATTGAAATATGACAAACACTGCAGCTGACAGCCTTGGCAGTTGCCAGGCTGAATATGAATTTTGCTTATNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTTGTTGCTGCAGAGATTAGAAAAGTTTAAAGAAACCTTTGGGTTGTTTTGCCA
>uce-1234_species1 |uce-1234
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGTCNCAGAGGGGAATGCAGAATTGAGAAGTATCATTCCTAAACCTTGACAACCTTTCATCAGGCAAGTGATAAGGAATTGCATGTGTGGAGTACAGGCAGGTTCTCTTCTTGCTGTAGAGGATTTCCACCAGTGCCTGTGTTCTTGTTGAGTAATAATGAGATACCATAATGAAACAGTAGAAGATGGTTCTCCAATAACATTAGGAAAAAAGCAGCTGATGTATGGGATATTGAAGGCAGATTATGTTGTTAATGTATATTAGTATATTCTTAATTTCCTTTTAATTGAAAAAGACATATTGACTTTAATTAAAATCATTTCACAGGAACTGTCAATTAGCACATGTCAAACTAGTTAATTCAGAACAGAATTCTTTTAATTAGGGTCTGCTTTCCTTTAACTGTGGGGCCAATGAAATCAGCCTTTCCTTATCAAGACTTTAAATGTCTCTAAGAAATACAATACAAATCTCTAAAAACTCTTATCTATTATTAGAATCCCATATGGATAACATTAAAATRRTSKTKCTKSMAWKSYWYMWKYYWCMKKWWTYWKWKYYWMMTWKMYRKMWAKKWRRMWMWRAWWMMKGWR
>uce-2732_species1 |uce-2732
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTGAATTTAGTTTTCTGCTACTGATCTGTTTGTACATGATCTCTCCCTTTCCCCCCCTCCCTTTCTCTTTCCCCCTCTCTCTCGCTCTCTTTCATATTTAATGTCTGCGTATNNNNNNNNNNNGGATGTACTTTGTCATTTTAAGGTAATTGCGATTTCTCTCAGAATAMMRRSMWMRSMTYANTNMTAANSYKAGSCTGGTATGTGGCTAACTGAAATGCAAAAGGAAGAAGAGGCTTTTTTTTTTTTTTAAGGGGTGGGGGAGAGTTAATTTCCACATTGACATTTTGGAGATACAAATGCAGAGCAAAATCCTTGGGGGGGGGTGTNNNNNNNNNNNNNNNNNNNNNNNNNGATTGGTAATTTTCTTTTTGGTGGACTGCGCAATAGGTATGGTAATTTTAAAAGAGGGTGATTTATATGAGCTTCAGTAAATGCTGCATATTGTATTTCAAAGAGTTTCCTGTCGTGACCTCATAAAAAGAGGAGGAGGCTTGTATGTGTTGCAGTGCCTAGTATATGTCGATTTTGTTGCATCGTTGGGCAGCAGCGCTGTAAGAAGGAATGTCAGCTTTTACATAACGCTCTTTTTGCTTTTGACTCTGTGAGGGGCTGTAAGGGTCCATCTTTGTGATCACAGATGGAGTGGAATGGCTTGAAAATGGTAAGTGAACGGGGAGAGCCTGCTCGTGGGGTTTTGTCTCG
sequence fasta • 639 views
20 months ago
Jeremy ▴ 910

Here's one solution:

grep -v '^>' file.fasta | sed 's/N//g' | awk '{print length}' | awk '{sum += $0; n++}END{if(n>0) print sum/n;}' 

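This pipeline drops the header lines, deletes every uppercase N, computes the length of each remaining line, and averages those lengths. It assumes each sequence sits on a single line, as in the snippet above; a sequence wrapped over several lines would be counted once per line rather than once per contig.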
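If the contigs might be wrapped across several lines, the same idea can be sketched as a single awk pass that sums lengths per record instead of per line. This is only a minimal sketch, not the method from the answer above: the function name `avg_len_no_n` is hypothetical, and it also strips lowercase `n`, which is an assumption about what should count as a gap. IUPAC codes other than N are still counted.

```shell
# Hypothetical helper: mean contig length excluding N/n, one value per FASTA record.
avg_len_no_n() {
    awk '
        # Header line: close the previous record (if any) and start a new one.
        /^>/ {
            if (seen) { sum += len; n++ }
            seen = 1
            len = 0
            next
        }
        # Sequence line: drop N/n (assumption: lowercase n is also a gap),
        # then add what is left to the current record length.
        {
            gsub(/[Nn]/, "")
            len += length($0)
        }
        # End of file: close the last record and print the mean.
        END {
            if (seen) { sum += len; n++ }
            if (n > 0) print sum / n
        }
    ' "$1"
}
```

Usage would be `avg_len_no_n file.fasta`. Records that consist entirely of Ns contribute a length of 0 to the average in this sketch.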
