From the GAAS toolkit you can use a simple Perl script:
gaas_fasta_statistics.pl -f genome.fa
You get output like this:
There are 1879 sequences
There are 980 sequences > 10kb
There are 1877 sequences > 1kb
There are 78770088 nucleotides, of which 11276 are Ns
There are 2196 N-regions (possibly links between contigs)
There are 0 pure (only) N sequences. Assembler doing that must be notified !
There are 0 sequences that begin or end with Ns (see problem_sequences.txt)
The GC-content is 31.4% (not counting Ns 31.4%)
There are 1878 sequences with lowercase nucleotides (Ns not considered)
There are 26012854 lowercase nucleotides (Ns not considered)
The N50 is 121247
The N90 is 25683
The N50 for sequences over 1000bp is 121247
The N50 for sequeces over 10000bp is 128954
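If you want to sanity-check numbers like these without GAAS, here is a minimal Python sketch that recomputes a few of them: sequence count, N count, N-regions (runs of Ns), lowercase bases, GC% with and without Ns, and N50/N90. It is not the GAAS script, and the helper names (read_fasta, nx, summarize) are my own. The exact definitions GAAS uses, for example how it rounds GC% or what counts as an N-region, are assumed here and may differ slightly.

```python
import re
import sys


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file (minimal parser, no validation)."""
    name, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                name, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def nx(lengths, percent):
    """Nx: the length L such that sequences of length >= L cover >= percent% of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 100 >= percent * total:  # integer maths avoids float rounding
            return length
    return 0


def summarize(sequences):
    """Compute a few GAAS-style numbers for a list of sequence strings (assumed definitions)."""
    lengths = [len(s) for s in sequences]
    total = sum(lengths)
    n_count = gc = 0
    for s in sequences:
        upper = s.upper()
        n_count += upper.count("N")
        gc += upper.count("G") + upper.count("C")
    return {
        "sequences": len(sequences),
        "nucleotides": total,
        "n_count": n_count,
        # runs of Ns are counted per sequence so a run never spans two records
        "n_regions": sum(len(re.findall(r"[Nn]+", s)) for s in sequences),
        # lowercase (soft-masked) bases, ignoring lowercase n
        "lowercase": sum(1 for s in sequences for c in s if c.islower() and c != "n"),
        "gc_percent": 100 * gc / total if total else 0.0,
        "gc_percent_no_n": 100 * gc / (total - n_count) if total > n_count else 0.0,
        "N50": nx(lengths, 50),
        "N90": nx(lengths, 90),
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    stats = summarize([seq for _, seq in read_fasta(sys.argv[1])])
    for key, value in stats.items():
        print(f"{key}: {value}")
```

Run it as python fasta_stats.py genome.fa (the script name is just an example). It loads every sequence into memory, which is fine for a ~79 Mb assembly like the one above but not for very large genomes.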
I'm a big fan of stats.sh, which is part of Brian Bushnell's BBTools package and can be installed via Bioconda.
It is useful for assessing read lengths in long-read FASTQ files and for getting contig and scaffold statistics from assemblies.
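For the long-read use case, the core numbers are easy to reproduce if you just want a quick check. Below is a rough Python stand-in, not stats.sh itself, assuming plain or gzipped FASTQ with standard 4-line records (no wrapped sequence lines). The function names fastq_lengths and length_summary are hypothetical.

```python
import gzip
import sys


def fastq_lengths(path):
    """Yield read lengths, assuming standard 4-line FASTQ records (no wrapped lines)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # the second line of each record holds the sequence
                yield len(line.rstrip("\r\n"))


def length_summary(lengths):
    """Read count, total bases, longest and mean length, and read-length N50."""
    lengths = sorted(lengths, reverse=True)
    total = sum(lengths)
    n50 = 0
    running = 0
    for length in lengths:
        running += length
        if running * 2 >= total:  # reads this long or longer hold half the bases
            n50 = length
            break
    return {
        "reads": len(lengths),
        "bases": total,
        "longest": lengths[0] if lengths else 0,
        "mean": total / len(lengths) if lengths else 0.0,
        "N50": n50,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    print(length_summary(fastq_lengths(sys.argv[1])))
```

Sorting all lengths keeps the N50 logic obvious; for very large runs a length histogram would use less memory.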
stats.sh
Written by Brian Bushnell
Last modified December 7, 2017

Description: Generates basic assembly statistics such as scaffold count, N50, L50, GC content, gap percent, etc. For multiple files, please use statswrapper.sh. Works with fasta and fastq only (gzipped is fine). Please read bbmap/docs/guides/StatsGuide.txt for more information.

Usage: stats.sh in=<file>

Parameters:
in=file          Specify the input fasta file, or stdin.
gc=file          Writes ACGTN content per scaffold to a file.
gchist=file      Filename to output scaffold gc content histogram.
shist=file       Filename to output cumulative scaffold length histogram.
gcbins=200       Number of bins for gc histogram.
n=10             Number of contiguous Ns to signify a break between contigs.
k=13             Estimate memory usage of BBMap with this kmer length.
minscaf=0        Ignore scaffolds shorter than this.
phs=f            (printheaderstats) Set to true to print total size of headers.
n90=t            (printn90) Print the N/L90 metrics.
extended=f       Print additional metrics such as N/L90 and log sum.
logoffset=1000   Minimum length for calculating log sum.
logbase=2        Log base for calculating log sum.
pdl=f            (printduplicatelines) Set to true to print lines in the scaffold size table where the counts did not change.
n_=t             This flag will prefix the terms 'contigs' and 'scaffolds' with 'n_' in formats 3-6.
addname=f        Adds a column for input file name, for formats 3-6.

format=<0-7>     Format of the stats information; default 1.
    format=0 prints no assembly stats.
    format=1 uses variable units like MB and KB, and is designed for compatibility with existing tools.
    format=2 uses only whole numbers of bases, with no commas in numbers, and is designed for machine parsing.
    format=3 outputs stats in 2 rows of tab-delimited columns: a header row and a data row.
    format=4 is like 3 but with scaffold data only.
    format=5 is like 3 but with contig data only.
    format=6 is like 3 but the header starts with a #.
    format=7 is like 1 but only prints contig info.

gcformat=<0-4>   Select GC output format; default 1.
    gcformat=0:  (no base content info printed)
    gcformat=1:  name length A C G T N GC
    gcformat=2:  name GC
    gcformat=4:  name length GC
Note that in gcformat 1, A+C+G+T=1 even when N is nonzero.

Please contact Brian Bushnell at firstname.lastname@example.org if you encounter any problems.
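The n=10 parameter is what separates the scaffold statistics from the contig statistics: scaffolds are split wherever there is a run of at least n Ns, and N50/L50 are computed on both sets. Here is a minimal Python sketch of that idea, plus the gcformat=1 convention that A, C, G and T are fractions of called bases only, so they sum to 1 even when N is nonzero. This is my own illustration, not BBTools code: the helper names are hypothetical, and the gap-percent definition (scaffold bases minus contig bases, over scaffold bases) is an assumption.

```python
import re


def split_contigs(scaffold, min_gap=10):
    """Split a scaffold at runs of >= min_gap Ns (assumed to mirror stats.sh n=10)."""
    pattern = re.compile("[Nn]{%d,}" % min_gap)
    # shorter N runs stay inside the contig; empty pieces come from leading/trailing gaps
    return [piece for piece in pattern.split(scaffold) if piece]


def n50_l50(lengths):
    """Return (N50, L50): the length and number of the largest pieces covering half the bases."""
    total = sum(lengths)
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running * 2 >= total:
            return length, count
    return 0, 0


def gc_fractions(seq):
    """A, C, G, T as fractions of ACGT only, so they sum to 1 even when Ns are present."""
    upper = seq.upper()
    counts = {base: upper.count(base) for base in "ACGT"}
    called = sum(counts.values())
    return {base: (n / called if called else 0.0) for base, n in counts.items()}


def assembly_stats(scaffolds, min_gap=10):
    """Scaffold vs contig N50/L50 and an assumed gap percentage for a list of sequences."""
    scaffold_lengths = [len(s) for s in scaffolds]
    contig_lengths = [len(c) for s in scaffolds for c in split_contigs(s, min_gap)]
    scaf_total = sum(scaffold_lengths)
    contig_total = sum(contig_lengths)
    scaf_n50, scaf_l50 = n50_l50(scaffold_lengths)
    ctg_n50, ctg_l50 = n50_l50(contig_lengths)
    return {
        "scaffolds": len(scaffold_lengths),
        "contigs": len(contig_lengths),
        "scaffold_N50": scaf_n50,
        "scaffold_L50": scaf_l50,
        "contig_N50": ctg_n50,
        "contig_L50": ctg_l50,
        "gap_percent": 100 * (scaf_total - contig_total) / scaf_total if scaf_total else 0.0,
    }
```

For example, a 60 bp scaffold made of 30 bases, 10 Ns and 20 bases counts as one scaffold but two contigs, while a scaffold containing only 3 Ns stays a single contig. That is why contig N50 is usually lower than scaffold N50 on the same file.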