4.5 years ago by
Kennedy Krieger Institute (Baltimore, MD)
Yes, you can use the FASTQ files you download using SRA Toolkit (or FASTQ files you get from any source) for a variety of purposes such as assembly or alignment to a reference genome. Many people consider it a good idea to assess the quality of the reads in your file, and FastQC is a very popular tool for doing this. In the NCBI NOW workshop we show you how to use FastQC (first in Galaxy then on the command line).
In your question you showed the Basic Statistics output of FastQC. I find it helpful to see the sequence length (or, if there are multiple lengths, the shortest and longest) and the GC content. The "Sequences flagged as poor quality" field reports sequences that Illumina's Casava flagged to be filtered and removed, so it doesn't reflect analysis by a FastQC module and often isn't very informative.
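To make those fields concrete, here is a minimal Python sketch that computes similar numbers (read count, shortest and longest read, overall %GC) from a FASTQ stream. It assumes simple four-line FASTQ records, and the function names `read_fastq` and `basic_stats` are my own for illustration, not part of FastQC or SRA Toolkit. FastQC's own numbers may differ slightly in how they are computed or rounded.

```python
import io

def read_fastq(handle):
    """Yield (name, sequence, quality) tuples.

    Assumes each record is exactly four lines (no wrapped sequences).
    """
    while True:
        header = handle.readline()
        if not header:           # end of file
            break
        if not header.strip():   # skip stray blank lines
            continue
        seq = handle.readline().strip()
        handle.readline()        # the '+' separator line
        qual = handle.readline().strip()
        yield header.strip()[1:], seq, qual

def basic_stats(records):
    """Return read count, min/max read length and overall percent GC."""
    lengths = []
    gc_bases = 0
    total_bases = 0
    for _name, seq, _qual in records:
        lengths.append(len(seq))
        gc_bases += sum(1 for base in seq.upper() if base in "GC")
        total_bases += len(seq)
    if total_bases == 0:
        raise ValueError("no bases found in input")
    return {
        "total_sequences": len(lengths),
        "min_length": min(lengths),
        "max_length": max(lengths),
        "percent_gc": 100.0 * gc_bases / total_bases,
    }

# Tiny in-memory example; for a real file use open("reads.fastq") instead.
demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nGGCCAT\n+\nIIIIII\n")
print(basic_stats(read_fastq(demo)))
```

For a gzipped download you could pass `gzip.open(path, "rt")` as the handle instead of `open(path)`.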
You ask about the plot of quality scores across all bases. The image you link to shows a typical box-and-whisker plot, with one box per position in the read. The red line is the median value, the yellow box spans the inter-quartile range (25-75%), the lower and upper whiskers correspond to the 10% and 90% points, and the blue line is the mean quality score.
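If it helps to see where those box-and-whisker values come from, the sketch below computes them per read position from a list of quality strings. It assumes Phred+33 encoding and uses a simple nearest-rank percentile, which is my own choice and not necessarily how FastQC calculates its percentiles; the function names are hypothetical.

```python
from statistics import mean, median

def phred33(quality_string):
    """Decode a quality string to integer scores, assuming Phred+33."""
    return [ord(char) - 33 for char in quality_string]

def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    n = len(sorted_values)
    rank = max(1, (pct * n + 99) // 100)  # integer ceil(pct * n / 100)
    return sorted_values[rank - 1]

def per_position_summary(quality_strings):
    """Summarise quality at each read position (0-based)."""
    columns = {}
    for quality_string in quality_strings:
        for pos, score in enumerate(phred33(quality_string)):
            columns.setdefault(pos, []).append(score)
    summary = []
    for pos in sorted(columns):
        values = sorted(columns[pos])
        summary.append({
            "position": pos,
            "p10": percentile(values, 10),     # lower whisker
            "q1": percentile(values, 25),      # bottom of the box
            "median": median(values),          # red line
            "q3": percentile(values, 75),      # top of the box
            "p90": percentile(values, 90),     # upper whisker
            "mean": mean(values),              # blue line
        })
    return summary

# Three toy reads of different lengths ('I' = Q40, '5' = Q20).
for row in per_position_summary(["II5", "I5", "II"]):
    print(row)
```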
In your example the boxes all fall in the "green zone" of excellent quality scores (on the y-axis, range 28-34). The main idea of such plots is that the x-axis shows the position along each read, and quality typically drops as one approaches the end of the read. This reflects an inherent limitation of the sequencing chemistry: calling bases gets more difficult later in reads. For examples of very good and very bad reads see a tutorial here.
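To put the y-axis values in perspective, a Phred quality score Q corresponds to an estimated base-calling error probability of 10^(-Q/10). The short sketch below decodes a quality character and converts scores to error probabilities; it assumes Phred+33 encoding (typical of recent Illumina output), and the function names are just illustrative.

```python
def char_to_phred(char, offset=33):
    """Convert one FASTQ quality character to a Phred score (Phred+33 assumed)."""
    return ord(char) - offset

def phred_to_error_probability(score):
    """Estimated probability that the base call is wrong."""
    return 10 ** (-score / 10)

for score in (20, 28, 34):
    print(score, phred_to_error_probability(score))
```

So a score of 20 means roughly 1 error in 100 calls, while scores in the high 20s and 30s mean on the order of 1 error in 1000.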
If the last position(s) in your reads are of dramatically lower quality then you may wish to trim them. Various programs are available to do this. You can find some in Galaxy in the Tools panel in the section "NGS: QC and manipulation". These are derived from FASTX-Toolkit, a collection of tools for FASTA and FASTQ preprocessing. You can use FASTX-Toolkit on the command line. For de novo assembly (which is not something I'm experienced in) many people remove low quality reads and trim adapter sequences. I suggest you have a look at the FastQC documentation which discusses its variety of useful analysis module outputs including adapter content.
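As a rough illustration of what 3'-end quality trimming does, here is a minimal Python sketch. It is not the FASTX-Toolkit algorithm, just a simple assumed rule: strip bases from the end of the read while their quality is below a threshold, then discard reads that end up too short. The thresholds (`min_quality=20`, `min_length=20`) and function names are placeholders I chose for the example, and Phred+33 encoding is assumed. For real data I would use an established trimming tool.

```python
def trim_3prime(seq, qual, min_quality=20, offset=33):
    """Remove trailing bases whose quality is below min_quality."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_quality:
        end -= 1
    return seq[:end], qual[:end]

def trim_reads(records, min_quality=20, min_length=20):
    """Yield trimmed (name, seq, qual) records, dropping reads that become too short."""
    for name, seq, qual in records:
        trimmed_seq, trimmed_qual = trim_3prime(seq, qual, min_quality)
        if len(trimmed_seq) >= min_length:
            yield name, trimmed_seq, trimmed_qual

# Toy reads: 'I' = Q40, '5' = Q20, '#' = Q2.
reads = [
    ("r1", "ACGTAC", "IIII5#"),  # only the final base is trimmed
    ("r2", "ACGT", "I###"),      # trimmed down to one base, then dropped
]
for record in trim_reads(reads, min_quality=20, min_length=4):
    print(record)
```

Note that this only handles low-quality tails; adapter removal is a separate step that needs the adapter sequences.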