This is my first post to Biostars, so I apologize in advance for any mistakes. I am also a n00b to bioinformatics, so please bear with me if I can't express my question(s) correctly.
I have a FASTQ file and I want to write an R function that gets the quality scores for each sequence and summarizes them in a boxplot, so that the distribution of sequencing quality scores can be visualized.
My problem is that I don't know how to do that. I know the FASTQ format: the 2nd line of each record contains the sequence and the 4th line contains the quality scores encoded as ASCII characters. I also have the conversion table from ASCII character to quality score, but, as I said, I don't understand what is being asked (at least the first part) or how to plot the summary.
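To make the conversion step concrete, here is a minimal sketch of how I understand it. It is in Python rather than R, just to pin down the logic, and it assumes the common Phred+33 encoding (the score is the character's ASCII code minus 33); if the file uses a different offset, that parameter would need to change. The function name `decode_quality` is my own, not from any library.

```python
def decode_quality(quality_line, offset=33):
    """Convert one FASTQ quality string into a list of integer quality scores.

    Assumption: Phred+33 encoding, i.e. score = ASCII code - 33.
    """
    return [ord(ch) - offset for ch in quality_line.strip()]


# '!' is ASCII 33 (score 0), '5' is 53 (score 20),
# '?' is 63 (score 30), 'I' is 73 (score 40).
print(decode_quality("!5?I"))  # [0, 20, 30, 40]
```

So each read gives one list of numbers, and it is these numbers (not the characters) that would go into the plot.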
What I have managed so far is to reduce the file to every fourth line (the quality lines) and then use awk (or grep) to count the frequency of each character in the file; but this gives a histogram of the distribution, not a boxplot.
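As far as I can tell, the difference is that a boxplot needs a five-number summary (minimum, lower quartile, median, upper quartile, maximum) for each group, instead of pooled character counts. Below is a rough sketch of that idea, again in Python only to show the logic. It assumes one box per read, a strict 4-line-per-record FASTQ layout, and Phred+33 encoding; all function names and the tiny example data are made up for illustration, and the quartile method may differ slightly from what R's `boxplot` uses.

```python
import io
import statistics


def read_quality_lines(handle):
    """Yield the 4th line of every 4-line FASTQ record (the quality string)."""
    for line_number, line in enumerate(handle):
        if line_number % 4 == 3:
            yield line.rstrip("\n")


def five_number_summary(scores):
    """Return (min, Q1, median, Q3, max) for a list of at least two scores."""
    q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return min(scores), q1, median, q3, max(scores)


def summarize_reads(handle, offset=33):
    """Compute one five-number summary per read (assumes Phred+33)."""
    summaries = []
    for quality in read_quality_lines(handle):
        scores = [ord(ch) - offset for ch in quality]
        summaries.append(five_number_summary(scores))
    return summaries


# Tiny made-up FASTQ with two reads, for illustration only.
example = io.StringIO(
    "@read1\nACGTA\n+\nIIIII\n"
    "@read2\nACGTA\n+\n!+5?I\n"
)
for summary in summarize_reads(example):
    print(summary)
```

Each tuple would correspond to one box. If the boxes are instead meant to be per position in the read, the same scores would be grouped by column rather than by row.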
I would appreciate your help.