Question: Explanation of an example FastQC report
dominik.saburo (Poland/Lodz/ICZMP) wrote, 3.2 years ago:

Hi there!

I ran quality control on some NGS FASTQ files with FastQC. I've read the manual and the explanations of the warning and failure criteria, but I can't tell whether my data are, overall, good or bad. Probably bad, but please take a look at these screenshots. Maybe someone has an idea why the data look this way.

[Screenshots of the FastQC report (3 images, not preserved)]

(written 3.2 years ago by dominik.saburo; modified 3.2 years ago by mastal511)
genomax (United States) wrote, 3.2 years ago:

This looks like NextSeq data. Having a few red "X" show up on FastQC does not indicate bad data. You should consider them "things to keep in mind" as you proceed with further analysis.

What kind of a dataset is this?

I suggest that you take a look at several blog posts by Dr. Simon Andrews at this link. They should prove useful and may answer some of your questions/doubts.
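To treat the red "X" marks as a checklist rather than a verdict, it can help to pull the per-module flags out of the report. FastQC writes a tab-separated `summary.txt` (status, module name, file name) alongside its HTML report; the parser below is a minimal sketch assuming that layout, and the example lines are made up for illustration, not taken from this dataset.

```python
from collections import defaultdict


def parse_fastqc_summary(text: str) -> dict[str, list[str]]:
    """Group FastQC module names by their PASS/WARN/FAIL status."""
    by_status: dict[str, list[str]] = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        status, module, _filename = line.split("\t", 2)
        by_status[status].append(module)
    return dict(by_status)


# Hypothetical summary.txt content, for illustration only.
example = (
    "PASS\tBasic Statistics\tsample.fastq\n"
    "FAIL\tPer sequence GC content\tsample.fastq\n"
    "FAIL\tSequence Duplication Levels\tsample.fastq\n"
)
flags = parse_fastqc_summary(example)
for module in flags.get("FAIL", []):
    print(f"Keep in mind during analysis: {module}")
```

Each flagged module can then be checked against what the experiment design predicts (for example, amplicon data).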

(written 3.2 years ago by genomax; modified 3.2 years ago)

I've seen similar things with NextSeq data. Perhaps the OP could try trimming poly-G tails, since those can also receive high quality scores.
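Poly-G trimming is available in existing tools (for example fastp's `--trim_poly_g` and cutadapt's `--nextseq-trim`). The core idea can be sketched as follows: drop a trailing run of G calls once it reaches some minimum length, and cut the quality string to match. The `min_run=10` threshold is an arbitrary assumption, and unlike real trimmers this sketch only handles exact G runs with no mismatches.

```python
def trim_poly_g(seq: str, qual: str, min_run: int = 10) -> tuple[str, str]:
    """Remove a trailing run of G bases if it is at least `min_run` long.

    Exact-match sketch: real tools also tolerate mismatches inside the tail.
    """
    kept = seq.rstrip("G")
    if len(seq) - len(kept) >= min_run:
        # Keep the quality string aligned with the trimmed sequence.
        return kept, qual[: len(kept)]
    return seq, qual


# Hypothetical read ending in a 12-base poly-G tail with high qualities ("I").
read = "ACGTTAGC" + "G" * 12
qual = "I" * len(read)
trimmed_seq, trimmed_qual = trim_poly_g(read, qual)
print(trimmed_seq)
```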

(reply written 3.2 years ago by WouterDeCoster)

These FASTQ files were generated on an Illumina MiniSeq using a TruSeq Amplicon kit, and the samples are human DNA.

(reply written 3.2 years ago by dominik.saburo)

If these are amplicons, the high duplication levels in the plot are expected, and the unusual GC plot can probably be explained the same way. If the on-board MiniSeq analysis package has completed its analysis and the results look reasonable, you can move on to the rest of your analysis.
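Why amplicons look "duplicated" can be shown with a simplified duplication measure: the percentage of reads left after collapsing identical sequences, similar in spirit to the deduplication percentage FastQC reports (FastQC's own calculation differs in details). Both read sets below are hypothetical toy data: the amplicon-like set repeats a few fixed sequences, while the shotgun-like set contains only distinct sequences.

```python
from itertools import islice, product


def percent_remaining_if_deduplicated(reads: list[str]) -> float:
    """Share of reads (in %) left after collapsing exact duplicate sequences."""
    if not reads:
        return 0.0
    return 100.0 * len(set(reads)) / len(reads)


# Hypothetical amplicon-like data: reads start at fixed primer sites,
# so many reads are identical copies of a few target sequences.
amplicon_like = ["ACGTACGTAA"] * 45 + ["TTGCAGGCTA"] * 45 + ["GGCATCAGTC"] * 10

# Hypothetical shotgun-like data: 100 distinct 4-mers stand in for reads
# starting at random genome positions.
shotgun_like = ["".join(p) for p in islice(product("ACGT", repeat=4), 100)]

print(percent_remaining_if_deduplicated(amplicon_like))
print(percent_remaining_if_deduplicated(shotgun_like))
```

With only three distinct sequences among 100 reads, the amplicon-like set keeps 3% after deduplication. Duplication warnings are therefore a property of the library design here, not a sign of failure.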

(reply written 3.2 years ago by genomax)

I think that the MiniSeq uses the same 2-colour chemistry as NextSeq.
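Two-colour chemistry also explains the poly-G tails mentioned above. This sketch assumes the standard Illumina two-channel scheme (A = signal in both channels, C = red only, T = green only, G = no signal) and reduces real intensity thresholds to booleans. A cluster that stops emitting light is therefore read as a run of G.

```python
# Simplified two-channel base calling; booleans stand in for real
# intensity thresholds in each channel.
TWO_COLOUR_CODE = {
    (True, True): "A",    # signal in both channels
    (True, False): "C",   # red channel only
    (False, True): "T",   # green channel only
    (False, False): "G",  # no signal ("dark" cycle)
}


def call_base(red: bool, green: bool) -> str:
    return TWO_COLOUR_CODE[(red, green)]


# Hypothetical cluster that goes dark after three cycles.
signals = [(True, False), (False, True), (True, True),
           (False, False), (False, False), (False, False)]
called = "".join(call_base(r, g) for r, g in signals)
print(called)  # CTAGGG
```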

(reply written 3.2 years ago by WouterDeCoster)
mastal511 wrote, 3.2 years ago:

If your data come from a single species, the Per Sequence GC Content plot doesn't look very good, but it might improve after trimming if the data contain many adapter sequences. It also depends on what kind of experiment the data come from.
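Adapter trimming is normally done with tools such as cutadapt, Trimmomatic or fastp. As a rough sketch of what they do, the function below cuts a read at the first full adapter match or at a partial adapter hanging off the 3' end. The adapter string is assumed to be the common 5' portion of Illumina TruSeq adapters; check it against your kit's documentation. This exact-match version ignores sequencing errors and may trim a few bases by chance when a read happens to end in the adapter's first bases.

```python
ILLUMINA_ADAPTER_PREFIX = "AGATCGGAAGAGC"  # assumption: verify for your kit


def trim_adapter(seq: str, adapter: str = ILLUMINA_ADAPTER_PREFIX,
                 min_overlap: int = 5) -> str:
    """Cut at the first full adapter match, else at a partial 3' adapter."""
    idx = seq.find(adapter)
    if idx != -1:
        return seq[:idx]
    # The adapter may be only partly sequenced: test its 5' end against
    # the read's 3' end, longest overlap first.
    for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[: len(seq) - k]
    return seq


print(trim_adapter("ACGTACGT" + "AGATCGGAAGAGCACAC"))
print(trim_adapter("ACGTACGTAGATCG"))
```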

(written 3.2 years ago by mastal511)

We are working with human DNA from cases with an osteogenesis imperfecta phenotype. All FASTQ files were generated on an Illumina MiniSeq, and some of the bioinformatics steps (for example FASTQ generation, mapping and indexing, and variant calling) were performed by the Local Manager software. All of these were run with default settings.

(reply written 3.2 years ago by dominik.saburo)

If these are amplicons then that might explain the GC plot, because you have many copies of some regions of the genome, rather than the whole genome.
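To see why amplicons distort the GC plot, compute GC content per read and bin it, roughly as a per-sequence GC plot does. Shotgun data from one genome tends to give a single broad peak, while a few amplified targets give sharp spikes at each target's GC content. The two-target read set below is hypothetical toy data.

```python
from collections import Counter


def gc_percent(seq: str) -> float:
    """GC content of one read in percent (all symbols count toward length)."""
    if not seq:
        return 0.0
    gc = sum(1 for base in seq.upper() if base in "GC")
    return 100.0 * gc / len(seq)


def gc_histogram(reads: list[str]) -> Counter:
    """Count reads per whole-percent GC bin."""
    return Counter(round(gc_percent(r)) for r in reads)


# Hypothetical amplicon-like data: two targets, each copied many times.
reads = ["GGCCGGCCAT"] * 60 + ["ATATATGCAT"] * 40
print(sorted(gc_histogram(reads).items()))  # [(20, 40), (80, 60)]
```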

(reply written 3.2 years ago by mastal511)