Question: Is there a perfect WGS dataset?
8 months ago by
Germany, Mannheim, UMM
marongiu.luigi380 wrote:


I was wondering whether there is an Illumina WGS dataset (FASTQ files) that is flawless in terms of quality control: something that has no yellow or red flags in any of the FastQC modules (no tile errors or overrepresented k-mers, for instance) but still shows the typical decrease in quality values at the read extremities. If so, does it have an SRA accession number or a website where I can download it?

Thank you

sequence next-gen • 312 views
modified 8 months ago by Pierre Lindenbaum (117k) • written 8 months ago by marongiu.luigi (380)

What are you actually looking for? What is your objective with this dataset, if it exists?

I suppose you could take any reasonably good dataset and apply some filters to get that "perfect" dataset.

That said, the colours in fastqc are just indications and do not determine conclusively if a dataset is appropriate for a biological question.
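
As a rough illustration of the filtering idea mentioned above, here is a minimal Python sketch that keeps only reads whose mean Phred quality reaches a threshold. The function names, the threshold of 30 and the Phred+33 offset are assumptions chosen for the example, not something taken from FastQC or from this thread; a dedicated trimming or filtering tool would normally be used on real data.

```python
from typing import Iterable, Iterator, List, Tuple

# (header, sequence, quality string); hypothetical record type for this sketch
Record = Tuple[str, str, str]


def parse_fastq(lines: List[str]) -> Iterator[Record]:
    """Yield (header, sequence, quality) from plain 4-line FASTQ records."""
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = (line.rstrip("\n") for line in lines[i:i + 4])
        yield header, seq, qual


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (assumes Phred+33 encoding)."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def filter_reads(records: Iterable[Record], min_mean_q: float = 30.0) -> List[Record]:
    """Keep reads whose mean quality is at least min_mean_q (assumed threshold)."""
    return [r for r in records if r[2] and mean_phred(r[2]) >= min_mean_q]


# Tiny made-up example: 'I' encodes Q40 and '!' encodes Q0 under Phred+33.
example_lines = [
    "@read1", "ACGT", "+", "IIII",
    "@read2", "ACGT", "+", "!!!!",
]
kept = filter_reads(parse_fastq(example_lines))
```

Here `kept` contains only `read1`. A set filtered this way is no longer the original data, so it suits a teaching figure better than a benchmark.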

written 8 months ago by WouterDeCoster (37k)

I just need a teaching dataset that shows how FastQC works, but every dataset I have available shows a defect of some sort. I still haven't found one that does not raise any flag. I need to produce a figure like the one in the manual, but I cannot do that without a good dataset. I could trim, but then it would no longer be an original set...

modified 8 months ago • written 8 months ago by marongiu.luigi (380)

I even got a dataset taken from a manual (I won't say which one), and the base quality figure is as good as in the book; what they did not show was the associated summary:

[image: FastQC summary]

Also, this was an exome analysis, not WGS...

modified 8 months ago • written 8 months ago by marongiu.luigi (380)

I'm not even sure what would be required to have every single fastqc flag pass. I've never seen a green tick for Kmer content.

written 8 months ago by jrj.healey (11k)

FastQC has a file called limits.txt, which you will find in the configuration folder of the FastQC distribution. If the red X's bother you that much, feel free to edit the thresholds in this file (the ones that trigger those red X failures) so that everything becomes green.
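
To make the idea of editing thresholds concrete, here is a small Python sketch that reads and rewrites entries in a limits-style file. It assumes each non-comment line holds three whitespace-separated fields (module name, level such as `warn` or `error`, and a numeric value) and that `#` starts a comment. The sample text and its values are illustrative only, so check the layout against the limits.txt shipped with your FastQC version before relying on it.

```python
from typing import Dict, Tuple

# Illustrative excerpt only: module names and values here are assumptions,
# not a copy of the real limits.txt.
SAMPLE = """# illustrative thresholds
kmer\tignore\t1
duplication\twarn\t70
duplication\terror\t50
"""


def parse_limits(text: str) -> Dict[Tuple[str, str], float]:
    """Map (module, level) to its numeric threshold, skipping comments."""
    limits: Dict[Tuple[str, str], float] = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        parts = stripped.split()
        if len(parts) != 3:
            continue  # not in the assumed three-field layout
        module, level, value = parts
        try:
            limits[(module, level)] = float(value)
        except ValueError:
            continue  # non-numeric value; ignore in this sketch
    return limits


def set_limit(text: str, module: str, level: str, new_value: float) -> str:
    """Return the text with one (module, level) threshold replaced."""
    out = []
    for line in text.splitlines():
        parts = line.split()
        is_entry = len(parts) == 3 and not line.lstrip().startswith("#")
        if is_entry and parts[0] == module and parts[1] == level:
            out.append(f"{module}\t{level}\t{new_value}")
        else:
            out.append(line)  # comments and other entries stay untouched
    return "\n".join(out) + "\n"


limits = parse_limits(SAMPLE)
edited = set_limit(SAMPLE, "duplication", "warn", 60)
```

Working on a copy of the file rather than the one inside the distribution keeps the default thresholds available for comparison.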

As others have said there are no perfect datasets. It is important to keep the context of the experiment in mind as you look at FastQC results. Use the results as a guide to decide if you should do anything additional to the data (e.g. trim, normalize etc) or just proceed with your usual analysis workflow. You will know a really bad dataset (that you should discard) when you find it.

modified 8 months ago • written 8 months ago by genomax (63k)
8 months ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum117k wrote:

See Illumina Platinum Genomes:

written 8 months ago by Pierre Lindenbaum (117k)

Thank you, that would be the perfect answer, but access is restricted. The website states:

Extramural Investigators must be permanent employees of their institution at a level equivalent to a tenure-track professor or senior scientist with responsibilities that most likely include laboratory administration and oversight. Laboratory staff and trainees such as graduate students and postdoctoral fellows are not permitted to submit project requests.

The problem is that I am not a permanent employee, just a postdoc.

modified 8 months ago • written 8 months ago by marongiu.luigi (380)