Question: Is there a perfect WGS dataset?
marongiu.luigi (Germany, Mannheim, UMM) wrote, 22 days ago:


I was wondering whether there is an Illumina WGS dataset (FASTQ files) that is flawless in terms of quality control: something that raises no yellow or red flags in any of the FastQC modules (no per-tile quality errors or k-mer warnings, for instance) but still shows the typical drop in quality values toward the ends of the reads. If so, does it have an SRA accession number, or is there a website where I can download it?

Thank you

Tags: sequence, next-gen
modified 22 days ago by Pierre Lindenbaum • written 22 days ago by marongiu.luigi

What are you actually looking for? What is your objective with this dataset, if it exists?

I suppose you could take any reasonably good dataset and apply some filters to get that "perfect" dataset.

That said, the colours in FastQC are just indications and do not conclusively determine whether a dataset is appropriate for a biological question.
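As a rough illustration of the kind of filter mentioned above, here is a minimal Python sketch that keeps only reads whose mean Phred quality reaches a cutoff. It assumes uncompressed 4-line FASTQ records with Phred+33 encoding; the function names and the default cutoff of 30 are illustrative choices, not FastQC settings, and a filter like this will not by itself guarantee that every FastQC module passes.

```python
# Sketch: mean-quality read filter (hypothetical helper names).
# Assumes 4-line FASTQ records with Phred+33 quality encoding.
from typing import List, Tuple


def read_fastq(text: str) -> List[Tuple[str, ...]]:
    """Split FASTQ text into (header, sequence, '+', quality) records."""
    # Only trailing newlines are stripped, so zero-length reads keep their lines.
    lines = text.rstrip("\n").split("\n") if text.strip() else []
    if len(lines) % 4 != 0:
        raise ValueError("truncated FASTQ: line count is not a multiple of 4")
    return [tuple(lines[i:i + 4]) for i in range(0, len(lines), 4)]


def mean_quality(qual: str) -> float:
    """Average Phred score of one read (Phred+33: score = ASCII code - 33)."""
    return sum(ord(c) - 33 for c in qual) / len(qual)


def filter_by_mean_quality(text: str, min_mean: float = 30.0) -> str:
    """Return FASTQ text containing only reads with mean quality >= min_mean."""
    kept = [rec for rec in read_fastq(text)
            if rec[3] and mean_quality(rec[3]) >= min_mean]
    return "".join("\n".join(rec) + "\n" for rec in kept)
```

For example, with two reads whose quality strings are `IIII` (Q40 throughout) and `####` (Q2 throughout), `filter_by_mean_quality` with the default cutoff keeps only the first one.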

written 22 days ago by WouterDeCoster

I just need a didactic dataset that shows how FastQC works, but every dataset I have available shows a defect of some sort; I still haven't found one that raises no flags at all. I need to produce a figure like the ones in the manual, but without a good dataset I cannot. I could trim the reads, but then it would no longer be an original dataset...

modified 22 days ago • written 22 days ago by marongiu.luigi

I even got a dataset taken from a manual (I won't say which one), and the per-base quality plot is as good as the one in the book; what they did not show was the associated summary: [image: FastQC summary for that dataset]

Also, this was an exome analysis, not WGS...

modified 22 days ago • written 22 days ago by marongiu.luigi

I'm not even sure what it would take for every single FastQC module to pass. I've never seen a green tick for Kmer Content.
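One way to see exactly which modules raise a flag, rather than reading the colours off the HTML report, is the summary.txt file that FastQC writes for each report (inside the `*_fastqc.zip` archive, or the extracted `*_fastqc` folder). Below is a minimal Python sketch, assuming that file's tab-separated layout of status (PASS, WARN or FAIL), module name and file name on each line; the function names are hypothetical.

```python
# Sketch: list the FastQC modules that did not pass (hypothetical helpers).
# Assumes summary.txt lines of the form "STATUS<TAB>Module name<TAB>file name".
from typing import Dict, List


def parse_summary(text: str) -> Dict[str, str]:
    """Map each module name to its status (PASS, WARN or FAIL)."""
    statuses: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        status, module, _filename = line.split("\t", 2)
        statuses[module] = status
    return statuses


def flagged_modules(text: str) -> List[str]:
    """Return modules whose status is not PASS (yellow or red in the report)."""
    return [module for module, status in parse_summary(text).items()
            if status != "PASS"]
```

An empty list from `flagged_modules` would mean the "no flags at all" case asked about in the question.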

written 22 days ago by jrj.healey

FastQC has a file called limits.txt, which you will find in the configuration folder of the FastQC distribution. If the red X's bother you that much, feel free to edit the thresholds in this file that trigger those warnings and failures, so that everything becomes green.
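A minimal Python sketch of that edit, which writes a modified copy instead of touching the shipped limits.txt. It assumes the file consists of whitespace-separated `module level value` lines plus `#` comments, and that setting a module's `ignore` value to 1 removes that module from the report (check the comments in your own copy of limits.txt); `override_limits` is a hypothetical helper, and which values you change is up to you.

```python
# Sketch: produce a custom FastQC limits file with some values overridden.
# Assumes "module level value" lines and '#' comments, as in limits.txt;
# override_limits is a hypothetical helper, not part of FastQC.
from typing import Dict, Tuple


def override_limits(text: str, overrides: Dict[Tuple[str, str], str]) -> str:
    """Return text with the value of each (module, level) pair in overrides replaced."""
    out_lines = []
    for line in text.splitlines():
        fields = line.split()
        # Comments, blank lines and anything not shaped like
        # "module level value" are copied through unchanged.
        if len(fields) == 3 and not fields[0].startswith("#"):
            key = (fields[0], fields[1])
            if key in overrides:
                line = f"{fields[0]}\t{fields[1]}\t{overrides[key]}"
        out_lines.append(line)
    return "\n".join(out_lines) + "\n"
```

For example, `override_limits(text, {("kmer", "ignore"): "1"})` would drop the Kmer Content module from the report (it hides the module rather than making it pass). The written copy can then be given to FastQC with its `--limits` option, e.g. `fastqc --limits custom_limits.txt reads.fastq.gz`, where both file names are placeholders.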

As others have said, there are no perfect datasets. It is important to keep the context of the experiment in mind when you look at FastQC results. Use the results as a guide to decide whether you should do anything additional to the data (e.g. trimming or normalization) or just proceed with your usual analysis workflow. You will know a really bad dataset (one you should discard) when you find it.

modified 22 days ago • written 22 days ago by genomax
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote, 22 days ago:

See the Illumina Platinum Genomes.

written 22 days ago by Pierre Lindenbaum

Thank you, that would be the perfect answer, but access is restricted. The website states:

Extramural Investigators must be permanent employees of their institution at a level equivalent to a tenure-track professor or senior scientist with responsibilities that most likely include laboratory administration and oversight. Laboratory staff and trainees such as graduate students and postdoctoral fellows are not permitted to submit project requests.

The problem is that I am not a permanent employee, just a postdoc.

modified 22 days ago • written 22 days ago by marongiu.luigi