FastQC quality report: interpreting the results and choosing parameters for the next step
6.2 years ago
ravi.uhdnis ▴ 180

Hi All,

I ran FastQC on whole-genome sequencing (WGS) data from a human sample (expected coverage 30X), generated on an Illumina HiSeq platform. Everything appears good (green) except the following, for both forward and reverse reads:

Warning (orange): Per base sequence content and Per base GC content.

Fail (red): K-mer content.

I want to run Trimmomatic to trim poor-quality bases. What Trimmomatic parameters should I use to minimize or remove poor reads and the K-mer error? I also wanted to show the .html pages from my FastQC run but could not find a way to attach them on BioStars.

I look forward to your responses, as I am new to the NGS data analysis field. Thank you.

genome sequencing next-gen • 3.5k views
6.2 years ago
arnstrm ★ 1.8k

I wouldn't worry about those warnings unless your first plot (Per base sequence quality) is bad. You should also check Per base sequence content for signs of adapter contamination. I think these are the only two things that can be fixed with Trimmomatic, the FASTX trimmer, or similar utilities. The rest doesn't matter much.
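For reference, here is a minimal sketch of how a paired-end Trimmomatic call for quality and adapter trimming could be assembled. The step values (ILLUMINACLIP 2:30:10, LEADING:3, TRAILING:3, SLIDINGWINDOW:4:15, MINLEN:36) are the ones used in the Trimmomatic manual's own paired-end example, not settings tuned for this dataset. The jar name, the input and output file names, and the choice of `TruSeq3-PE.fa` as the adapter file are assumptions for illustration only.

```python
from typing import List


def build_trimmomatic_pe_command(
    jar: str,
    fwd: str,
    rev: str,
    out_prefix: str,
    adapters: str = "TruSeq3-PE.fa",  # assumed adapter file; use the one matching your library prep
    min_len: int = 36,                # value from the manual's example, not a recommendation
) -> List[str]:
    """Assemble a Trimmomatic paired-end command as an argument list."""
    return [
        "java", "-jar", jar, "PE", "-phred33",
        fwd, rev,
        # Paired and unpaired outputs for the forward, then the reverse reads.
        f"{out_prefix}_1P.fq.gz", f"{out_prefix}_1U.fq.gz",
        f"{out_prefix}_2P.fq.gz", f"{out_prefix}_2U.fq.gz",
        f"ILLUMINACLIP:{adapters}:2:30:10",  # adapter clipping
        "LEADING:3",                         # drop low-quality leading bases
        "TRAILING:3",                        # drop low-quality trailing bases
        "SLIDINGWINDOW:4:15",                # cut when 4-base window mean quality < 15
        f"MINLEN:{min_len}",                 # discard reads shorter than this after trimming
    ]


# Hypothetical file names, for illustration only.
cmd = build_trimmomatic_pe_command(
    "trimmomatic-0.39.jar", "sample_R1.fastq.gz", "sample_R2.fastq.gz", "sample"
)
print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` once the paths point to real files. The steps are applied in the order given, so MINLEN is placed last to filter on the final trimmed length.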


+1

I would also suggest checking the read length distribution.


Thank you for your comment. The length distribution looks good, with a single sharp peak at length 101 (between 100 and 102), showing that most reads are 101 bases long. What minimum read length should I keep before mapping to the reference genome, or can I keep all of them?
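One way to see what a given minimum length would cost is to tabulate the read lengths directly and check what fraction of reads survives a cutoff. The sketch below assumes plain four-line FASTQ records (no wrapped sequence lines); the function names and the tiny in-memory example are hypothetical, and it does not by itself say which cutoff is appropriate.

```python
from collections import Counter
from typing import Iterable


def read_lengths(lines: Iterable[str]) -> Counter:
    """Count read lengths, assuming four lines per FASTQ record."""
    counts: Counter = Counter()
    for i, line in enumerate(lines):
        if i % 4 == 1:  # the second line of each record is the sequence
            counts[len(line.rstrip("\r\n"))] += 1
    return counts


def fraction_at_least(counts: Counter, min_len: int) -> float:
    """Fraction of reads whose length is >= min_len."""
    total = sum(counts.values())
    kept = sum(n for length, n in counts.items() if length >= min_len)
    return kept / total if total else 0.0


# Tiny in-memory example with made-up reads.
records = ["@r1", "ACGTA", "+", "IIIII", "@r2", "ACG", "+", "III"]
counts = read_lengths(records)
print(sorted(counts.items()))
print(fraction_at_least(counts, 4))
```

For a real file, the same function can be fed an open handle, for example `gzip.open(path, "rt")` for a compressed FASTQ.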


Thank you for your comment. 'Per base sequence quality' looks good (green) in the plot, with a mean Q score of approximately 32 for forward reads and approximately 30 for reverse reads. The FastQC plot shows no adapter contamination, although I will still use Trimmomatic to remove any adapters that may be present.
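To put those Q scores in context, a Phred quality score Q corresponds to a base-call error probability of 10^(-Q/10), so Q30 means roughly one expected error per 1000 bases. A short sketch of the conversion (the function name is just for illustration):

```python
def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score to a base-call error probability."""
    return 10 ** (-q / 10)


# Mean scores quoted above for the reverse (30) and forward (32) reads.
for q in (30, 32):
    print(f"Q{q}: error probability ~ {phred_to_error_probability(q):.5f}")
```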