Question

NGS Fastqc explanation

7.8 years ago

GHanumanth404 ▴ 40

Hello friends! I am new to the Biostars community and to NGS, and I am running into a lot of problems analysing my NGS data. Please correct me if the following definitions are wrong. Read length is the number of sequencing cycles that were run. Total sequence is the actual length of the genome or target to be sequenced. Reads are the bases that are sequenced.

If the above is correct, then why does my FastQC report give the read length as 32-151? If read length means the number of cycles, why is it reported as a range?

Also, can anyone explain these sections of the FastQC report: Per base sequence content, Per sequence quality score, Sequence length distribution, and Kmer content?

read lenght FASTQC Coverage • 6.3k views

ADD COMMENT • link 7.8 years ago by GHanumanth404 ▴ 40

Welcome to Biostars!

Read length: the length of the read (DNA fragment) that has been sequenced.
Read length 32-151: the shortest read is 32 bases long and the longest is 151 bases long. (BTW, which instrument was used to generate the data?)
The FastQC report is explained here
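
To see where a range such as 32-151 comes from, the shortest and longest reads can be pulled straight from the file. This is only a rough sketch: the function name `read_length_range` is made up for illustration, and it assumes an uncompressed FASTQ file with exactly four lines per record and Unix line endings.

```shell
# Hypothetical helper: print "min-max" read length for a FASTQ file.
# Assumes four lines per record, so the sequence is line 2 of each record.
read_length_range() {
    awk 'NR % 4 == 2 {
             len = length($0)
             # The first sequence line initialises min; later ones update it.
             if (n++ == 0 || len < min) min = len
             if (len > max) max = len
         }
         END { if (n > 0) print min "-" max }' "$1"
}
```

Calling `read_length_range foo.fastq` on trimmed data would typically show a range, while untrimmed data from a single run would usually show one length repeated as both ends of the range.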

ADD REPLY • link 7.8 years ago by venu 7.1k

If it means the length of the DNA fragment sequenced, then what is Total sequence? Does Total sequence mean DNA + adapters?

ADD REPLY • link 7.8 years ago by GHanumanth404 ▴ 40

I was confused by 'Total sequence'; it is actually 'Total Sequences'. From the FastQC manual linked above:

Total Sequences: A count of the total number of sequences processed. There are two values reported, actual and estimated. At the moment these will always be the same. In the future it may be possible to analyse just a subset of sequences and estimate the total number, to speed up the analysis, but since we have found that problematic sequences are not evenly distributed through a file we have disabled this for now.

So it is the total number of reads present in your FASTQ file. To check it yourself, take the first 4-5 characters of a read ID (the part that is the same in all read IDs) and run the following, which prints the number of reads:

grep -c '^@HWI' foo.fastq
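
One caveat with the `grep` approach is that quality strings may also begin with `@`, so a prefix match can occasionally overcount. An alternative sketch is to divide the line count by four. The function name `count_reads` is made up for illustration, and the sketch assumes an uncompressed FASTQ file with exactly four lines per record (no wrapped sequence or quality lines).

```shell
# Hypothetical helper: count reads as (number of lines) / 4.
# Assumes the standard four-lines-per-record FASTQ layout.
count_reads() {
    awk 'END { printf "%d\n", NR / 4 }' "$1"
}
```

Running `count_reads foo.fastq` should give the same number as FastQC's Total Sequences for a well-formed file.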

ADD REPLY • link 7.8 years ago by venu 7.1k

I think it means the total number of sequences. Each sequence can have a different length (here, sequence length 32-151) or the same length (for example, sequence length 150). Am I correct?

ADD REPLY • link 7.8 years ago by GHanumanth404 ▴ 40

And what is the meaning of "The overall %GC of all bases in all sequences"? If %GC means the GC content of the entire genome, then what does "all bases in all sequences" mean?

ADD REPLY • link 7.8 years ago by GHanumanth404 ▴ 40

"All bases in all sequences" refers to the bases that are actually present in your sequence file.

That number should match the value for your genome (unless the sampling was non-uniform or you have contamination).

ADD REPLY • link 7.8 years ago by GenoMax 141k

%GC means the GC content of my sample, i.e. my sequences. Then what does "all bases" mean here? Is it comparing every base at every position of my sequences?

ADD REPLY • link 7.8 years ago by GHanumanth404 ▴ 40

Out of the total bases present (A/C/G/T) in your file, %GC is the percentage that are G or C (with no consideration of their position/location).
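
The calculation described above can be sketched as follows. The function name `gc_percent` is made up for illustration, and the sketch assumes an uncompressed FASTQ file with four lines per record. It counts only A/C/G/T, so N calls are left out of the total; the result should be close to the value FastQC reports, though rounding may differ.

```shell
# Hypothetical helper: overall %GC across all reads in a FASTQ file.
# Position within the read is ignored; only base identity is counted.
gc_percent() {
    awk 'NR % 4 == 2 {
             seq = toupper($0)
             # gsub returns the number of characters it removed.
             gc += gsub(/[GC]/, "", seq)
             at += gsub(/[AT]/, "", seq)
         }
         END {
             total = gc + at
             if (total > 0) printf "%.1f\n", 100 * gc / total
         }' "$1"
}
```

For example, `gc_percent foo.fastq` prints a single number that can be compared with the expected GC content of the genome.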

ADD REPLY • link 7.8 years ago by GenoMax 141k

Hi,

I have an Illumina MiSeq dataset for a parasite genome. The machine itself produced the paired-end reads as two separate datasets, one forward (R1) and the other reverse (R2). When using the FastQC tool on one set, e.g. filtering reads <70 bp in the R1 dataset, should we treat R1 as paired-end or not?

Thanks

ADD REPLY • link 7.6 years ago by sumudu_rangika ▴ 50