Question: N in fastq data
analytical0 wrote (3 months ago):

Hi, I recently received WGS data for 3 samples of the same genome for SNP analysis. When I looked at my FASTQ files, I saw that they contain N bases. How should I tackle this? If I do not trim them and map the reads onto my reference genome, will the aligner ignore these Ns while mapping?

Total number of bases   45191688940
Number of N bases       884912

Also, the total number of bases differs between the 3 samples. How is this possible when the same genome was sequenced for each sample?

Sample  R1 bases      R2 bases
s1      45191688940   45191688940
s2      43709052900   43709052900
s3      53171402300   53171402300

Chris Miller (Washington University in St. Louis, MO) wrote (3 months ago):

I think you have some fundamental misunderstandings about how FASTQ files and sequencing work. A FASTQ file can hold any number of short reads from a genome, and that number has no relationship to how large your target genome is, so different samples will naturally yield different totals. And yes, aligners will generally handle Ns appropriately.
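To put the figures from the question in perspective (they match the s1_R1 file in the table), the N bases make up only a tiny fraction of the sequence:

```latex
\frac{884\,912}{45\,191\,688\,940} \approx 1.96 \times 10^{-5} \approx 0.002\%
```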


So what do you mean by total number of bases?

analytical0

Each FASTQ record holds one read of a given length (for example, 100 bp). Multiply that by the number of FASTQ entries and that's how many bases of sequence you have. Only after you align the data can you talk about sequence coverage across the genome.
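A minimal Python sketch of that arithmetic, summing read lengths over every record in one FASTQ file, which is one way figures like "Total number of bases" and "Number of N bases" can be produced. It assumes plain or gzip-compressed FASTQ with standard, unwrapped 4-line records; the function names `read_sequences` and `count_bases` are hypothetical and not taken from any particular tool.

```python
import gzip
import sys
from typing import Iterable, Iterator, Tuple


def read_sequences(path: str) -> Iterator[str]:
    """Yield the sequence line of each 4-line FASTQ record.

    Assumes standard records (header, sequence, '+', qualities) with
    no line-wrapped sequences. Paths ending in .gz are read via gzip.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # the second line of each record is the read sequence
                yield line.rstrip("\r\n")


def count_bases(sequences: Iterable[str]) -> Tuple[int, int, int]:
    """Return (number of reads, total bases, number of N bases)."""
    reads = total = n_bases = 0
    for seq in sequences:
        reads += 1
        total += len(seq)  # total bases = sum of read lengths
        n_bases += seq.upper().count("N")
    return reads, total, n_bases


if __name__ == "__main__" and len(sys.argv) > 1:
    n_reads, n_total, n_n = count_bases(read_sequences(sys.argv[1]))
    print(f"Reads: {n_reads}")
    print(f"Total number of bases: {n_total}")
    print(f"Number of N bases: {n_n}")
```

Run it once per file (R1 and R2 separately). When every read has the same length, the total is simply reads × read length, so identical R1 and R2 totals are consistent with both mates of each pair having the same length.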

Seriously, though: there are many resources explaining the sequencing and alignment process. I recommend that you seek out and read some of them so that you understand this before proceeding.

Chris Miller
