Question: N in fastq data
9 months ago by
analytical0 wrote:

Hi, I recently received WGS data for 3 samples of the same genome for SNP analysis. Looking at my FASTQ files, I see that they contain N bases. How should I handle this? If I do not trim them and go straight to mapping against my reference genome, will the aligner ignore the Ns while mapping?

Total number of bases   45191688940
Number of base N    884912

Also, the total number of bases differs between the 3 samples. How is this possible when all samples were sequenced from the same genome?

s1_R1        s1_R2        s2_R1        s2_R2        s3_R1        s3_R2
45191688940  45191688940  43709052900  43709052900  53171402300  53171402300

9 months ago by
Chris Miller20k (Washington University in St. Louis, MO) wrote:

I think you have some fundamental misunderstandings about how FASTQ files and sequencing work. A FASTQ file can hold any number of short reads from a genome, and that number has no relationship to the size of your target genome. And yes, aligners will generally handle Ns appropriately.


So what do you mean by total number of bases?

Reply written 9 months ago by analytical0

Each FASTQ read has a length (for example, 100 bp). Multiply that by the number of FASTQ entries and that's how many bases of sequence you have. Only after you align the data can you talk about sequence coverage across the genome.
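To make the "read length times number of entries" idea concrete, here is a minimal sketch that tallies reads, total bases, and N bases from a FASTQ stream. It assumes the common four-lines-per-record layout (header, sequence, `+`, qualities); the function name `fastq_base_counts` and the two example reads are made up for illustration and are not from the poster's data.

```python
import io


def fastq_base_counts(handle):
    """Count reads, total bases, and N bases in a FASTQ stream.

    Assumption: every record spans exactly 4 lines
    (header, sequence, '+', qualities), with no wrapped sequences.
    """
    reads = total = n_bases = 0
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the sequence is the second line of each record
            seq = line.strip()
            reads += 1
            total += len(seq)
            n_bases += seq.upper().count("N")
    return reads, total, n_bases


# Two hypothetical 5 bp reads, used only to exercise the function.
example = io.StringIO(
    "@read1\nACGTN\n+\nIIIII\n"
    "@read2\nNNGTA\n+\nIIIII\n"
)
reads, total, n_bases = fastq_base_counts(example)
print(reads, total, n_bases)  # prints: 2 10 3

# Fraction of N bases using the two numbers quoted in the question.
n_fraction = 884912 / 45191688940
print(f"{n_fraction:.4%}")
```

For a real gzipped file, the same function could be fed a handle from `gzip.open(path, "rt")`, where `path` is whatever your file is called. With the numbers quoted in the question, the N fraction works out to roughly 0.002% of all bases.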

Seriously, though: there are many resources explaining the sequencing and alignment process. I recommend that you seek out and read some of them so that you understand this before proceeding.

Reply written 9 months ago by Chris Miller20k