Question: Number of reads in Fastq and Bowtie
0
gravatar for yl2019
4 weeks ago by
yl201910
yl201910 wrote:

Hi guys, When I check the number of reads in my fastq (paired end library), I get 4071779 paired reads.

I do this using "wc -l fwd_paired.fastq" and then dividing the number by 4. or just by checking the FASTQC report basic statistics.

When I try to align these reads using Bowtie, the first line of the summary indicates i only have 2761561 paired reads

2761561 reads; of these:
  2761561 (100.00%) were paired; of these:
    2682638 (97.14%) aligned concordantly 0 times
    6280 (0.23%) aligned concordantly exactly 1 time
    72643 (2.63%) aligned concordantly >1 times
    ----
    2682638 pairs aligned concordantly 0 times; of these:
      121 (0.00%) aligned discordantly 1 time
    ----
    2682517 pairs aligned 0 times concordantly or discordantly; of these:
      5365034 mates make up the pairs; of these:
        4990100 (93.01%) aligned 0 times
        17463 (0.33%) aligned exactly 1 time
        357471 (6.66%) aligned >1 times
9.65% overall alignment rate

Does anyone know why bowtie is reporting 2761561 when I have 4071779? I also get warning messages with the bowtie run that look like this:

Warning: skipping mate #2 of read 'MN00234:32:000H2LGW3:1:23103:22530:9439 2:N:0:GCCAAT' because length (1) <= # seed mismatches (0) Warning: skipping mate #2 of read 'MN00234:32:000H2LGW3:1:23103:22530:9439 2:N:0:GCCAAT' because it was < 2 characters long

but there are only 100 of these.

any ideas?

Thanks a lot.

sequencing bowtie fastq genome • 124 views
ADD COMMENTlink modified 4 weeks ago by h.mon24k • written 4 weeks ago by yl201910

Did you trim the fastqs? If yes, how did you trim them?

ADD REPLYlink written 4 weeks ago by h.mon24k

Yes, I am using trimmomatic, code below:

java -jar $EBROOTTRIMMOMATIC/trimmomatic-0.36.jar PE -phred33 $dir/$R1 $dir/$R2 fwd_paired.fastq fwd_unpaired.fastq rev_paired.fastq rev_unpaired.fastq LEADING:3 TRAILING:3 SLIDINGWINDOW:4:25
ADD REPLYlink written 4 weeks ago by yl201910
3
gravatar for h.mon
4 weeks ago by
h.mon24k
Brazil
h.mon24k wrote:

You are not using MINLEN with Trimmomatic, so you end up with really short (or even zero length) sequences, and bowtie is discarding the pairs where at least of of the sequences is shorter than the seed length.

In addition, unless you have a specific reason to hard-trim 3 bases from beginning and end of sequences, I would advise you to avoid doing so.

ADD COMMENTlink written 4 weeks ago by h.mon24k

That makes sense, thanks a lot for the explanation! I will look into adding a MINLEN. Is there a recommended value for de novo genome assemblies? I'm thinking of starting off with 3 since the seed length default for bowtie appears to be 2

ADD REPLYlink written 4 weeks ago by yl201910
1

A MINLEN of 3 is useless, it would map to hundreds or thousands of locations. The minimum I use depends on the project, quality of reads, sequencing depth, and other factors, but as a rule of thumb, I will never go bellow 35bp - unless, of course, if working with miRNA. See this discussion:

How to choose Trimmomatic's parameter 'MINLEN '?

ADD REPLYlink written 4 weeks ago by h.mon24k

Thanks so much for the information & the link!

ADD REPLYlink written 4 weeks ago by yl201910
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1113 users visited in the last hour