Question: Demultiplex of V4 reads
agata88 (Poland) wrote, 23 months ago:

Hi all!

I have reads in R1 and R2 files, and an I1 file containing the 12 nt index sequences (also FASTQ). In order to split the reads into separate files, I first merged the read pairs.

I used the join_paired_ends.py script with the following command:

join_paired_ends.py -f <R1> -r <R2> -o demultiplexed/ -m fastq-join -b <I1> -p 15

Almost 94% of the reads were joined. Next, I ran the split_libraries_fastq.py script. When I looked at the length histogram, the read lengths varied widely:

Length  Count
249.0   14545440
.
.
.
489.0   1467

The amplification targeted the V4 region, whose length is around 291 nt, and the sequencing was 2x250 bp. So my question is: how come I have 1467 reads of almost 500 bp? Is this contamination?

Should I discard all reads longer than 300 bp from further analysis? What do you think?

Thanks in advance! Best, Agata
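
For reference, the 300 bp cutoff proposed in the question could be applied with a few lines of Python. This is only a minimal sketch, assuming the reads are available as FASTA records (split_libraries_fastq.py writes FASTA output); the function name and the toy records are hypothetical and stand in for real data.

```python
def filter_fasta_by_length(lines, max_len=300):
    """Yield (header, sequence) pairs whose sequence has at most max_len bases."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one if it is short enough.
            if header is not None:
                seq = "".join(chunks)
                if len(seq) <= max_len:
                    yield header, seq
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    # Emit the final record, which has no following header to trigger it.
    if header is not None:
        seq = "".join(chunks)
        if len(seq) <= max_len:
            yield header, seq


# Toy input (hypothetical): one read of the expected length, one overlong read.
example = [">read_1", "A" * 249, ">read_2", "A" * 489]
kept = list(filter_fasta_by_length(example))
print([h for h, _ in kept])
```

With a real file, the `lines` argument could simply be an open file handle. Whether 300 bp is the right cutoff is a judgment call that depends on the amplicon and primers used.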

Tags: demultiplex, miseq, 16s

h.mon (Brazil) wrote, 23 months ago:

Your 489 bp reads amount to about 0.01% of the count in the 249 bp bin alone. If you compare them against the whole interval from, say, 249-333 bp, this percentage will probably be even smaller. This is pretty minimal, and some contamination or artifacts are usual in next-generation sequencing.
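
The proportion can be checked from the two histogram rows quoted in the question. A quick sketch in Python, using only those two counts (the rest of the histogram is elided, so the share relative to all reads cannot be computed here):

```python
# Counts taken from the two histogram rows shown in the question.
count_249 = 14545440
count_489 = 1467

# Share of 489 bp reads relative to the 249 bp bin alone, in percent.
pct = 100 * count_489 / count_249
print(f"{pct:.4f}%")
```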

You could give join_paired_ends.py more stringent parameters. You used the default joining method, fastq-join, and its default minimum overlap is 6 bases, if I am not mistaken.
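
One way to see why the minimum overlap matters: with 2x250 bp reads, a joined read of length L that spans both mates implies an overlap of 250 + 250 - L bases. A small sketch, assuming untrimmed 250 bp mates (the helper name is hypothetical):

```python
def implied_overlap(merged_len, read_len=250):
    """Overlap implied by joining two untrimmed mates of read_len bases.

    Only meaningful for read_len <= merged_len <= 2 * read_len.
    """
    return 2 * read_len - merged_len


print(implied_overlap(489))  # 11
```

An 11-base overlap is only slightly above the 6-base minimum, so joins of that length could plausibly be chance matches rather than real amplicons. Raising the minimum overlap would reject them; in QIIME 1's join_paired_ends.py this should be the -j/--min_overlap option, but check the script's help output to confirm.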
