Question: Demultiplex of V4 reads
11 months ago, agata88 (Poland) wrote:

Hi all!

I have reads in R1 and R2 files, plus an I1 file containing the 12 nt index sequences (also FASTQ). Before splitting the reads into separate files, I merged the read pairs.

I used the join_paired_ends.py script with the following command:

join_paired_ends.py -f <R1> -r <R2> -o demultiplexed/ -m fastq-join -b <I1> -p 15

This joined almost 94% of the reads. Next, I ran the split_libraries_fastq.py script. When I looked at the length histogram, the read counts were spread over a wide range of sequence lengths:

Length  Count
249.0   14545440
.
.
.
489.0   1467

The amplified region is V4, which is around 291 nt long, and the sequencing was 2x250 bp. So my question is: how come I have 1467 reads that are almost 500 nt long? Is this contamination?

Should I discard all reads longer than 300 bp before further analysis? What do you think?

Thanks in advance! Best, Agata

Tags: demultiplex, miseq, 16s
11 months ago, h.mon (Brazil) wrote:

Your 489 bp reads amount to only about 0.01% of the number of 249 bp reads. If you compare them against all reads in an interval of, say, 249-333 bp, this percentage will probably be even smaller. This is pretty minimal, and some contamination and artifacts are usual for next-generation sequencing.
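As a quick check of that figure, here is a minimal sketch of the arithmetic in Python. It uses only the two counts visible in the histogram from the question; the variable names are just for illustration, and the rows elided by the dots are not included.

```python
# Counts copied from the length histogram in the question.
count_249 = 14545440  # reads of length 249
count_489 = 1467      # reads of length 489

# Share of 489 bp reads relative to the 249 bp reads, as a percentage.
ratio_percent = count_489 / count_249 * 100
print(f"{ratio_percent:.4f}%")
```

Adding the counts for the other lengths to the denominator can only make this percentage smaller.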

You could provide more stringent parameters to join_paired_ends.py. You used the tool defaults, which means fastq-join, and the default minimum overlap of fastq-join is 6 bases, if I am not mistaken.
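To see why the minimum overlap matters here, the sketch below estimates the overlap implied by a given joined length. It assumes both reads are the full 250 bases (no trimming) and that joined length = R1 length + R2 length - overlap; the function names and the stricter value of 20 are hypothetical, chosen only for illustration. In QIIME 1 the minimum overlap should be settable through the `-j` / `--min_overlap` option, but please check `join_paired_ends.py -h` for your version.

```python
READ_LENGTH = 250  # 2x250 bp run, as stated in the question


def implied_overlap(joined_length, read_length=READ_LENGTH):
    """Overlap (in bases) implied by a joined read length.

    Assumes both reads are full length. If the fragment is shorter than
    one read, the reads overlap along the whole fragment.
    """
    return min(joined_length, 2 * read_length - joined_length)


def passes_min_overlap(joined_length, min_overlap, read_length=READ_LENGTH):
    """True if the implied overlap meets the given minimum overlap."""
    return implied_overlap(joined_length, read_length) >= min_overlap


for length in (249, 291, 489):
    print(
        length,
        implied_overlap(length),
        passes_min_overlap(length, 6),   # default recalled in the answer
        passes_min_overlap(length, 20),  # hypothetical stricter setting
    )
```

Under these assumptions a 489 bp joined read corresponds to an overlap of only 11 bases, so it passes a minimum of 6 but would be rejected by a stricter setting, while reads near the expected 291 bp overlap by about 209 bases and are unaffected.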
