Question: Demultiplex of V4 reads
11 months ago by
agata88770 wrote:

Hi all!

I have reads in R1 and R2 files, plus an I1 file (also FASTQ) containing the 12 nt indexes. In order to divide the reads into separate files, I first performed a merging step.

I used a program with the following command: -f <R1> -r <R2> -o demultiplexed/ -m fastq-join -b <I1> -p 15

It ended up with almost 94% of the reads joined. Next, I ran a script. When I looked at the histogram, the number of reads per sequence length varied widely:

Length  Count
249.0   14545440
489.0   1467

The amplification targeted the V4 region, whose length is around 291 nt, and the sequencing was 2x250 bp. So my question is: how come I have 1467 reads that are almost 500 bp long? Is this contamination?

Should I discard all reads longer than 300 bp for further analysis? What do you think?
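For reference, the length filter asked about here could be sketched roughly as follows. This is only an illustration in plain Python, not the method of any particular pipeline: the function names `read_fastq` and `filter_by_max_length` are hypothetical, the 300 bp cutoff is the one proposed in the question, and the input is assumed to be well-formed four-line FASTQ records.

```python
from typing import Iterable, Iterator, Tuple

# (header, sequence, quality) for one FASTQ record
FastqRecord = Tuple[str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from well-formed 4-line FASTQ input (assumption)."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # '+' separator line, ignored
        qual = next(it).rstrip("\n")
        yield header, seq, qual


def filter_by_max_length(records: Iterable[FastqRecord],
                         max_len: int = 300) -> Iterator[FastqRecord]:
    """Keep only records whose sequence is at most max_len bases long."""
    for header, seq, qual in records:
        if len(seq) <= max_len:
            yield header, seq, qual


if __name__ == "__main__":
    # Tiny in-memory example; a real run would pass an open file handle.
    demo = ["@short\n", "ACGT\n", "+\n", "IIII\n",
            "@long\n", "ACGTACGT\n", "+\n", "IIIIIIII\n"]
    for header, seq, qual in filter_by_max_length(read_fastq(demo), max_len=5):
        print(header, len(seq))
```

With a real file, `read_fastq` would be given the open file object for the joined reads, and the kept records written back out in the same four-line layout.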

Thanks in advance! Best, Agata

demultiplex miseq 16s
11 months ago by
h.mon25k wrote:

You have about 0.01% of reads at 489 bp, compared with the 249 bp reads. If you consider the interval from, say, 249-333 bp, this percentage will probably be even smaller. This is pretty minimal, and some contamination or artifacts are usual for next-generation sequencing.
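The percentage can be checked directly from the two histogram rows posted in the question. The sketch below uses only those two counts (an assumption: any other length bins in the full histogram are not shown and so are not included), and the helper name `percent_at_length` is made up for illustration.

```python
# Read counts per merged length, taken from the histogram in the question.
counts = {249: 14_545_440, 489: 1_467}


def percent_at_length(counts: dict, length: int) -> float:
    """Percentage of all counted reads that have the given length."""
    total = sum(counts.values())
    return 100.0 * counts[length] / total


if __name__ == "__main__":
    print(f"{percent_at_length(counts, 489):.3f}%")  # about 0.010%
```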

You could provide more stringent parameters for the joining step. You used the tool defaults, which is to use fastq-join, and the fastq-join default minimum overlap is 6 bases, if I am not mistaken.
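To see why the minimum overlap matters here, a back-of-the-envelope calculation helps. It assumes both reads are the full 250 bp and are joined by a simple end-to-end overlap, so merged length = 2 x read length - overlap; the 6-base minimum is the value quoted above and has not been verified. The function names are hypothetical.

```python
READ_LEN = 250  # 2x250 bp run, as stated in the question


def implied_overlap(merged_len: int, read_len: int = READ_LEN) -> int:
    """Overlap implied by a merged length, assuming two full-length reads."""
    return 2 * read_len - merged_len


def max_merged_length(min_overlap: int, read_len: int = READ_LEN) -> int:
    """Longest merged read that a given minimum overlap would still allow."""
    return 2 * read_len - min_overlap


if __name__ == "__main__":
    print(implied_overlap(489))   # 11 bases of overlap
    print(max_merged_length(6))   # 494 bp
```

Under these assumptions, the 489 bp reads would correspond to joins supported by only 11 overlapping bases, which a 6-base minimum permits. Raising the minimum overlap so that the longest allowed merged read is near the expected amplicon length would likely reject such joins, though the right threshold depends on the data.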

