Question

NGS data preprocessing from NextSeq

0

6.5 years ago

NB ▴ 960

Hello,

I have been working on targeted sequencing data (~100 genes) generated on a HiSeq sequencer (96 samples, each on 2 lanes). After converting BCL to FASTQ, the analysis continued with adaptor trimming and alignment. The coverage was always approx. 99% at 30x.

We are now moving to NextSeq (48 samples, each on 4 lanes, 150 bp), and I understand there is quite a bit of difference between the two sequencers, one of them being that a lot of background noise can be generated.

My question is: are there any relevant preprocessing steps to include, or points to keep in mind, before adaptor trimming and alignment for data generated on a NextSeq?

Thank you

data preprocessing nextSeq • 2.1k views

ADD COMMENT • link updated 6.5 years ago by Devon Ryan 104k • written 6.5 years ago by NB ▴ 960

0

I have never heard of platform-specific data processing (at least within the Illumina empire). IMHO, it is also not advisable, as you may introduce unwanted biases.

quite a bit of difference between the two sequencers, one of them being a lot of background noise can be generated

Can you reference that?

ADD REPLY • link 6.5 years ago by ATpoint 82k

1

There are many papers, but this is one I have on hand right now: https://www.nature.com/articles/srep43169?WT.feed_name=subjects_sequencing

ADD REPLY • link 6.5 years ago by NB ▴ 960

score 7 · Accepted Answer · 2017-10-10

7

6.5 years ago

Devon Ryan 104k

The only NextSeq-specific issue is the presence of optical duplicates at the edges of tiles. This is similar to the optical duplicate issue on the HiSeq 3000/4000/X (and presumably the NovaSeq too), and you can use clumpify from BBMap in all cases. The clumpify settings that our pipeline uses are:

dupesubs=0 qin=33 markduplicates=t optical=t -Xmx30G spany=t adjacent=t dupedist=40

For HiSeq 3000/4000 runs it's similar:

dupesubs=0 qin=33 markduplicates=t optical=t -Xmx30G dupedist=2500
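As a rough illustration of how the two settings lists above could be wired into a pipeline, here is a minimal Python sketch that assembles a clumpify.sh command line per platform. The settings are copied from the lists above; the platform labels, the function name `clumpify_command` and the FASTQ file names are hypothetical, and the sketch assumes paired-end input passed through clumpify's `in=`/`in2=`/`out=`/`out2=` parameters. It only builds and prints the command; it does not run it.

```python
import shlex

# Settings copied from the answer above, keyed by an illustrative platform label.
CLUMPIFY_SETTINGS = {
    "nextseq": [
        "dupesubs=0", "qin=33", "markduplicates=t", "optical=t",
        "-Xmx30G", "spany=t", "adjacent=t", "dupedist=40",
    ],
    "hiseq3000_4000": [
        "dupesubs=0", "qin=33", "markduplicates=t", "optical=t",
        "-Xmx30G", "dupedist=2500",
    ],
}


def clumpify_command(platform, in1, in2, out1, out2):
    """Return the clumpify.sh argument list for one paired-end sample.

    The file paths are supplied by the caller; nothing is executed here.
    """
    settings = CLUMPIFY_SETTINGS[platform]  # KeyError for an unknown platform
    return [
        "clumpify.sh",
        f"in={in1}", f"in2={in2}",
        f"out={out1}", f"out2={out2}",
        *settings,
    ]


# Hypothetical file names, for illustration only.
cmd = clumpify_command(
    "nextseq",
    "sample_R1.fastq.gz", "sample_R2.fastq.gz",
    "sample_R1.marked.fastq.gz", "sample_R2.marked.fastq.gz",
)
print(shlex.join(cmd))
```

The resulting list could be handed to `subprocess.run(cmd, check=True)` on a machine where BBMap is installed and on the PATH.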

ADD COMMENT • link 6.5 years ago by Devon Ryan 104k

0

Thanks Devon. So this is implemented after marking duplicates?

ADD REPLY • link 6.5 years ago by NB ▴ 960

0

This is implemented after demultiplexing.

ADD REPLY • link 6.5 years ago by Devon Ryan 104k

0

BTW, the NextSeq has an issue with extra Gs, so keep that in mind if you're doing variant calling (not something we typically do, so I don't have specific recommendations there). BS-seq has similar issues on the NextSeq, so we don't use that platform for such samples.
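To make the extra-G issue concrete: on the NextSeq's two-colour chemistry a cluster with no signal is called as G, so reads can end in artificial runs of G. The Python sketch below shows one simple way such a tail could be clipped from a single read. It is not a recommendation from this thread and not what any particular trimmer does internally; the function name `trim_poly_g_tail` and the `min_run` threshold of 10 are assumptions chosen for illustration. Dedicated trimmers such as fastp (`--trim_poly_g`) and cutadapt (`--nextseq-trim`) offer options for this and are likely a better fit for real data.

```python
def trim_poly_g_tail(seq, qual, min_run=10):
    """Clip a trailing run of G calls when it is at least min_run bases long.

    seq and qual are the sequence and quality strings of one FASTQ record.
    min_run is an illustrative threshold, not a validated default.
    """
    stripped = seq.rstrip("G")
    run_length = len(seq) - len(stripped)
    if run_length >= min_run:
        # Keep the quality string the same length as the clipped sequence.
        return stripped, qual[:len(stripped)]
    return seq, qual


# Made-up read: 10 informative bases followed by 12 trailing Gs.
read_seq = "ACGTACGTAC" + "G" * 12
read_qual = "I" * len(read_seq)
trimmed_seq, trimmed_qual = trim_poly_g_tail(read_seq, read_qual)
print(trimmed_seq, trimmed_qual)
```

A short genuine G run at the end of a read is left alone because it falls below `min_run`; raising or lowering that threshold trades lost real sequence against retained artefacts.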

ADD REPLY • link 6.5 years ago by Devon Ryan 104k