NGS data preprocessing from NextSeq
6.5 years ago
NB ▴ 960

Hello,

I have been working on targeted sequencing data (~100 genes) generated on a HiSeq sequencer (96 samples, each on 2 lanes). After converting BCL to FASTQ, the analysis continued with adaptor trimming and alignment. The coverage was always approx. 99% at 30x.

We are now moving to NextSeq (48 samples, each on 4 lanes, 150bp) and I understand there is quite a bit of difference between the two sequencers, one of them being a lot of background noise can be generated.

My question is: are there any relevant preprocessing steps to include, or points to keep in mind, before adaptor trimming and alignment for data generated on NextSeq?

Thank you

data preprocessing nextSeq • 2.1k views

I have never heard of platform-specific data processing (at least within the Illumina empire). Imho, it is also not advisable, as you may introduce unwanted biases.

quite a bit of difference between the two sequencers, one of them being a lot of background noise can be generated

Can you reference that?


There are many papers, but this is just one of them that I have on hand right away: https://www.nature.com/articles/srep43169?WT.feed_name=subjects_sequencing

6.5 years ago

The only NextSeq-specific issue is the presence of optical duplicates at the edge of tiles. This is similar to the optical duplicate issue on HiSeq 3000/4000/X (presumably NovaSeq too), and you can use clumpify from BBMap in all cases. The clumpify settings our pipeline uses are:

dupesubs=0 qin=33 markduplicates=t optical=t -Xmx30G spany=t adjacent=t dupedist=40

For 3000/4000 runs it's similar:

dupesubs=0 qin=33 markduplicates=t optical=t -Xmx30G dupedist=2500
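
To make the two sets of settings above easier to compare, here is a minimal sketch of a helper that assembles a clumpify.sh argument list per platform. The function name, the platform labels, and the file names are hypothetical, and the `in1=`/`in2=`/`out1=`/`out2=` arguments are an assumption about how paired-end files would be passed; only the remaining settings come from the post.

```python
def clumpify_args(platform, in1, in2, out1, out2):
    """Build a clumpify.sh argument list for one paired-end sample.

    The platform labels ("nextseq", "hiseq4000") are hypothetical names
    used only for this illustration.
    """
    # Settings shared by both platforms, copied from the post.
    args = [
        "clumpify.sh",
        f"in1={in1}", f"in2={in2}",      # assumed paired-end input arguments
        f"out1={out1}", f"out2={out2}",  # assumed paired-end output arguments
        "dupesubs=0", "qin=33", "markduplicates=t", "optical=t", "-Xmx30G",
    ]
    if platform == "nextseq":
        # NextSeq: duplicates at tile edges, small distance threshold.
        args += ["spany=t", "adjacent=t", "dupedist=40"]
    elif platform == "hiseq4000":
        # HiSeq 3000/4000: larger distance threshold, no tile-edge options.
        args += ["dupedist=2500"]
    else:
        raise ValueError(f"no settings known for platform: {platform}")
    return args


# Hypothetical file names; the list could be passed to subprocess.run().
print(" ".join(clumpify_args(
    "nextseq",
    "sample_R1.fastq.gz", "sample_R2.fastq.gz",
    "sample_R1.clumped.fastq.gz", "sample_R2.clumped.fastq.gz",
)))
```

Building the command as a list rather than a single string keeps each setting separate, which avoids shell-quoting problems if the list is handed to `subprocess.run`.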

Thanks Devon, so this is implemented after marking duplicates?


This is implemented after demultiplexing.


BTW, NextSeq has an issue with extra Gs. So keep that in mind if you're doing variant calling (not something we typically do, so I don't have specific recommendations there). BSseq has similar issues on the NextSeq, so we don't use that platform for such samples.
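
As an illustration of what handling the extra Gs could look like, here is a minimal sketch that removes a trailing run of G bases from a read and trims its quality string to match. The function name and the `min_run=10` threshold are assumptions made for this example, not a recommendation from the thread; dedicated trimmers such as fastp and cutadapt offer their own options for this and are likely the better choice in a real pipeline.

```python
import re


def trim_trailing_polyg(seq, qual, min_run=10):
    """Remove a 3' run of G bases if it is at least min_run long.

    min_run=10 is an illustrative threshold, not a validated setting.
    Returns the (possibly shortened) sequence and quality strings.
    """
    match = re.search(r"G+$", seq)
    if match and len(match.group()) >= min_run:
        cut = match.start()
        # Trim the quality string at the same position as the sequence.
        return seq[:cut], qual[:cut]
    return seq, qual


# A read ending in 12 Gs is shortened; a read ending in 3 Gs is kept whole.
print(trim_trailing_polyg("ACGT" + "G" * 12, "I" * 16))
print(trim_trailing_polyg("ACGTGGG", "IIIIIII"))
```

Short G runs are left alone because they occur naturally in real sequence; only long runs at the 3' end are treated as artefacts here.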

