Question

Can DiscoSNP++ deal with hybrid input of both paired-end and single-end reads?

5.9 years ago

emeline.a.favreau ▴ 30

Hello,

I would like to use DiscoSNP++ to obtain variants from the paired-end reads of a sample. The FASTQ files for this sample have been adapter-trimmed, and some read pairs have been merged because their sequences overlap. The merging tool (AdapterRemoval) produced three files: the merged reads (equivalent to single-end reads), and the remaining R1 and R2 reads that did not overlap.

My question is: can I run a single analysis with DiscoSNP++, providing all three files [option 1]? Or should I run two analyses: one with the paired-end reads (R1, R2), and the other with the single-end reads (merged reads) [option 2]?

Option 1:

fof.txt:
- fof_merged.txt
- fof_R1R2.txt
fof_merged.txt:
- collapsed.fq.gz
fof_R1R2.txt:
- pair1.truncated.fq.gz
- pair2.truncated.fq.gz

Option 2:

First run

fof.txt:
- fof_R1R2.txt
fof_R1R2.txt:
- pair1.truncated.fq.gz
- pair2.truncated.fq.gz

Second run

fof.txt:
- collapsed.fq.gz

Thank you,

Emeline

snp discosnp++ variant calling discosnp • 1.2k views

updated 5.9 years ago by pierre.peterlongo ▴ 890 • written 5.9 years ago by emeline.a.favreau ▴ 30

score 1 · Answer 1 · 2018-06-14

Hi Emeline,

The answer depends on what you want to use for calling variants and what you want to use for allele frequency computation.

If you want to consider all your reads as a single set (no differentiation at all):

fof_root.txt:
- fof.txt
fof.txt:
- collapsed.fq.gz
- pair1.truncated.fq.gz
- pair2.truncated.fq.gz

In this situation you'll call variants from all reads considered as a single set, and coverage will be computed considering all reads as belonging to a single set.
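For illustration, the single-set layout can be written to disk with a few lines of Python. This is only a sketch: the helper names `write_fof` and `write_single_set_layout` are hypothetical, and it assumes each file of files is plain text with one path per line (the leading dashes in the listing are just bullets) and that the paths resolve from the directory where DiscoSNP++ is launched.

```python
from pathlib import Path


def write_fof(path, entries):
    """Write a file of files: one entry per line (assumed format)."""
    Path(path).write_text("".join(f"{entry}\n" for entry in entries))


def write_single_set_layout(out_dir):
    """Create fof_root.txt and fof.txt so that all reads form a single set."""
    out_dir = Path(out_dir)
    # The top-level file lists one line, hence one read set.
    write_fof(out_dir / "fof_root.txt", ["fof.txt"])
    # The nested file pools merged and unmerged reads into that set.
    write_fof(
        out_dir / "fof.txt",
        [
            "collapsed.fq.gz",
            "pair1.truncated.fq.gz",
            "pair2.truncated.fq.gz",
        ],
    )


if __name__ == "__main__":
    # Writes both files into the current directory.
    write_single_set_layout(".")
```

fof_root.txt is then the file to hand to DiscoSNP++ as its read-set input.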

With your option 1, you'll call variants from all reads considered as a single set, but coverage will be computed separately for the collapsed.fq.gz reads and the non-collapsed ones.

With your option 2, variants are called separately for each of the two read sets.
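To make the difference between the layouts concrete, here is a small Python sketch that counts the read sets each one defines. It does not call DiscoSNP++; `resolve_read_sets` is a hypothetical helper, and it assumes that each line of the top-level file is one read set and that a nested file of files pools its reads into that set.

```python
def resolve_read_sets(top_level, nested):
    """Return one list of read files per line of the top-level file of files.

    top_level: entries of the file given to DiscoSNP++.
    nested: maps a nested file-of-files name to the read files it lists.
    Assumption: one top-level line equals one read set.
    """
    return [nested.get(entry, [entry]) for entry in top_level]


merged = ["collapsed.fq.gz"]
unmerged = ["pair1.truncated.fq.gz", "pair2.truncated.fq.gz"]

# Single set: fof_root.txt -> fof.txt -> all three files.
single_set = resolve_read_sets(["fof.txt"], {"fof.txt": merged + unmerged})

# Option 1: one run, two top-level lines.
option_1 = resolve_read_sets(
    ["fof_merged.txt", "fof_R1R2.txt"],
    {"fof_merged.txt": merged, "fof_R1R2.txt": unmerged},
)

# Option 2: two independent runs, each with a single top-level line.
option_2 = [
    resolve_read_sets(["fof_R1R2.txt"], {"fof_R1R2.txt": unmerged}),
    resolve_read_sets(["collapsed.fq.gz"], {}),
]

print("single set:", len(single_set), "read set in one run")
print("option 1:", len(option_1), "read sets in one run")
print("option 2:", [len(run) for run in option_2], "read sets across two runs")
```

Under these assumptions, the single-set layout and option 1 both call variants from all reads in one run and differ only in how many sets coverage is reported for, while option 2 splits the calling itself.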

Best, Pierre