Question: Can DiscoSNP++ deal with hybrid input of both paired-end and single-end reads?
2.6 years ago, emeline.a.favreau (London) wrote:

Hello,

I would like to use DiscoSNP++ to obtain variants from the paired-end reads of a sample. The fastq files of this sample have been adapter-trimmed, and where the two reads of a pair overlapped they have been merged. The merging tool (AdapterRemoval) produced three files: the merged reads (equivalent to single-end reads), and the remaining R1 and R2 reads that did not overlap.

My question is: can I run a single analysis with DiscoSNP++, providing all three files [option 1]? Or should I run two analyses: one with the paired-end reads (R1, R2), and the other with the single-end reads (merged reads) [option 2]?

Option 1:

  • fof.txt:
    • fof_merged.txt
    • fof_R1R2.txt
  • fof_merged.txt:
    • collapsed.fq.gz
  • fof_R1R2.txt:
    • pair1.truncated.fq.gz
    • pair2.truncated.fq.gz

Option 2:

First run

  • fof.txt:
    • fof_R1R2.txt
  • fof_R1R2.txt:
    • pair1.truncated.fq.gz
    • pair2.truncated.fq.gz

Second run

  • fof.txt:
    • collapsed.fq.gz

Thank you,

Emeline

2.6 years ago, pierre.peterlongo (France) wrote:

Hi Emeline,

The answer depends on what you want to use for calling variants and what you want to use for allele frequency computation.

If you want to consider all your reads as a single set (no differentiation at all):

  • fof_root.txt:
    • fof.txt
  • fof.txt:
    • collapsed.fq.gz
    • pair1.truncated.fq.gz
    • pair2.truncated.fq.gz

In this situation you'll call variants from all reads considered as a single set, and coverage will be computed considering all reads as belonging to a single set.
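
As a minimal sketch, the two files of files for this single-set layout could be generated with a few lines of Python. The helper names `write_fof` and `write_single_set_layout` are hypothetical; only the file names come from the layout above, and the script merely writes text files without invoking DiscoSNP++.

```python
from pathlib import Path


def write_fof(path, entries):
    """Write one entry per line to a file of files (hypothetical helper)."""
    path = Path(path)
    path.write_text("\n".join(entries) + "\n")
    return path


def write_single_set_layout(out_dir="."):
    """Create fof_root.txt -> fof.txt -> all three read files."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # fof.txt groups every read file into the same read set.
    write_fof(out_dir / "fof.txt", [
        "collapsed.fq.gz",
        "pair1.truncated.fq.gz",
        "pair2.truncated.fq.gz",
    ])
    # fof_root.txt contains a single line that points to fof.txt.
    return write_fof(out_dir / "fof_root.txt", ["fof.txt"])
```

Calling `write_single_set_layout("some_dir")` writes both files into that directory. The read file names are written exactly as listed above, so they would need adjusting if the reads live elsewhere.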

With your option 1, variants are still called from all reads considered as a single set, but coverage is computed separately for the collapsed.fq.gz reads and the non-collapsed ones.

With your option 2, variants are called separately for each of the two read sets, since each run only sees one of them.
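
To make the difference between the layouts concrete, here is a small Python sketch that lists the read sets each one defines. It assumes, as the explanation above implies, that each line of the top-level file of files is one read set for coverage, whether that line is a read file or a nested file of files. The dictionaries and the `read_sets` helper are illustrative only and are not part of DiscoSNP++.

```python
# Each layout maps a file-of-files name to the lines it contains.
SINGLE_SET = {
    "fof_root.txt": ["fof.txt"],
    "fof.txt": [
        "collapsed.fq.gz",
        "pair1.truncated.fq.gz",
        "pair2.truncated.fq.gz",
    ],
}
OPTION_1 = {
    "fof.txt": ["fof_merged.txt", "fof_R1R2.txt"],
    "fof_merged.txt": ["collapsed.fq.gz"],
    "fof_R1R2.txt": ["pair1.truncated.fq.gz", "pair2.truncated.fq.gz"],
}


def read_sets(layout, top):
    """Return one list of read files per line of the top-level fof.

    Assumption: a line naming another fof groups that fof's files into
    one read set; any other line is a read file forming its own set.
    """
    sets = []
    for line in layout[top]:
        if line in layout:
            sets.append(list(layout[line]))  # nested fof: one grouped set
        else:
            sets.append([line])  # plain read file: its own set
    return sets


if __name__ == "__main__":
    print(len(read_sets(SINGLE_SET, "fof_root.txt")), "read set(s)")
    print(len(read_sets(OPTION_1, "fof.txt")), "read set(s)")
```

Under that assumption, the single-set layout yields one read set holding all three files, while option 1 yields two: the collapsed reads on one side and the R1/R2 reads on the other.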

Best, Pierre
