Question: Demultiplexing before merging paired-end reads?
0
gravatar for asidhu
5 weeks ago by
asidhu10
asidhu10 wrote:

I'm trying to create a pipeline that converts FASTQ --> VCF and it would include the option of single-end and paired-end reads. I am a little bit confused about how to approach this for paired-end reads as they are more complicated. For single-end pairs I've done the following steps:

  • Demultiplex with sabre se
  • Trim adapters with Cutadapt
  • Map reads with BWA
  • Convert .SAM to .BAM using samtools view
  • Sorted alignments using samtools sort
  • Created an index using samtools index
  • Used samtools mpileup to create .BCF
  • Converted .BCF to .VCF using bcftools call

For single-end pairs this has worked well but I am unsure of the process for paired-end reads. I am new to bioinformatics so I am still in the learning process.

For paired-end reads I was told I need to merge the read pairs together before cutting the adapters on each end. Would the process by as follows:

  • Demultiplex with sabre pe
  • Join the forward and reverse pairs using pandaseq
  • Trim the adapters from each side using cutadapt and since with pandaseq a single merged file is created, treat as single-end?
cutadapt sabre pandaseq • 184 views
ADD COMMENTlink modified 4 weeks ago by Gabriel R.2.5k • written 5 weeks ago by asidhu10

It seems pandaseq doesn't know anything about adapters, so you should trim adapters before joining reads - the adapters will be at the end of the reads, and will prevent paired read merging. Also, use Trimmomatic or BBDuk, which trim the reads as pairs.

ADD REPLYlink written 5 weeks ago by h.mon22k

For paired-end reads I was told I need to merge the read pairs together before cutting the adapters on each end.

That will only work if you are sure that all reads are supposed to overlap in the middle (i.e. insert sizes are shorter than sequencing cycles).

ADD REPLYlink written 4 weeks ago by genomax59k
0
gravatar for WouterDeCoster
4 weeks ago by
Belgium
WouterDeCoster35k wrote:

While you didn't specify the end goal of your pipeline, let me give you some feedback.

Don't reinvent the wheel when excellent pipelines already exist

Map reads with BWA
Convert .SAM to .BAM using samtools view
Sorted alignments using samtools sort

These don't have to be separate steps:

bwa mem ref.fa read_R1.fastq.gz read_R2.fastq.gz | samtools sort -o my_sorted_alignment.bam

Saves you time and storage.

Join the forward and reverse pairs using pandaseq

I think you should just leave them paired-end separate files and align them as above - without merging.

ADD COMMENTlink written 4 weeks ago by WouterDeCoster35k
0
gravatar for Gabriel R.
4 weeks ago by
Gabriel R.2.5k
Center for Geogenetik Københavns Universitet
Gabriel R.2.5k wrote:

This post is between a comment and a response. I work a lot with ancient DNA where fragments are very short (e.g. 40bp). This entails that even for a paired-end run you constantly have to deal with "single-end" reads (merged paired-end) and genuine paired-end data. The logic was getting very convoluted and messy. At the Max Planck Institute in Leipzig, we developed the following methodology (starting from fastq but you can easily start from bcl) :

fastq2bam in.fwd.fastq.gz in.rev.fastq.gz  | leeHom  -o /dev/stdout /dev/stdin   | deML -i index -o /dev/stdout  /dev/stdin  |  bwa bam2bam -g [genome] /dev/stdin | samtools sort /dev/stdin outputprefix

This is one line and you have a bam file without any data redundancy.

fastq2bam is found here: https://github.com/grenaud/BCL2BAM2FASTQ/

leeHom is found here: http://grenaud.github.io/leeHom/

deMLis found here: http://grenaud.github.io/deML/

a hacked bwa that eats bam is found here: https://github.com/mpieva/network-aware-bwa

a hacked samtools that sorts by going over the file once is found here: https://github.com/mpieva/samtools-patched

ADD COMMENTlink written 4 weeks ago by Gabriel R.2.5k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1144 users visited in the last hour