Question: Join mapped, overlapping, paired-end reads
3.7 years ago by
Mr. Dave50
United States
Mr. Dave50 wrote:

I'd like to combine my paired-end reads that have already been mapped by a PE reference aligner. Is there an existing tool for stitching overlapping, mapped paired-end reads (presumably from SAM to SAM)?

I've taken a look at COPE, PEAR, and FLASH, but it seems that none of these will merge a SAM/BAM. I'm looking at ABySS right now, but I'm not confident that any of these are built for merging non-FASTQ input.

It seems like my only options are either to (1) stitch my FASTQ pairs prior to mapping or (2) parse the SAM fields to do the stitching myself. I'd like to work with a validated alignment pipeline, so I'd rather not switch the pipeline from paired-end to single, stitched reads.

modified 3.7 years ago by mark.rose30 • written 3.7 years ago by Mr. Dave50

Most read-merging programs expect FASTQ files as input, since people generally merge reads before aligning. You can always convert your BAM back to FASTQ and then do the read merging. Do you know for sure that these reads overlap?

written 3.7 years ago by genomax78k
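
The BAM-to-FASTQ conversion mentioned above could look roughly like the following. This is a minimal Python sketch, assuming the alignments are available as SAM text lines (for example, the output of `samtools view`); the function name `sam_line_to_fastq` is hypothetical. The one detail that is easy to miss is that SAM stores reverse-strand reads reverse-complemented, so they have to be flipped back before merging.

```python
# Complement table used to restore reads that were mapped to the reverse strand.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def sam_line_to_fastq(line):
    """Turn one SAM alignment line into a 4-line FASTQ record (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    qname, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]

    # Flag bit 0x10: SEQ is stored reverse-complemented, so undo that
    # and reverse the quality string to match.
    if flag & 0x10:
        seq = seq.translate(_COMPLEMENT)[::-1]
        qual = qual[::-1]

    # Flag bits 0x40 / 0x80 mark the first / last read of a pair.
    if flag & 0x40:
        qname += "/1"
    elif flag & 0x80:
        qname += "/2"

    return f"@{qname}\n{seq}\n+\n{qual}\n"
```

The sketch does not skip secondary or supplementary alignments and does not recover hard-clipped bases, so those records would need to be filtered out first. Recent samtools releases also ship a `fastq` subcommand intended for this conversion, which is likely the safer choice in a real pipeline.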

All but the shortest reads overlap. I think extracting the reads of interest from the BAM, converting to FASTQ, and merging will be the most straightforward. I was hoping to rely on the SAM's alignment to merge the reads, but in all reality the self-aligned merge is going to be fine for the regions I've targeted.

written 3.7 years ago by Mr. Dave50
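
For reference, merging a pair by relying on the SAM alignment itself could be sketched as below. This is only an illustration under strong assumptions: both mates map to the same reference, their CIGAR strings are pure matches (for example `100M`, with no indels or clipping), and qualities are Phred+33 characters. The function name `merge_mapped_pair` is hypothetical, and the "keep the higher-quality base" rule is one simple choice, not the method of any particular tool.

```python
def merge_mapped_pair(pos1, seq1, qual1, pos2, seq2, qual2):
    """Merge two overlapping mates using their 1-based leftmost POS values.

    Assumes pure-match CIGARs (no indels, no clipping), so reference
    offset equals read offset. SEQ in SAM is already in reference
    orientation for both mates, so no reverse-complementing is needed here.
    """
    # Make read 1 the leftmost read.
    if pos2 < pos1:
        pos1, seq1, qual1, pos2, seq2, qual2 = pos2, seq2, qual2, pos1, seq1, qual1

    offset = pos2 - pos1  # where read 2 starts relative to read 1
    if offset >= len(seq1):
        raise ValueError("mates do not overlap")

    total = max(len(seq1), offset + len(seq2))
    merged_seq, merged_qual = [], []
    for i in range(total):
        j = i - offset  # index into read 2
        in1 = i < len(seq1)
        in2 = 0 <= j < len(seq2)
        if in1 and in2:
            # Both mates cover this position: keep the higher-quality base
            # (ties go to the leftmost read).
            if qual2[j] > qual1[i]:
                base, q = seq2[j], qual2[j]
            else:
                base, q = seq1[i], qual1[i]
        elif in1:
            base, q = seq1[i], qual1[i]
        else:
            base, q = seq2[j], qual2[j]
        merged_seq.append(base)
        merged_qual.append(q)

    return pos1, "".join(merged_seq), "".join(merged_qual)
```

Handling indels or soft clipping would require walking the CIGAR operations of each mate, which is where a self-aligned merge from FASTQ becomes the simpler route.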

Why would you like to do that? Maybe there are other ways to reach your goal.

written 3.7 years ago by shaun80
3.7 years ago by
United States
mark.rose30 wrote:

Are you looking to merge only reads from a pair or are you looking to derive a consensus from many aligned read pairs?

If the latter (I'm not sure what you would want the former for), there is always the old standby:

samtools mpileup -uf reference.fa alignment.bam | bcftools view -cg - | vcfutils.pl vcf2fq
written 3.7 years ago by mark.rose30

Just from the pair, unfortunately, but thank you for the mpileup suggestion.

I'd like to do a follow-up alignment for only a subset of reads based on their mapping positions. The initial PE alignment is fairly conventional but follows a validated workflow that I won't be changing. My follow-up alignment works best with SE reads. I'm currently aligning the pairs as SE reads, but I think my results will be much better if I join them.

If I'm not able to merge pairs within a BAM, I guess I'll extract the subset of reads in the SAM by position, convert them to FASTA (or extract by QNAME from FASTQ), join them by self/reference alignment, and then start the follow-up alignment. My hope was that stitching pairs within SAM and then exporting to FASTA would have been simpler.

written 3.7 years ago by Mr. Dave50
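
The "extract the subset by position, then pull the pairs by QNAME" step described above could be sketched as follows. This is a minimal Python illustration that assumes SAM text lines as input; the function name `qnames_in_region` and its parameters are hypothetical. It tests only the leftmost mapped position (POS) of each record, not the full aligned span.

```python
def qnames_in_region(sam_lines, chrom, start, end):
    """Collect QNAMEs of records whose POS lies in [start, end] (1-based, inclusive)."""
    names = set()
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        rname, pos = fields[2], int(fields[3])
        if rname == chrom and start <= pos <= end:
            names.add(fields[0])
    return names
```

The returned set can then be used to pull both mates of each selected pair from the original FASTQ files, even when one mate starts outside the window. On a coordinate-sorted, indexed BAM, `samtools view` with a region argument performs a comparable positional selection.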

If your reads are non-overlapping, BBMap has an option to use a reference to merge them. You can find that thread here. You would have to start with FASTQ files, though.

written 3.7 years ago by genomax78k