Question: How to merge paired-end reads from sam files?
3.7 years ago, aquaq wrote:

Hi,

I have paired-end sequencing data and have aligned the forward and reverse reads with bwa mem. Each read is 120 nucleotides long, and together a pair covers a 180-nucleotide region of the genome, so the mates overlap.

bwa mem -t 20 "$REF" "$file1" "$file2" > "$sam"

When I open the sam output file, the first lines begin like this:

M00135:404:HBJFESJSN:2:1101:2016:1297   53      ref   ...
M00135:404:HBJFESJSN:2:1101:2016:1297   133     ref   ...
M00135:404:HBJFESJSN:2:1101:2646:1297   53      ref   ...
M00135:404:HBJFESJSN:2:1101:2646:1297   133     ref   ...

For every pair, I have two lines, one per mate, aligned to the reference from the two directions (I know this is the normal output). Is it possible to combine the forward and reverse reads into one sequence, giving a single 180-nucleotide alignment for each pair?

Many thanks!

EDIT: Sorry for not being clear; I would like to merge the pairs after alignment is done.
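
To make the request concrete, here is a minimal Python sketch of what merging a pair after alignment could look like. It is not taken from any of the tools mentioned in this thread. The function name `merge_aligned_pair` is hypothetical, and the sketch assumes both mates are aligned without gaps (a CIGAR such as 120M, with no indels or clipping), that POS is the 1-based leftmost position from the SAM record, and that SEQ and QUAL are used exactly as written in the SAM file (already on the forward reference strand, Phred+33 qualities). In the overlap it simply keeps the base with the higher quality.

```python
def merge_aligned_pair(pos1, seq1, qual1, pos2, seq2, qual2):
    """Merge two overlapping mates into one reference-oriented sequence.

    Hypothetical helper. Assumes ungapped alignments (no indels, no
    clipping), 1-based leftmost POS, and SEQ/QUAL as written in SAM.
    Returns (leftmost position, merged sequence).
    """
    # Put the leftmost mate first so the loop only walks left to right.
    if pos2 < pos1:
        pos1, seq1, qual1, pos2, seq2, qual2 = pos2, seq2, qual2, pos1, seq1, qual1

    end1 = pos1 + len(seq1)  # exclusive end of the left mate
    end2 = pos2 + len(seq2)  # exclusive end of the right mate
    if pos2 > end1:
        raise ValueError("mates leave a gap on the reference; nothing to merge")

    merged = []
    for ref_pos in range(pos1, max(end1, end2)):
        in1 = ref_pos < end1
        in2 = pos2 <= ref_pos < end2
        if in1 and in2:
            i, j = ref_pos - pos1, ref_pos - pos2
            # Phred+33 characters sort in the same order as the qualities,
            # so comparing the characters picks the more confident base.
            merged.append(seq1[i] if qual1[i] >= qual2[j] else seq2[j])
        elif in1:
            merged.append(seq1[ref_pos - pos1])
        else:
            merged.append(seq2[ref_pos - pos2])
    return pos1, "".join(merged)


# Two 120-nt mates starting 60 nt apart give one 180-nt sequence.
start, merged = merge_aligned_pair(1, "A" * 120, "I" * 120, 61, "A" * 120, "I" * 120)
print(start, len(merged))  # 1 180
```

Real alignments with indels or soft clipping would need the CIGAR string walked as well, which this sketch deliberately leaves out.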

bwa paired-end seq • 2.0k views
3.7 years ago, WouterDeCoster (Belgium) wrote:

BBMerge can do this :)


Thanks. I have already used pandaseq for this, but I would like to merge the sequences after alignment, not before. Sorry, I was not clear on this.

Reply written 3.7 years ago by aquaq

I'm not sure what your biological motivation is for this, but I'm completely against tampering with alignment data. Which problem are you trying to solve?

Reply written 3.7 years ago by WouterDeCoster

It would just be a trial. In a specific part of the sequence that we are interested in, there is a large number of mutations/sequencing errors (it was a random sequence, but it was not supposed to be that random). Before continuing with further analysis, I just wanted to be sure that this is not caused by some weird behaviour of pandaseq that I am not aware of. But I can totally accept that this is unusual; I will find another way to confirm it (e.g. by running bbmerge and comparing the results). Thanks for the help!
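
One way to compare the results of two mergers is to match their merged reads by read ID and count how many sequences differ. The Python sketch below is only an illustration: the helper names `read_fastq` and `compare_merged` and the file names in the comments are hypothetical, and it assumes both tools were run to produce plain 4-line FASTQ with the same read IDs and the same orientation, which may need checking for each tool.

```python
def read_fastq(handle):
    """Return {read_id: sequence} from a 4-line-per-record FASTQ stream."""
    seqs = {}
    while True:
        header = handle.readline()
        if not header:
            break                      # end of file
        if not header.strip():
            continue                   # tolerate stray blank lines
        seq = handle.readline().strip()
        handle.readline()              # '+' separator line
        handle.readline()              # quality line (not needed here)
        read_id = header[1:].split()[0]  # drop '@' and any description
        seqs[read_id] = seq
    return seqs


def compare_merged(a, b):
    """Summarise agreement between two {read_id: sequence} dictionaries."""
    shared = a.keys() & b.keys()
    differing = sorted(r for r in shared if a[r] != b[r])
    return {
        "shared": len(shared),
        "identical": len(shared) - len(differing),
        "differing": differing,        # IDs worth inspecting by eye
        "only_in_a": len(a.keys() - b.keys()),
        "only_in_b": len(b.keys() - a.keys()),
    }


# Hypothetical usage (file names are placeholders):
# with open("pandaseq_merged.fastq") as fa, open("bbmerge_merged.fastq") as fb:
#     print(compare_merged(read_fastq(fa), read_fastq(fb)))
```

If most differing reads disagree in the region of interest, that points at the merging step; if the two tools agree, the variation is more likely in the reads themselves.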

Reply written 3.7 years ago by aquaq

I would also like to do this, and yes, after alignment, because I am using a downstream application that needs merged PE reads. The alignments contain < 1% of the total original fastq reads, so it would be much more efficient to merge only the aligned reads. Did you try using aftermerge? How did it go?
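
Restricting the merge to aligned reads could start from a filter on the SAM FLAG field. The sketch below is a rough Python illustration with a hypothetical function name, `both_mates_mapped`; it reads SAM text lines and keeps only primary records of paired reads where neither the read nor its mate is flagged as unmapped. The bit values are the standard SAM flag bits (0x1 paired, 0x4 unmapped, 0x8 mate unmapped, 0x100 secondary, 0x800 supplementary).

```python
PAIRED = 0x1
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def both_mates_mapped(sam_lines):
    """Yield the fields of primary SAM records whose read and mate are mapped."""
    skip = UNMAPPED | MATE_UNMAPPED | SECONDARY | SUPPLEMENTARY
    for line in sam_lines:
        if line.startswith("@"):
            continue                   # header line
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])          # FLAG is the second SAM column
        if flag & PAIRED and not flag & skip:
            yield fields


# Hypothetical usage (the file name is a placeholder):
# with open("aligned.sam") as sam:
#     for fields in both_mates_mapped(sam):
#         name, seq, qual = fields[0], fields[9], fields[10]
```

The surviving records can then be grouped by read name (column 1) and passed to whichever merging step is used, so that the unaligned majority of reads is never merged at all.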

Reply written 3.0 years ago by norah.saarman
3.7 years ago, WouterDeCoster (Belgium) wrote:

I just saw this tool by chance, but obviously I have no idea how well it works.
