Question: PE alignment with bwa aln and conversions
0
gravatar for playerraa
2.8 years ago by
playerraa0
playerraa0 wrote:

So I've got paired-end read files: s1_R1.fq and s1_R2.fq

  1. I align to some indexed reference with:

bwa aln "index_prefix" "s1_R1.fq" "s1_R2.fq" > s1.sai

  1. convert to sam:

bwa samse "index_prefix" "s1.sai" "s1_R1.fq" "s1_R2.fq" > s1.sam

  1. pull aligned reads only into bam format:

samtools view -Sb -F0x4 "s1.sam" > s1.al.bam

  1. convery bam to fastq

samtools bam2fq "s1.al.bam" > s1.al.fastq

Now my questions stem from this; when I convert the "s1.al.bam" to a fastq, I get a single fastq.

Are the paired reads effectively merged into a single end read from the alignment step (or from using samse in step2)?

Should I be using 'sampe' in step 2?

If so, how are the reads in the resulting fastq file from this arranged? Still as merged single end? Interleaved? Separate files from a different tool?

The description of sampe throws me a bit "Generate alignments in the SAM format given paired-end reads. Repetitive read pairs will be placed randomly."

bwa sampe samse alignment paired • 1.6k views
ADD COMMENTlink modified 2.8 years ago • written 2.8 years ago by playerraa0

You know that bwa (at least the older version I use) will accept a .bam as input. Some other programs,like velvet, will accept .bams as inputs too.

ADD REPLYlink written 2.8 years ago by swbarnes25.2k

Hey, yea bwa aln has an option to accept bam as input, but I do not see an option for this with bwa mem.

ADD REPLYlink written 2.8 years ago by playerraa0
1
gravatar for playerraa
2.8 years ago by
playerraa0
playerraa0 wrote:

So the reason I wanted to get a fastq after alignment was so I could align the resulting aligned/unaligned reads to a different reference.

My solution was to just use the bwa mem as u/igor suggested:

bwa mem "index_prefix" "s1_R1.fq" "s1_R2.fq" > s1.sam
samtools view -Sb -F0x4 "s1.sam" > s1.al.bam
samtools bam2fq "s1.al.bam" > s1.al.fastq

This puts the paired-reads in an interleaved format in both the sam and fastq files:

tail -4 s1.sam | awk '{print $1}'
MONK:312:C2GR3ACXX:6:1210:20334:90480
MONK:312:C2GR3ACXX:6:1210:20334:90480
MONK:312:C2GR3ACXX:6:1210:20277:90491
MONK:312:C2GR3ACXX:6:1210:20277:90491
tail -16 s1.al.fastq | awk 'NR==1 || NR % 4==1'
@MONK:312:C2GR3ACXX:6:1210:20334:90480/1
@MONK:312:C2GR3ACXX:6:1210:20334:90480/2
@MONK:312:C2GR3ACXX:6:1210:20277:90491/1
@MONK:312:C2GR3ACXX:6:1210:20277:90491/2

So now for the next bwa mem alignment using s1.al.fastq, I need to use the -p flag.

ADD COMMENTlink written 2.8 years ago by playerraa0

In future you could try an aligner like BBMap (bowtie2 may as well) that allows you to collected unmapped reads on the fly in two files (R1/R2) which will save you this additional work.

ADD REPLYlink modified 2.8 years ago • written 2.8 years ago by genomax65k
2
gravatar for igor
2.8 years ago by
igor7.6k
United States
igor7.6k wrote:

As the description says, use samse for single-end reads and sampe for paired-end reads. In your case, that would be paired-end reads.

If you already have the FASTQs, why are you converting BAM back to FASTQ? You probably should not be doing that. But if you must, see this discussion on how to do that for paired-end reads: https://github.com/samtools/samtools/issues/402

The more recent version of samtools contains fastq command (instead of bam2fq) and handles paired-end reads better.

ADD COMMENTlink modified 2.8 years ago • written 2.8 years ago by igor7.6k

I think @playerra is trying to separate reads that are aligned.

ADD REPLYlink modified 2.8 years ago • written 2.8 years ago by genomax65k

That, and I was converting back to fastq so I could align to another reference. I see now that you can just use the bam file.

ADD REPLYlink written 2.8 years ago by playerraa0

So to use sampe I need to align each read file separately? So I get an sai for R1, and another sai for R2?

I had seen that you could just put both R1 and R2 in the argument list of bwa aln, but this results in only a single sai, which you would then use with samse. It's confusing...

ADD REPLYlink written 2.8 years ago by playerraa0

Yes, you will have an SAI for each FASTQ (1 for R1 and 1 for R2).

You may want to consider using bwa mem. It's arguably better and definitely easier to use.

ADD REPLYlink written 2.8 years ago by igor7.6k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1357 users visited in the last hour