PE alignment with bwa aln and conversions
2
0
Entering edit mode
5.9 years ago
raplayer ▴ 10

So I've got paired-end read files: s1_R1.fq and s1_R2.fq

1. I align to some indexed reference with:

bwa aln "index_prefix" "s1_R1.fq" "s1_R2.fq" > s1.sai

1. convert to sam:

bwa samse "index_prefix" "s1.sai" "s1_R1.fq" "s1_R2.fq" > s1.sam

1. pull aligned reads only into bam format:

samtools view -Sb -F0x4 "s1.sam" > s1.al.bam

1. convery bam to fastq

samtools bam2fq "s1.al.bam" > s1.al.fastq

Now my questions stem from this; when I convert the "s1.al.bam" to a fastq, I get a single fastq.

Are the paired reads effectively merged into a single end read from the alignment step (or from using samse in step2)?

Should I be using 'sampe' in step 2?

If so, how are the reads in the resulting fastq file from this arranged? Still as merged single end? Interleaved? Separate files from a different tool?

The description of sampe throws me a bit "Generate alignments in the SAM format given paired-end reads. Repetitive read pairs will be placed randomly."

alignment paired bwa samse sampe • 3.9k views
0
Entering edit mode

You know that bwa (at least the older version I use) will accept a .bam as input. Some other programs,like velvet, will accept .bams as inputs too.

0
Entering edit mode

Hey, yea bwa aln has an option to accept bam as input, but I do not see an option for this with bwa mem.

1
Entering edit mode
5.9 years ago
raplayer ▴ 10

So the reason I wanted to get a fastq after alignment was so I could align the resulting aligned/unaligned reads to a different reference.

My solution was to just use the bwa mem as u/igor suggested:

bwa mem "index_prefix" "s1_R1.fq" "s1_R2.fq" > s1.sam
samtools view -Sb -F0x4 "s1.sam" > s1.al.bam
samtools bam2fq "s1.al.bam" > s1.al.fastq


This puts the paired-reads in an interleaved format in both the sam and fastq files:

tail -4 s1.sam | awk '{print \$1}'
MONK:312:C2GR3ACXX:6:1210:20334:90480
MONK:312:C2GR3ACXX:6:1210:20334:90480
MONK:312:C2GR3ACXX:6:1210:20277:90491
MONK:312:C2GR3ACXX:6:1210:20277:90491
tail -16 s1.al.fastq | awk 'NR==1 || NR % 4==1'
@MONK:312:C2GR3ACXX:6:1210:20334:90480/1
@MONK:312:C2GR3ACXX:6:1210:20334:90480/2
@MONK:312:C2GR3ACXX:6:1210:20277:90491/1
@MONK:312:C2GR3ACXX:6:1210:20277:90491/2


So now for the next bwa mem alignment using s1.al.fastq, I need to use the -p flag.

0
Entering edit mode

In future you could try an aligner like BBMap (bowtie2 may as well) that allows you to collected unmapped reads on the fly in two files (R1/R2) which will save you this additional work.

2
Entering edit mode
5.9 years ago
igor 12k

As the description says, use samse for single-end reads and sampe for paired-end reads. In your case, that would be paired-end reads.

If you already have the FASTQs, why are you converting BAM back to FASTQ? You probably should not be doing that. But if you must, see this discussion on how to do that for paired-end reads: https://github.com/samtools/samtools/issues/402

The more recent version of samtools contains fastq command (instead of bam2fq) and handles paired-end reads better.

0
Entering edit mode

I think @playerra is trying to separate reads that are aligned.

0
Entering edit mode

That, and I was converting back to fastq so I could align to another reference. I see now that you can just use the bam file.

0
Entering edit mode

So to use sampe I need to align each read file separately? So I get an sai for R1, and another sai for R2?

I had seen that you could just put both R1 and R2 in the argument list of bwa aln, but this results in only a single sai, which you would then use with samse. It's confusing...

0
Entering edit mode

Yes, you will have an SAI for each FASTQ (1 for R1 and 1 for R2).

You may want to consider using bwa mem. It's arguably better and definitely easier to use.