I initially cut my .bam files down to the specified chromosomes using samtools, with the following code:
samtools sort temp.bam temp.sorted
samtools index temp.sorted.bam
samtools view -bh temp.sorted.bam xx > temp.chrxx.bam
I am planning to align these sequences to the corresponding chromosome using bwa. I have already downloaded the chromosome-specific sequence of chrxx from UCSC.
bwa mem chrxx.fa mybam.bam > bwa.outxx.sam
bwa aln -t 4 chrxx.fa mybam.bam > outxx.bwa.sai
bwa samse chrxx.fa outxx.bwa.sai mybam.bam > bwa.samse.outxx.sam
Since the output is a sam file, I would like to convert it into a bam file using samtools, then sort and index it again before quality control.
I used the command samtools view -bT chrxx.fa bwa.samse.outxx.sam > outxx.bwa.bam
Yet an error occurs with the sam output from the bwa alignment. When I display the first ten lines of the file, only two lines are shown:
@SQ SN:chr17 LN:81195210
@PG ID:bwa PN:bwa VN:0.7.10-r789 CL:chr1xx.fa outxxbam.sai /Volumes/Pegasus/tmp/out17.bam
The error seen is
[samopen] SAM header is present: 1 sequences.
[sam_read1] reference 'ID:bwa PN:bwa VN:0.7.10-r789 CL:bwa samse chrxx.fa /outxx.bam
' is recognized as '*'.
[main_samview] truncated file.
Please provide any help so that I can fix this issue. I would like to know what I am doing wrong. Thank you. Any assistance is appreciated.
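The symptom described in the question (only two `@` lines in the SAM file) suggests the file holds a header but no alignment records. A minimal sketch for checking that with plain shell tools follows; the function name `sam_line_counts` and the demo file `demo_header_only.sam` are hypothetical and only illustrate the idea.

```shell
#!/usr/bin/env bash
# Count header lines (starting with '@') and alignment records in a SAM file.
# Non-empty lines that do not start with '@' are treated as records.
sam_line_counts() {
    awk '/^@/ { h++ }
         !/^@/ && NF { r++ }
         END { printf "header=%d records=%d\n", h, r }' "$1"
}

# Hypothetical demo file that mimics a header-only SAM like the one above.
printf '@SQ\tSN:chr17\tLN:81195210\n@PG\tID:bwa\tPN:bwa\tVN:0.7.10-r789\n' > demo_header_only.sam
sam_line_counts demo_header_only.sam
```

If `records` is 0, the alignment step produced no output, so the problem lies upstream of the `samtools view` conversion.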
You don't align a BAM, you align FASTQ files.
Is there any reason to align the reads to each chromosome separately rather than to the whole genome?
I am only working with 2 chromosomes for each patient, and I have a huge patient pool, so I would like to minimize the processing time. Do I need to convert all the .bam files into .fq files?
Firstly, are you sure that the BAM files contain only unaligned reads? While this can be the case, it's typically not.
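One way to check this, as a minimal sketch: count the records with and without the unmapped flag (0x4) using `samtools view -c`. The function name `count_reads_by_mapping` is made up for illustration, and the sketch assumes `samtools` is available on the PATH.

```shell
#!/usr/bin/env bash
# Report how many records in a BAM file are flagged as mapped vs. unmapped.
#   -c    print only the count of matching records
#   -F 4  exclude records with the "unmapped" flag set (i.e. count mapped)
#   -f 4  require the "unmapped" flag (i.e. count unmapped)
count_reads_by_mapping() {
    local bam="$1"
    local mapped unmapped
    mapped="$(samtools view -c -F 4 "$bam")"
    unmapped="$(samtools view -c -f 4 "$bam")"
    echo "mapped=${mapped} unmapped=${unmapped}"
}

# Run only when samtools exists and a BAM path was passed as the first argument.
if command -v samtools >/dev/null 2>&1 && [ -f "${1:-}" ]; then
    count_reads_by_mapping "$1"
fi
```

A non-zero `mapped` count would mean the BAM already contains aligned reads, in which case realigning may not be needed at all.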
Edit: I was wrong about BWA and BAM input, so I've removed that line. I should mention, though, that I only know that bwa aln can do that (I'm not aware of bwa mem having that capability).
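For reference, the bwa manual describes BAM input for `bwa aln` through the `-b` option, with `-0`, `-1` or `-2` selecting single-end reads or one mate of a pair. A hedged sketch for the single-end case follows; the function name `align_bam_single_end` and the output prefix are hypothetical, and it assumes `bwa index` has already been run on the reference.

```shell
#!/usr/bin/env bash
# Align single-end reads stored in a BAM file with bwa aln + bwa samse.
#   $1 = indexed reference FASTA, $2 = input BAM, $3 = output prefix
align_bam_single_end() {
    local ref="$1" bam="$2" prefix="$3"
    # -b tells bwa aln that the read file is BAM; -0 selects single-end reads.
    bwa aln -t 4 -b -0 "$ref" "$bam" > "${prefix}.sai" &&
    # samse receives the same BAM again as the read file.
    bwa samse "$ref" "${prefix}.sai" "$bam" > "${prefix}.sam"
}

# Example call using the file names from the question (commented out):
# align_bam_single_end chrxx.fa mybam.bam bwa.samse.outxx
```

For `bwa mem`, the usual route is to convert the BAM to FASTQ first (for example with `samtools fastq` in recent samtools releases) and align the FASTQ file.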