Off topic:sequential alignment to filter out reads, pair synchronization problem
0
0
Entering edit mode
5.8 years ago

Dear all,

I am trying to filter out reads by re-aligning a BAM file. The filtered reads then need to be analyzed with things like GATK or Picard, hence I think I need a read group. The same sample was sequenced in two different runs, so it is split between file1 and file2, paired. Since I understand that these files cannot be combined before alignment, I am running them in parallel. I am selecting the reads that mapped to a first reference genome, then I want to select the reads that map to a second reference genome.

This is the procedure I followed:

# align
bwa mem -R <read_group1> <ref.fa> <file1_1.fq> <file1_2.fq> -o <file1_aln.sam>
bwa mem -R <read_group2> <ref.fa> <file2_1.fq> <file2_2.fq> -o <file2_aln.sam>
# convert
samtools view -Sb  <file1_aln.sam>  >  <file1_aln.bam>
samtools view -Sb  <file2_aln.sam>  >  <file2_aln.bam>
# select mapped reads
rm 501N-D3_alnHUM.sam
samtools view -h -F 4 <file1_aln.bam> -o <file1_map.bam>
samtools view -h -F 4 <file2_aln.bam> -o <file2_map.bam>
# sort
samtools sort <file1_map.bam>  -o <file1_mapSRT.bam>
samtools sort <file2_map.bam>  -o <file2_mapSRT.bam>
# reconvert to fastq
samtools fastq -1 <file1_1map.fq>  -2 <file1_2map.fq> <file2_mapSRT.bam>
samtools fastq -1 <file2_1map.fq>  -2 <file2_2map.fq> <file2_mapSRT.bam>
# re-align
bwa mem -R <read_group1> <ref.fa> <file1_1map.fq> <file1_2map.fq> -o <file1map_aln.sam>

However, when running the first re-alignment I get an error due to the read group:

[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 105748 sequences (10000048 bp)...
[M::process] read 104770 sequences (10000147 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (413, 343, 0, 304)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (4, 13, 25)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 67)
[M::mem_pestat] mean and std.dev: (15.77, 13.79)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 88)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (42, 53, 70)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 126)
[M::mem_pestat] mean and std.dev: (55.71, 16.50)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 154)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] analyzing insert size distribution for orientation RR...
[M::mem_pestat] (25, 50, 75) percentile: (7, 16, 28)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 70)
[M::mem_pestat] mean and std.dev: (18.78, 14.24)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 91)
[mem_sam_pe] paired reads have different names: "HWI-ST1437:64:C3UM1ACXX:4:1214:3833:62731", "HWI-ST1437:64:C3UM1ACXX:4:1202:18519:45906"

It looks like the pairs are no longer synchronized (different x-y coordinates). I had to sort the reads in order to re-create the fastq files (that's the way I know, mpileup I understand is more for creating a consenus fasta).

How can I keep synchronization to realign a BAM file?

thank you

alignment read group BWA • 1.1k views
ADD COMMENT
This thread is not open. No new answers may be added
Traffic: 2942 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6