Question: (Closed) sequential alignment to filter out reads, pair synchronization problem
0
gravatar for marongiu.luigi
16 months ago by
Germany, Mannheim, UMM
marongiu.luigi420 wrote:

Dear all,

I am trying to filter out reads by re-aligning a BAM file. The filtered reads then need to be analyzed with things like GATK or Picard, hence I think I need a read group. The same sample was sequenced in two different runs, so it is split between file1 and file2, paired. Since I understand that these files cannot be combined before alignment, I am running them in parallel. I am selecting the reads that mapped to a first reference genome, then I want to select the reads that map to a second reference genome.

This is the procedure I followed:

# align
bwa mem -R <read_group1> <ref.fa> <file1_1.fq> <file1_2.fq> -o <file1_aln.sam>
bwa mem -R <read_group2> <ref.fa> <file2_1.fq> <file2_2.fq> -o <file2_aln.sam>
# convert
samtools view -Sb  <file1_aln.sam>  >  <file1_aln.bam>
samtools view -Sb  <file2_aln.sam>  >  <file2_aln.bam>
# select mapped reads
rm 501N-D3_alnHUM.sam
samtools view -h -F 4 <file1_aln.bam> -o <file1_map.bam>
samtools view -h -F 4 <file2_aln.bam> -o <file2_map.bam>
# sort
samtools sort <file1_map.bam>  -o <file1_mapSRT.bam>
samtools sort <file2_map.bam>  -o <file2_mapSRT.bam>
# reconvert to fastq
samtools fastq -1 <file1_1map.fq>  -2 <file1_2map.fq> <file2_mapSRT.bam>
samtools fastq -1 <file2_1map.fq>  -2 <file2_2map.fq> <file2_mapSRT.bam>
# re-align
bwa mem -R <read_group1> <ref.fa> <file1_1map.fq> <file1_2map.fq> -o <file1map_aln.sam>

However, when running the first re-alignment I get an error due to the read group:

[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 105748 sequences (10000048 bp)...
[M::process] read 104770 sequences (10000147 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (413, 343, 0, 304)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (4, 13, 25)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 67)
[M::mem_pestat] mean and std.dev: (15.77, 13.79)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 88)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (42, 53, 70)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 126)
[M::mem_pestat] mean and std.dev: (55.71, 16.50)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 154)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] analyzing insert size distribution for orientation RR...
[M::mem_pestat] (25, 50, 75) percentile: (7, 16, 28)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 70)
[M::mem_pestat] mean and std.dev: (18.78, 14.24)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 91)
[mem_sam_pe] paired reads have different names: "HWI-ST1437:64:C3UM1ACXX:4:1214:3833:62731", "HWI-ST1437:64:C3UM1ACXX:4:1202:18519:45906"

It looks like the pairs are no longer synchronized (different x-y coordinates). I had to sort the reads in order to re-create the fastq files (that's the way I know, mpileup I understand is more for creating a consenus fasta).

How can I keep synchronization to realign a BAM file?

thank you

bwa read group alignment • 511 views
ADD COMMENTlink written 16 months ago by marongiu.luigi420

or maybe the problem is simply that BWA cannot re-sequence BAM files? maybe I should use other aligners?

ADD REPLYlink written 16 months ago by marongiu.luigi420

Hello marongiu.luigi,

this seems to be a duplicate of your previous post:

Realign BAM files to other reference file

Please stay at this thread and add additional information there. This avoid duplicate answers and work for people who likes to help you.

fin swimmer

ADD REPLYlink modified 16 months ago • written 16 months ago by finswimmer12k

this is linked to that post, but comes after having followed the suggestions and it is more focused on the synchronization problem; to keep the previous thread i would have needed to change its title. That post was asking instead what is the procedure for 'metagenomic comparison'.

ADD REPLYlink written 16 months ago by marongiu.luigi420
Please log in to add an answer.
The thread is closed. No new answers may be added.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1142 users visited in the last hour