Question: BWA mem skip orientation

5.7 years ago

BellaK • 0

Hello everyone,

I am very new to genomic analysis and alignment, and I am posting this in the hope that someone with more experience can help (much obliged). I am running bwa mem on my DNA sequences (Illumina sequencing); RUX is the population name. I am not sure whether I should be concerned, because bwa seems to skip at least two of the four possible orientations (FF, FR, RF, RR). Usually RR and RF are skipped, and sometimes FF. I have been trying to look this up online, and one person wrote: 'The best approach is to use -I. Failing to infer insert size, bwa is essentially running in the single-end mode, which is less accurate than the paired-end mode.' Is this correct?

Is this a concern or a normal part of the process? See below:

    bwa mem -t 40 ../dmel-all-aligned-r6.29.fasta RUX2_READ1_Sequences.txt RUX2_READ2_Sequences.txt > RUX2_LANE2.sam

    [M::bwa_idx_load_from_disk] read 0 ALT contigs
    [M::process] read 4000000 sequences (400000000 bp)...
    [M::process] read 4000000 sequences (400000000 bp)...
    [M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (21, 53755, 195, 3)
    [M::mem_pestat] analyzing insert size distribution for orientation FF...
    [M::mem_pestat] (25, 50, 75) percentile: (35, 103, 260)
    [M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 710)
    [M::mem_pestat] mean and std.dev: (120.89, 103.42)
    [M::mem_pestat] low and high boundaries for proper pairs: (1, 935)
    (...)
    [main] Version: 0.7.17-r1188
    [main] CMD: bwa mem -t 40 ../dmel-all-aligned-r6.29.fasta RUX2_READ1_Sequences.txt RUX2_READ2_Sequences.txt
    [main] Real time: 485.038 sec; CPU: 17913.096 sec

THANK YOU for your help!

alignment next-gen assembly genome bwa • 2.3k views

score 3 · Accepted Answer · 2019-10-14

5.7 years ago

h.mon 35k

Illumina sequencing (which I assume is what you have) most commonly generates read pairs in forward (read 1)-reverse (read 2) orientation. When mapping to a "proper" reference genome, most pairs should thus map as FR. The other orientations (FF, RR and RF) are most commonly noise, or they could represent structural differences between the reference genome and the sequenced sample. In any case, you expect them to be rare, and their insert size is meaningless, so you should not worry about it. In addition, the -I flag only works for the FR orientation anyway:

   -I FLOAT[,FLOAT[,INT[,INT]]]
                 specify the mean, standard deviation (10% of the mean if absent), max
                 (4 sigma from the mean if absent) and min of the insert size distribution.
                 FR orientation only. [inferred]
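
If you ever do want to pass -I by hand, here is a minimal Python sketch of how the defaults described in that help text fill in. The function name `insert_size_option` is hypothetical (it is not part of bwa), and the sketch only follows what the help text states: the standard deviation defaults to 10% of the mean and the maximum to 4 sigma above the mean. The help text does not give a default for the minimum, so it is only written out when you supply it.

```python
from typing import Optional


def insert_size_option(mean: float,
                       std: Optional[float] = None,
                       max_size: Optional[int] = None,
                       min_size: Optional[int] = None) -> str:
    """Build a value for bwa mem's -I option (hypothetical helper, not part of bwa)."""
    if mean <= 0:
        raise ValueError("mean insert size must be positive")
    if std is None:
        std = 0.10 * mean                    # help text: 10% of the mean if absent
    if max_size is None:
        max_size = round(mean + 4 * std)     # help text: 4 sigma from the mean if absent
    fields = [f"{mean:g}", f"{std:g}", str(max_size)]
    if min_size is not None:                 # no default stated in the help text
        fields.append(str(min_size))
    return "-I " + ",".join(fields)


print(insert_size_option(500))               # mean only, defaults filled in
print(insert_size_option(300, 30, 500, 50))  # all four values given explicitly
```

The returned string would go between `bwa mem` and the reference, but for an ordinary FR library with enough pairs it should rarely be needed, since bwa infers these values itself.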

Agreed. FYI BellaK, I shortened your post a bit for better readability. As h.mon says, unless you get very odd mapping results, with many reads not properly paired, you should not worry: these log messages are normal and expected.
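
One rough way to check for "many reads not properly paired" is to count SAM flags directly. The sketch below is an illustration, not a standard tool: the function name `proper_pair_fraction` is made up, and it relies only on the SAM flag bits (0x1 paired, 0x2 properly paired, 0x100 secondary, 0x800 supplementary).

```python
def proper_pair_fraction(sam_lines):
    """Fraction of paired, primary alignment records flagged as properly paired."""
    paired = proper = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):   # skip blank and header lines
            continue
        flag = int(line.split("\t")[1])                # FLAG is the second SAM column
        if not flag & 0x1:                             # read is not part of a pair
            continue
        if flag & (0x100 | 0x800):                     # skip secondary/supplementary records
            continue
        paired += 1
        if flag & 0x2:                                 # aligner marked the pair as proper
            proper += 1
    return proper / paired if paired else 0.0


example = [
    "@SQ\tSN:2L\tLN:1000",
    "r1\t99\t2L\t100\t60\t100M\t=\t250\t250\t*\t*",
    "r1\t147\t2L\t250\t60\t100M\t=\t100\t-250\t*\t*",
    "r2\t65\t2L\t400\t60\t100M\t=\t900\t0\t*\t*",
    "r2\t129\t2L\t900\t60\t100M\t=\t400\t0\t*\t*",
]
print(proper_pair_fraction(example))
```

On real output you would pass an open file handle instead of the example list, for instance the RUX2_LANE2.sam file from the question. `samtools flagstat` reports a comparable "properly paired" figure if you have samtools installed.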