Question: BWA mem skip orientation
BellaK (UCI), 5 weeks ago, wrote:

Hello everyone,

I am very new to genomic analysis and alignment, and I am posting this in the hope that someone with more knowledge can help (much obliged). I am running bwa mem on my paired-end DNA reads (Illumina sequencing). RUX is the population name. I am not sure whether I should be concerned: bwa seems to skip at least two of the four possible pair orientations (FF, FR, RF, RR). Usually RR and RF are skipped, and sometimes FF. I have been trying to look this up online. One person wrote: 'The best approach is to use -I. Failing to infer insert size, bwa is essentially running in the single-end mode, which is less accurate than the paired-end mode.' Is this correct?

Is this a concern or a normal part of the process? See below:

bwa mem -t 40 ../dmel-all-aligned-r6.29.fasta RUX2_READ1_Sequences.txt RUX2_READ2_Sequences.txt > RUX2_LANE2.sam

[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 4000000 sequences (400000000 bp)...
[M::process] read 4000000 sequences (400000000 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (21, 53755, 195, 3)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (35, 103, 260)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 710)
[M::mem_pestat] mean and std.dev: (120.89, 103.42)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 935)
(...)
[main] Version: 0.7.17-r1188
[main] CMD: bwa mem -t 40 ../dmel-all-aligned-r6.29.fasta RUX2_READ1_Sequences.txt RUX2_READ2_Sequences.txt
[main] Real time: 485.038 sec; CPU: 17913.096 sec

THANK YOU for your help!

modified 5 weeks ago by ATpoint • written 5 weeks ago by BellaK

h.mon (Brazil), 5 weeks ago, wrote:

Illumina sequencing (which I assume is what you have) most commonly generates read pairs oriented forward (read 1) and reverse (read 2) relative to each other. When mapping to a "proper" reference genome, most pairs should thus map as FR. The other orientations (FF, RR and RF) are most commonly noise, or they could represent some structural difference between the reference genome and the sequenced sample. In any case, you expect them to be rare, and their insert size is meaningless, so you should not worry about it. In addition, the -I flag only applies to the FR orientation anyway:

   -I FLOAT[,FLOAT[,INT[,INT]]]
                 specify the mean, standard deviation (10% of the mean if absent), max
                 (4 sigma from the mean if absent) and min of the insert size distribution.
                 FR orientation only. [inferred]
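
To make the "rare" part concrete, here is a minimal Python sketch that parses the `mem_pestat` candidate-pairs line from the log in the question and reports the share of each orientation. For that batch, FR accounts for more than 99% of the candidate pairs. The function name `orientation_summary` is made up for this illustration, and the two thresholds are assumptions based on a reading of the bwa source (a minimum pair count and a minimum ratio relative to the most common orientation); treat them as illustrative rather than as documented behaviour.

```python
import re

# ASSUMPTION: these thresholds mirror constants believed to be used by bwa
# when it decides to skip an orientation; they are not documented options.
MIN_DIR_CNT = 10      # fewer pairs than this: too few to estimate insert size
MIN_DIR_RATIO = 0.05  # fewer than 5% of the largest class: treated as noise

PESTAT_RE = re.compile(
    r"candidate unique pairs for \(FF, FR, RF, RR\): "
    r"\((\d+), (\d+), (\d+), (\d+)\)"
)


def orientation_summary(log_line):
    """Return count, fraction and a 'likely skipped' guess per orientation."""
    match = PESTAT_RE.search(log_line)
    if match is None:
        raise ValueError("not a mem_pestat candidate-pairs line")
    counts = dict(zip(("FF", "FR", "RF", "RR"), map(int, match.groups())))
    total = sum(counts.values())
    largest = max(counts.values())
    summary = {}
    for name, n in counts.items():
        summary[name] = {
            "count": n,
            "fraction": n / total if total else 0.0,
            "likely_skipped": n < MIN_DIR_CNT or n < largest * MIN_DIR_RATIO,
        }
    return summary


# Line copied from the log in the question.
line = ("[M::mem_pestat] # candidate unique pairs for "
        "(FF, FR, RF, RR): (21, 53755, 195, 3)")

for name, info in orientation_summary(line).items():
    print(f"{name}: {info['count']:>6} pairs "
          f"({info['fraction']:.2%}) likely_skipped={info['likely_skipped']}")
```

Under these assumed thresholds, only FR would be kept for this batch, which is consistent with the skipped orientations described in the question.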
written 5 weeks ago by h.mon

Agreed. FYI BellaK, I shortened your post a bit for better readability. As h.mon says, unless you get very odd mapping results with many reads that are not properly paired, you should not worry; these log messages are normal and expected.
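
One way to check the "not properly paired" point is to look at the SAM FLAG field: bit 0x1 marks a paired read and bit 0x2 marks a read mapped in a proper pair. The sketch below counts the proper-pair fraction over primary alignments in plain Python. The function name `proper_pair_fraction` and the toy records are invented for illustration; `samtools flagstat` reports a comparable "properly paired" figure and is the more usual tool for real files.

```python
def proper_pair_fraction(sam_lines):
    """Fraction of paired primary alignments flagged as properly paired."""
    paired = 0
    proper = 0
    for line in sam_lines:
        # Skip blank lines and header lines (which start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if flag & 0x900:                 # skip secondary (0x100) / supplementary (0x800)
            continue
        if flag & 0x1:                   # read is part of a pair
            paired += 1
            if flag & 0x2:               # mapped in a proper pair
                proper += 1
    return proper / paired if paired else 0.0


# Toy records, made up for this example (not real alignments).
toy_sam = [
    "@HD\tVN:1.6\tSO:unsorted",
    "r1\t99\t2L\t100\t60\t10M\t=\t300\t210\tACGTACGTAC\tIIIIIIIIII",
    "r1\t147\t2L\t300\t60\t10M\t=\t100\t-210\tACGTACGTAC\tIIIIIIIIII",
    "r2\t65\t2L\t500\t60\t10M\t3R\t900\t0\tACGTACGTAC\tIIIIIIIIII",
    "r2\t129\t3R\t900\t60\t10M\t2L\t500\t0\tACGTACGTAC\tIIIIIIIIII",
]
print(proper_pair_fraction(toy_sam))

# For a real file, the same function accepts an open file handle, e.g.:
#   with open("RUX2_LANE2.sam") as fh:
#       print(proper_pair_fraction(fh))
```

What counts as "too low" depends on the library and the reference, so the number is best read alongside the rest of the mapping statistics rather than against a fixed cutoff.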

written 5 weeks ago by ATpoint

Thank you so much!! This was such a big relief!

written 4 weeks ago by BellaK