Hello everyone,
I am very new to genomic analysis and alignment, and I am posting this in the hope that someone more experienced can help (much obliged). I am running a bwa mem alignment of my DNA sequences (Illumina paired-end sequencing). RUX is the population name. I am not sure whether I should be concerned: when estimating the insert size, bwa seems to skip at least two of the four possible read-pair orientations (FF, FR, RF, RR). Usually RR and RF are skipped, and sometimes FF as well. I have been trying to look this up online, and one person wrote: 'The best approach is to use -I. Failing to infer insert size, bwa is essentially running in the single-end mode, which is less accurate than the paired-end mode.' I am not sure whether this is correct.
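For context on the -I option mentioned in that quote: according to the bwa mem help text it takes up to four comma-separated, positional values, namely the mean, the standard deviation (10% of the mean if absent), the maximum (4 sigma from the mean if absent) and the minimum of the insert-size distribution, and it applies to the FR orientation only. The minimal Python sketch below only shows how such a command line could be assembled; the helper names and the 300/30 bp values are hypothetical and are not estimates from this run.

```python
import shlex


def insert_size_arg(mean, std=None, max_size=None, min_size=None):
    """Format the value for bwa mem's -I option: mean[,std[,max[,min]]].

    The fields are positional, so a later field can only be given
    if every earlier field is given too.
    """
    fields = [mean, std, max_size, min_size]
    # Drop trailing fields that were not specified.
    while fields and fields[-1] is None:
        fields.pop()
    # A gap in the middle cannot be expressed positionally.
    if any(f is None for f in fields):
        raise ValueError("-I fields are positional: give std before max, max before min")
    parts = [f"{fields[0]:g}"]
    if len(fields) > 1:
        parts.append(f"{fields[1]:g}")
    parts.extend(str(int(f)) for f in fields[2:])
    return ",".join(parts)


def bwa_mem_command(ref, read1, read2, mean, std=None, threads=40):
    """Return the argv list for a paired-end bwa mem run with a fixed insert size."""
    return ["bwa", "mem", "-t", str(threads),
            "-I", insert_size_arg(mean, std),
            ref, read1, read2]


# Hypothetical values: mean 300 bp, s.d. 30 bp (NOT taken from the log in this post).
cmd = bwa_mem_command("../dmel-all-aligned-r6.29.fasta",
                      "RUX2_READ1_Sequences.txt", "RUX2_READ2_Sequences.txt",
                      mean=300, std=30)
print(shlex.join(cmd))
```

As in the original command, bwa mem writes SAM to standard output, so the output still has to be redirected (for example with `> RUX2_LANE2.sam`) or captured when running this argv list.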
Is this a concern or a normal part of the process? See below:
bwa mem -t 40 ../dmel-all-aligned-r6.29.fasta RUX2_READ1_Sequences.txt RUX2_READ2_Sequences.txt > RUX2_LANE2.sam
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 4000000 sequences (400000000 bp)...
[M::process] read 4000000 sequences (400000000 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (21, 53755, 195, 3)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (35, 103, 260)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 710)
[M::mem_pestat] mean and std.dev: (120.89, 103.42)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 935)
(...)
[main] Version: 0.7.17-r1188
[main] CMD: bwa mem -t 40 ../dmel-all-aligned-r6.29.fasta RUX2_READ1_Sequences.txt RUX2_READ2_Sequences.txt
[main] Real time: 485.038 sec; CPU: 17913.096 sec
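The numbers in the log above can be reproduced with a small Python sketch. It assumes the constants of bwamem_pair.c in bwa 0.7.17 as I recall them, so please check the source before relying on them: an orientation is dropped if it has fewer than 10 candidate pairs, or fewer than 5% of the pairs in the most common orientation. The bounds are the quartiles ± 2 × IQR for the mean/std.dev window and ± 3 × IQR, widened to at least mean ± 4 s.d., for proper pairs. The function names are hypothetical.

```python
# Constants as recalled from bwamem_pair.c in bwa 0.7.17 (assumption; check the source).
MIN_DIR_CNT = 10        # minimum candidate pairs needed to model an orientation
MIN_DIR_RATIO = 0.05    # minimum fraction of the most common orientation's count
OUTLIER_BOUND = 2.0     # IQR multiplier for the mean/std.dev window
MAPPING_BOUND = 3.0     # IQR multiplier for the proper-pair window
MAX_STDDEV = 4.0        # proper-pair window covers at least mean +/- 4 s.d.

ORIENTATIONS = ("FF", "FR", "RF", "RR")


def orientations_to_skip(counts):
    """Map each orientation bwa mem would not model to the reason why."""
    largest = max(counts)
    skipped = {}
    for name, n in zip(ORIENTATIONS, counts):
        if n < MIN_DIR_CNT:
            skipped[name] = "not enough pairs"
        elif n < largest * MIN_DIR_RATIO:
            skipped[name] = "too rare relative to the dominant orientation"
    return skipped


def pair_boundaries(p25, p75, mean, std):
    """Return ((low, high) for mean/std.dev, (low, high) for proper pairs)."""
    iqr = p75 - p25
    # int() truncates toward zero like a C cast, so int(x + 0.499) mirrors bwa's rounding.
    stat_low = max(1, int(p25 - OUTLIER_BOUND * iqr + 0.499))
    stat_high = int(p75 + OUTLIER_BOUND * iqr + 0.499)
    prop_low = int(p25 - MAPPING_BOUND * iqr + 0.499)
    prop_high = int(p75 + MAPPING_BOUND * iqr + 0.499)
    # Widen the proper-pair window to at least mean +/- MAX_STDDEV * std.
    prop_low = min(prop_low, int(mean - MAX_STDDEV * std + 0.499))
    prop_high = max(prop_high, int(mean + MAX_STDDEV * std + 0.499))
    return (stat_low, stat_high), (max(1, prop_low), prop_high)


# Values copied from the log in the question.
print(orientations_to_skip((21, 53755, 195, 3)))
print(pair_boundaries(35, 260, 120.89, 103.42))  # -> ((1, 710), (1, 935))
```

With 53,755 FR pairs dominating, FF (21) and RF (195) fall below the 5% threshold and RR (3) falls below 10 pairs. This would explain why FF is still "analyzed" in the log yet can end up skipped. Standard Illumina paired-end libraries are FR, so FR is the orientation whose estimate matters here.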
THANK YOU for your help!
Agreed. FYI, BellaK, I shortened your post a bit for readability. As h.mon says, unless you get very odd mapping results, with many reads not properly paired, you should not worry: these log messages are normal and expected.
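To check the "many reads not properly paired" condition yourself, `samtools flagstat` reports a "properly paired" percentage. A rough Python equivalent is sketched below. It counts the SAM FLAG bits 0x1 (paired) and 0x2 (properly paired) on primary records only, skipping secondary (0x100) and supplementary (0x800) alignments. It is similar in spirit to flagstat, not a reimplementation of it. The function name and the toy records are made up for illustration.

```python
# SAM FLAG bits (as defined in the SAM format specification).
FLAG_PAIRED = 0x1
FLAG_PROPER_PAIR = 0x2
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800


def proper_pair_stats(sam_lines):
    """Return (paired, properly_paired) counts over primary SAM records."""
    paired = proper = 0
    for line in sam_lines:
        if line.startswith("@"):          # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete alignment record
            continue
        flag = int(fields[1])
        if flag & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
            continue                      # count each read only once
        if flag & FLAG_PAIRED:
            paired += 1
            if (flag & FLAG_PROPER_PAIR) and not (flag & FLAG_UNMAPPED):
                proper += 1
    return paired, proper


# Toy records (made up, not from the RUX2 run): 99/147 are properly paired,
# 97/145 are paired but not proper, 2147 is a supplementary alignment.
toy = ["@HD\tVN:1.6"] + [
    f"read{i}\t{flag}\t2L\t100\t60\t4M\t=\t200\t0\tACGT\tIIII"
    for i, flag in enumerate((99, 147, 97, 145, 2147))
]
paired, proper = proper_pair_stats(toy)
print(f"{proper}/{paired} primary paired records are properly paired")  # -> 2/4
```

On the real output you would pass an open file instead, e.g. `with open("RUX2_LANE2.sam") as fh: proper_pair_stats(fh)`.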
Thank you so much!! This was such a big relief!