Dear BioStars Leaders,
I have been running BWA MEM 0.7.12 on a handful of paired-end sample fastq files and it worked out pretty well. I have also included BWA as part of a bash and python based pipeline.
When I run BWA on one specific lane of one specific sample (paired-end fastq files), I get the error below, and I am wondering what the issue might be.
[gzread] <fd:4>: invalid distance code
Here is the command that I ran on a Linux server:
$bwa mem \
-t 18 \
-M \
-R "@RG\tID:development_run_070_WES-VAL3_L002\tSM:Sample_13016\tPL:IlluminaNextSeq500\tLB:Lib1\tPU:Unit1" \
/home/hg19/ucsc.hg19.fasta S4_L002_R1_001.fastq.gz S4_L002_R2_001.fastq.gz > bwaAlignReads.sam 2> bwa.stderr.log
I am curious to hear whether others have seen a similar error, and it would be great if anyone could suggest possible solutions.
Here are the last 30 rows from the bwa-mem error log file that I saved:
[M::process] read 1314136 sequences (180000260 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (2, 531348, 12, 3)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (134, 177, 227)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 413)
[M::mem_pestat] mean and std.dev: (182.75, 71.19)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 506)
[M::mem_pestat] analyzing insert size distribution for orientation RF...
[M::mem_pestat] (25, 50, 75) percentile: (125, 227, 409)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 977)
[M::mem_pestat] mean and std.dev: (210.27, 167.43)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 1261)
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_pestat] skip orientation RF
[M::mem_process_seqs] Processed 1309276 reads in 506.988 CPU sec, 28.235 real sec
[W::bseq_read] the 2nd file has fewer sequences.
[M::process] read 774742 sequences (106186594 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (2, 533411, 8, 4)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (134, 177, 226)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 410)
[M::mem_pestat] mean and std.dev: (182.10, 71.36)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 502)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_process_seqs] Processed 1314136 reads in 497.150 CPU sec, 27.629 real sec
[gzread] <fd:4>: invalid distance code
Thanks
Pierre,
Thank you for the quick reply, and for suggesting gzip --test.
S4_L002_R2_001.fastq.gz failed the gzip test. I also looked at my FastQC results and realized that this file failed the FastQC check as well. I started my analysis from the fastq files, and the Lane 2 R2 file seems to be the issue. It looks like I need to regenerate the fastq files from the bcl files for this sample using bcl2fastq2. Please let me know if there is an alternative solution. Thank you again.
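
For anyone who lands here with the same [gzread] error, the gzip --test check mentioned above can be wrapped so that it runs on both mates before alignment. This is only a minimal sketch, assuming bash and a gzip binary on the PATH; check_fastq_gz is a hypothetical helper name I made up for illustration, not part of gzip or BWA.

```shell
#!/usr/bin/env bash
# check_fastq_gz: hypothetical helper (not part of gzip or BWA).
# Runs `gzip -t` (the short form of `gzip --test`) on every file given.
# Prints "OK <file>" or "CORRUPT <file>" per file and returns non-zero
# if at least one file fails the integrity test.
check_fastq_gz() {
    local status=0 f
    for f in "$@"; do
        if gzip -t "$f" 2>/dev/null; then
            echo "OK $f"
        else
            echo "CORRUPT $f"
            status=1
        fi
    done
    return "$status"
}

# Example call with the two files from the bwa mem command above:
# check_fastq_gz S4_L002_R1_001.fastq.gz S4_L002_R2_001.fastq.gz
```

In a pipeline, the return code could be used to stop before bwa mem starts, so that a damaged file is caught up front instead of partway through the alignment.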