Question: BWA mem skip orientation
0
gravatar for williamsbrian5064
6 months ago by
williamsbrian5064210 wrote:

Hi,

So I am not entirely sure if this is an issue or not. So to give a little background, I used bwa, picard, samtools, and bcftools for WGS data. I have used this "pipeline" in the past and have had success. I have one sample that I am interested in looking at but when I ran through this process, the vcf file it looks like there is not genetic information. So I am trying to troubleshoot here. I was comparing the output files and the only one that looks different is the bwa mem output file. Here is what it looks like

[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 1228244 sequences (160000264 bp)...
[M::process] read 1120674 sequences (160000086 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 0, 0, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] skip orientation FR as there are not enough pairs
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_process_seqs] Processed 1228244 reads in 931.101 CPU sec, 70.722 real sec
[M::process] read 1116464 sequences (160000014 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 0, 0, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] skip orientation FR as there are not enough pairs
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_process_seqs] Processed 1120674 reads in 890.092 CPU sec, 56.756 real sec
[M::process] read 1118946 sequences (160000176 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 0, 0, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] skip orientation FR as there are not enough pairs
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_process_seqs] Processed 1116464 reads in 855.762 CPU sec, 60.114 real sec
[M::process] read 1125142 sequences (160000104 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 0, 0, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] skip orientation FR as there are not enough pairs
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs

The files goes on. Is this alarming?

One thing I noticed was when I load the bam file into IGV I can see the reads and I can see SNP and INDELS. So I wonder if there is an issue with bamtools and samtools? What do you think?

Thanks in advance

ADD COMMENTlink written 6 months ago by williamsbrian5064210
1

I assume this is a paired-end dataset? Was it trimmed as individual files at some point? Is there a chance that the reads were not in sync in R1/R2 files when the alignment was done?

ADD REPLYlink written 6 months ago by genomax74k

That is correct, they are paried-end and they were trimmed using trimmomatic. I didn't check the output of trimmomatic because it produced two files but maybe I didn't set a long enough run time. I will check now.

ADD REPLYlink written 6 months ago by williamsbrian5064210

It looks good to me?

TrimmomaticPE: Started with arguments:
 -threads 16 /gpfs_backup/meurs_data/HorsePipeline/O02396_S47_R1_001.fastq.gz /gpfs_backup/meurs_data/HorsePipeline/O02396_S47_R2_001.fastq.gz /gpfs_backup/meurs_data/HorsePipeline/H02396_S47_R1_001_trim_paired.fastq.gz /gpfs_backup/meurs_data/HorsePipeline/H02396_S47_R1_001_trim_unpaired.fastq.gz /gpfs_backup/meurs_data/HorsePipeline/H02396_S47_R2_001_trim_paired.fastq.gz /gpfs_backup/meurs_data/HorsePipeline/H02396_S47_R2_001_trim_unpaired.fastq.gz LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:70
Quality encoding detected as phred33
Input Read Pairs: 385790482 Both Surviving: 329048547 (85.29%) Forward Only Surviving: 47840969 (12.40%) Reverse Only Surviving: 3783893 (0.98%) Dropped: 5117073 (1.33%)
TrimmomaticPE: Completed successfully
ADD REPLYlink written 6 months ago by williamsbrian5064210
1

While that log looks fine the files themselves may not be.

Use repair.sh from BBMap suite to see if you find singletons in your data.

repair.sh in1=r1.fq.gz in2=r2.fq.gz out1=fixed1.fq.gz out2=fixed2.fq.gz outsingle=singletons.fq.gz
ADD REPLYlink modified 6 months ago • written 6 months ago by genomax74k

I will give this a try

ADD REPLYlink written 6 months ago by williamsbrian5064210

This is running now. How should I present that output to you? should I look in the singletons.fq.gz fine?

ADD REPLYlink written 6 months ago by williamsbrian5064210
1

Nothing should be there in singletons.fq.gz file if the reads are in proper order in your R1/R2 files. If you are starting to see things showing up in singletons.fq.gz then they are not. Let us know if singletons.fq.gz ends up being a zero byte file.

ADD REPLYlink modified 6 months ago • written 6 months ago by genomax74k

It looks like singletons.fq.gz is empty. Oddly enough, the paired files are a bit larger than the original paired files from trimmomatic

ADD REPLYlink written 6 months ago by williamsbrian5064210
1

That is good news. At least this shows that your data is in sync. I would not use file size as a metric as any QC so don't worry about that.

I don't use bwa mem so I am not sure if you need to tell if the insert sizes in your library are longer/shorter than expected. Perhaps that may be reason it is not able to figure out the correct read orientation for this library.

ADD REPLYlink modified 6 months ago • written 6 months ago by genomax74k

I am just curious what the actual issue is. The trimming seems fine, bwa seems to be fine, the bam file seems to be fine? I could see snps and what not. Maybe there is an issue with bcftools? I uses samstools flagstat and that worked. Any suggestions on where to look next?

ADD REPLYlink written 6 months ago by williamsbrian5064210

I figured it out. I shamefully aligned R2 twice (see below). This is most dishonorable

ADD REPLYlink modified 6 months ago • written 6 months ago by williamsbrian5064210
1

I would guess your fastq files are somehow broken and lost pairing between R1 and R2. When BWA mapped, it considered all pairs as unproperly paired and assigned low mapping quality, so then when calling snps they are all filtered out.

I used bwa, picard, samtools, and bcftools for WGS data

It would be helpful if you show the commands. Also, show the start and end of the fastq files, with head file.fastq and tail file.fastq, or if they are compressed, zcat file.fastq.gz | head and zcat file.fastq.gz | tail.

ADD REPLYlink written 6 months ago by h.mon28k

I'm working on this zcat file.fastq.gz | tail is taking a bit

ADD REPLYlink written 6 months ago by williamsbrian5064210
1

These are normal log messages from bwa mem not errors. How is the overall mapping rate in the end according to samtools flagstat?

ADD REPLYlink written 6 months ago by ATpoint25k

I will check. Do you know samtools flagstat take a long time to run? I am running this on my university cluster so I may need to submit a job if it takes a lot of time/memory

ADD REPLYlink modified 6 months ago • written 6 months ago by williamsbrian5064210
1

Depends on the file size, few minutes probably. Does not take any memory at all. Next time you run an alignment, consider this kind of command for a in-pipe solution.

bwa mem (options) in_1.fastq in2.fastq | samtools view -b | tee out.bam | samtools flagstat - > out.flagstat
ADD REPLYlink written 6 months ago by ATpoint25k

This is what I got

666218994 + 0 in total (QC-passed reads + QC-failed reads)
8121900 + 0 secondary
0 + 0 supplementary
82816210 + 0 duplicates
664624340 + 0 mapped (99.76% : N/A)
658097094 + 0 paired in sequencing
329048547 + 0 read1
329048547 + 0 read2
0 + 0 properly paired (0.00% : N/A)
656502440 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
42096420 + 0 with mate mapped to a different chr
27416 + 0 with mate mapped to a different chr (mapQ>=5)
ADD REPLYlink written 6 months ago by williamsbrian5064210
1

Basically everything mapped but no proper pairs. Are these really standard paired-end data? Your data or downloaded?

ADD REPLYlink written 6 months ago by ATpoint25k

It is my data but it was stored on our labs AWS account. There was a new reference sequence for this species so I just wanted to align the reads to the new reference. Maybe there was an issue with the AWS download?

ADD REPLYlink written 6 months ago by williamsbrian5064210
1

You have verified that the data is in proper order (as far as reads go) so this should have nothing to do with AWS storage/download itself.

What is odd is why bwa mem is unable to determine the read orientation and the fact that the reads don't map as proper pairs.

ADD REPLYlink written 6 months ago by genomax74k

Sigh ....I am embarrassed to say that when running bwa mem I put in R2 in twice so this all makes sense

bwa mem -M -t 16 -R '@RG\tID:H02396_S47_0001\tSM:H02396\tPL:illumina\tLB:H02396_lib1\tPU:barcode_H02396' Equus_caballus.EquCab3.0.dna_sm.toplevel.fa H02396_S47_R2_001_trim_paired.fastq.gz H02396_S47_R2_001_trim_paired.fastq.gz > H02396_S47_001_aligned2.sam

As you can see I put R2 twice slaps hand over face

ADD REPLYlink written 6 months ago by williamsbrian5064210
1

Zing!

ADD REPLYlink written 6 months ago by genomax74k

Thank you two so much for sticking through this with me! I really appreciate the BioStar community.

ADD REPLYlink written 6 months ago by williamsbrian5064210
1

No problem, these kinds of errors happened to all of us at some point. Learn from it and carry on ;-)

ADD REPLYlink written 6 months ago by ATpoint25k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1119 users visited in the last hour