BWA mem skip orientation
0
0
Entering edit mode
3.4 years ago

Hi,

So I am not entirely sure if this is an issue or not. So to give a little background, I used bwa, picard, samtools, and bcftools for WGS data. I have used this "pipeline" in the past and have had success. I have one sample that I am interested in looking at but when I ran through this process, the vcf file it looks like there is not genetic information. So I am trying to troubleshoot here. I was comparing the output files and the only one that looks different is the bwa mem output file. Here is what it looks like

[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 1228244 sequences (160000264 bp)...
[M::process] read 1120674 sequences (160000086 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 0, 0, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] skip orientation FR as there are not enough pairs
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_process_seqs] Processed 1228244 reads in 931.101 CPU sec, 70.722 real sec
[M::process] read 1116464 sequences (160000014 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 0, 0, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] skip orientation FR as there are not enough pairs
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_process_seqs] Processed 1120674 reads in 890.092 CPU sec, 56.756 real sec
[M::process] read 1118946 sequences (160000176 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 0, 0, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] skip orientation FR as there are not enough pairs
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_process_seqs] Processed 1116464 reads in 855.762 CPU sec, 60.114 real sec
[M::process] read 1125142 sequences (160000104 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 0, 0, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] skip orientation FR as there are not enough pairs
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs


The files goes on. Is this alarming?

One thing I noticed was when I load the bam file into IGV I can see the reads and I can see SNP and INDELS. So I wonder if there is an issue with bamtools and samtools? What do you think?

assembly genome alignment next-gen • 3.5k views
1
Entering edit mode

I assume this is a paired-end dataset? Was it trimmed as individual files at some point? Is there a chance that the reads were not in sync in R1/R2 files when the alignment was done?

0
Entering edit mode

That is correct, they are paried-end and they were trimmed using trimmomatic. I didn't check the output of trimmomatic because it produced two files but maybe I didn't set a long enough run time. I will check now.

0
Entering edit mode

It looks good to me?

TrimmomaticPE: Started with arguments:
-threads 16 /gpfs_backup/meurs_data/HorsePipeline/O02396_S47_R1_001.fastq.gz /gpfs_backup/meurs_data/HorsePipeline/O02396_S47_R2_001.fastq.gz /gpfs_backup/meurs_data/HorsePipeline/H02396_S47_R1_001_trim_paired.fastq.gz /gpfs_backup/meurs_data/HorsePipeline/H02396_S47_R1_001_trim_unpaired.fastq.gz /gpfs_backup/meurs_data/HorsePipeline/H02396_S47_R2_001_trim_paired.fastq.gz /gpfs_backup/meurs_data/HorsePipeline/H02396_S47_R2_001_trim_unpaired.fastq.gz LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:70
Quality encoding detected as phred33
Input Read Pairs: 385790482 Both Surviving: 329048547 (85.29%) Forward Only Surviving: 47840969 (12.40%) Reverse Only Surviving: 3783893 (0.98%) Dropped: 5117073 (1.33%)
TrimmomaticPE: Completed successfully

1
Entering edit mode

While that log looks fine the files themselves may not be.

Use repair.sh from BBMap suite to see if you find singletons in your data.

repair.sh in1=r1.fq.gz in2=r2.fq.gz out1=fixed1.fq.gz out2=fixed2.fq.gz outsingle=singletons.fq.gz

0
Entering edit mode

I will give this a try

0
Entering edit mode

This is running now. How should I present that output to you? should I look in the singletons.fq.gz fine?

1
Entering edit mode

Nothing should be there in singletons.fq.gz file if the reads are in proper order in your R1/R2 files. If you are starting to see things showing up in singletons.fq.gz then they are not. Let us know if singletons.fq.gz ends up being a zero byte file.

0
Entering edit mode

It looks like singletons.fq.gz is empty. Oddly enough, the paired files are a bit larger than the original paired files from trimmomatic

1
Entering edit mode

That is good news. At least this shows that your data is in sync. I would not use file size as a metric as any QC so don't worry about that.

I don't use bwa mem so I am not sure if you need to tell if the insert sizes in your library are longer/shorter than expected. Perhaps that may be reason it is not able to figure out the correct read orientation for this library.

0
Entering edit mode

I am just curious what the actual issue is. The trimming seems fine, bwa seems to be fine, the bam file seems to be fine? I could see snps and what not. Maybe there is an issue with bcftools? I uses samstools flagstat and that worked. Any suggestions on where to look next?

0
Entering edit mode

I figured it out. I shamefully aligned R2 twice (see below). This is most dishonorable

1
Entering edit mode

I would guess your fastq files are somehow broken and lost pairing between R1 and R2. When BWA mapped, it considered all pairs as unproperly paired and assigned low mapping quality, so then when calling snps they are all filtered out.

I used bwa, picard, samtools, and bcftools for WGS data

It would be helpful if you show the commands. Also, show the start and end of the fastq files, with head file.fastq and tail file.fastq, or if they are compressed, zcat file.fastq.gz | head and zcat file.fastq.gz | tail.

0
Entering edit mode

I'm working on this zcat file.fastq.gz | tail is taking a bit

1
Entering edit mode

These are normal log messages from bwa mem not errors. How is the overall mapping rate in the end according to samtools flagstat?

0
Entering edit mode

I will check. Do you know samtools flagstat take a long time to run? I am running this on my university cluster so I may need to submit a job if it takes a lot of time/memory

1
Entering edit mode

Depends on the file size, few minutes probably. Does not take any memory at all. Next time you run an alignment, consider this kind of command for a in-pipe solution.

bwa mem (options) in_1.fastq in2.fastq | samtools view -b | tee out.bam | samtools flagstat - > out.flagstat

0
Entering edit mode

This is what I got

666218994 + 0 in total (QC-passed reads + QC-failed reads)
8121900 + 0 secondary
0 + 0 supplementary
82816210 + 0 duplicates
664624340 + 0 mapped (99.76% : N/A)
658097094 + 0 paired in sequencing
0 + 0 properly paired (0.00% : N/A)
656502440 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
42096420 + 0 with mate mapped to a different chr
27416 + 0 with mate mapped to a different chr (mapQ>=5)

1
Entering edit mode

Basically everything mapped but no proper pairs. Are these really standard paired-end data? Your data or downloaded?

0
Entering edit mode

It is my data but it was stored on our labs AWS account. There was a new reference sequence for this species so I just wanted to align the reads to the new reference. Maybe there was an issue with the AWS download?

1
Entering edit mode

You have verified that the data is in proper order (as far as reads go) so this should have nothing to do with AWS storage/download itself.

What is odd is why bwa mem is unable to determine the read orientation and the fact that the reads don't map as proper pairs.

0
Entering edit mode

Sigh ....I am embarrassed to say that when running bwa mem I put in R2 in twice so this all makes sense

bwa mem -M -t 16 -R '@RG\tID:H02396_S47_0001\tSM:H02396\tPL:illumina\tLB:H02396_lib1\tPU:barcode_H02396' Equus_caballus.EquCab3.0.dna_sm.toplevel.fa H02396_S47_R2_001_trim_paired.fastq.gz H02396_S47_R2_001_trim_paired.fastq.gz > H02396_S47_001_aligned2.sam


As you can see I put R2 twice slaps hand over face

1
Entering edit mode

Zing!

0
Entering edit mode

Thank you two so much for sticking through this with me! I really appreciate the BioStar community.

1
Entering edit mode

No problem, these kinds of errors happened to all of us at some point. Learn from it and carry on ;-)