Hi,
So I am not entirely sure if this is an issue or not. So to give a little background, I used bwa, picard, samtools, and bcftools for WGS data. I have used this "pipeline" in the past and have had success. I have one sample that I am interested in looking at but when I ran through this process, the vcf file it looks like there is not genetic information. So I am trying to troubleshoot here. I was comparing the output files and the only one that looks different is the bwa mem output file. Here is what it looks like
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 1228244 sequences (160000264 bp)...
[M::process] read 1120674 sequences (160000086 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 0, 0, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] skip orientation FR as there are not enough pairs
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_process_seqs] Processed 1228244 reads in 931.101 CPU sec, 70.722 real sec
[M::process] read 1116464 sequences (160000014 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 0, 0, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] skip orientation FR as there are not enough pairs
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_process_seqs] Processed 1120674 reads in 890.092 CPU sec, 56.756 real sec
[M::process] read 1118946 sequences (160000176 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 0, 0, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] skip orientation FR as there are not enough pairs
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_process_seqs] Processed 1116464 reads in 855.762 CPU sec, 60.114 real sec
[M::process] read 1125142 sequences (160000104 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 0, 0, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] skip orientation FR as there are not enough pairs
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
The files goes on. Is this alarming?
One thing I noticed was when I load the bam file into IGV I can see the reads and I can see SNP and INDELS. So I wonder if there is an issue with bamtools and samtools? What do you think?
Thanks in advance
I assume this is a paired-end dataset? Was it trimmed as individual files at some point? Is there a chance that the reads were not in sync in R1/R2 files when the alignment was done?
That is correct, they are paried-end and they were trimmed using trimmomatic. I didn't check the output of trimmomatic because it produced two files but maybe I didn't set a long enough run time. I will check now.
It looks good to me?
While that log looks fine the files themselves may not be.
Use
repair.sh
from BBMap suite to see if you find singletons in your data.I will give this a try
This is running now. How should I present that output to you? should I look in the singletons.fq.gz fine?
Nothing should be there in
singletons.fq.gz
file if the reads are in proper order in yourR1/R2
files. If you are starting to see things showing up insingletons.fq.gz
then they are not. Let us know ifsingletons.fq.gz
ends up being a zero byte file.It looks like singletons.fq.gz is empty. Oddly enough, the paired files are a bit larger than the original paired files from trimmomatic
That is good news. At least this shows that your data is in sync. I would not use file size as a metric as any QC so don't worry about that.
I don't use
bwa mem
so I am not sure if you need to tell if the insert sizes in your library are longer/shorter than expected. Perhaps that may be reason it is not able to figure out the correct read orientation for this library.I am just curious what the actual issue is. The trimming seems fine, bwa seems to be fine, the bam file seems to be fine? I could see snps and what not. Maybe there is an issue with bcftools? I uses samstools flagstat and that worked. Any suggestions on where to look next?
I figured it out. I shamefully aligned R2 twice (see below). This is most dishonorable
I would guess your fastq files are somehow broken and lost pairing between R1 and R2. When BWA mapped, it considered all pairs as unproperly paired and assigned low mapping quality, so then when calling snps they are all filtered out.
It would be helpful if you show the commands. Also, show the start and end of the fastq files, with
head file.fastq
andtail file.fastq
, or if they are compressed,zcat file.fastq.gz | head
andzcat file.fastq.gz | tail
.I'm working on this
zcat file.fastq.gz | tail
is taking a bitThese are normal log messages from
bwa mem
not errors. How is the overall mapping rate in the end according tosamtools flagstat
?I will check. Do you know
samtools flagstat
take a long time to run? I am running this on my university cluster so I may need to submit a job if it takes a lot of time/memoryDepends on the file size, few minutes probably. Does not take any memory at all. Next time you run an alignment, consider this kind of command for a in-pipe solution.
This is what I got
Basically everything mapped but no proper pairs. Are these really standard paired-end data? Your data or downloaded?
It is my data but it was stored on our labs AWS account. There was a new reference sequence for this species so I just wanted to align the reads to the new reference. Maybe there was an issue with the AWS download?
You have verified that the data is in proper order (as far as reads go) so this should have nothing to do with AWS storage/download itself.
What is odd is why
bwa mem
is unable to determine the read orientation and the fact that the reads don't map as proper pairs.Sigh ....I am embarrassed to say that when running
bwa mem
I put in R2 in twice so this all makes senseAs you can see I put R2 twice slaps hand over face
Zing!
Thank you two so much for sticking through this with me! I really appreciate the BioStar community.
No problem, these kinds of errors happened to all of us at some point. Learn from it and carry on ;-)