Which indication of a failure can be found in bwa-mem or samtools' output?
1
0
Entering edit mode
4.3 years ago
serpalma.v ▴ 70

Hello

I aligned 108 FASTQ pairs with bwa-mem in one go, after alignment I transformed the SAMs into BAMs with samtools. I wrote the script so that I know when a pair starts and ends.

The job was completed, because I found 108 BAMs in the output directory and the script indicates that all pairs were processed.

I want to rule out the possibility that an error occured by searching for a message that could indicate that something went wrong. I am not asking for the exact message, but some string that is common and exclusive to all error messages, either for bwa-mem or samtools.

bwa samtools • 1.7k views
0
Entering edit mode

Did you redirect the output from your script to a log file?

0
Entering edit mode

I did this nohup script.sh &> script.out &

0
Entering edit mode

Could you post a few lines from script.out?

0
Entering edit mode

Beginning and end for the first pair:

[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 2210544 sequences (320000246 bp)...
[M::process] read 2207480 sequences (320000211 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (82, 871436, 27, 24)
[M::mem_pestat] analyzing insert size distribution for orientation FF...

[...many lines...]

[M::mem_pestat] skip orientation FF
[M::mem_pestat] skip orientation RR
[M::mem_process_seqs] Processed 982502 reads in 754.311 CPU sec, 24.037 real sec
[main] Version: 0.7.16a-r1181
[main] CMD: bwa mem -t 32 -M path/to/ref/Mus_musculus.GRCm38.dna.primary_assembly.fa.gz path/to/fastq/R1.fastq.gz path/to/fastq/R2.fastq.gz
[main] Real time: 848.251 sec; CPU: 26138.097 sec

0
Entering edit mode

By the way, for the future: You can prevent BWA from being so talky by simply modifying the BWA source code (bwamem_pair.c), putting if (bwa_verbose >= 3) in front of all lines where fprintf triggers these messages.

E.g. line 84:

fprintf(stderr, "[M::%s] (25, 50, 75) percentile: (%d, %d, %d)\n", __func__, p25, p50, p75);


becomes:

if (bwa_verbose >= 3) fprintf(stderr, "[M::%s] (25, 50, 75) percentile: (%d, %d, %d)\n", __func__, p25, p50, p75);


Using bwa mem -v3 will then limit the STDERR messages to a necessary minimum. If you are interested, on my Git there is a version of bwamem_pair.c with these modifications. You'll need to recompile BWA of course.

0
Entering edit mode

I want to rule out the possibility that an error occured by searching for a message that could indicate that something went wrong

test bwa returned 0. https://bash.cyberciti.biz/guide/The_exit_status_of_a_command

0
Entering edit mode

Thanks, If I understood correctly, I will have to re-run the script including the exit status after bwa and samtools, did I get it right?

0
Entering edit mode

I would grep for obvious words in the log file (you already have) that can indicate problems (e.g. killed, error you get the idea). If the job took an expected length of time (108 samples should have taken some time) and checks noted by @ATPoint below go through then all should be well.

0
Entering edit mode
4.3 years ago
ATpoint 66k

I would do the following:

Check file sizes

ls -lh *.bam


Even though file sizes are not too informative due to compression level differences, if a BAM is only a few KB in size, something is obviously wrong.

Check if all BAM files are valid

samtools quickcheck -qvvv *.bam > bad_BAM.txt && echo 'all ok' || echo 'some files failed check, see bad_BAM.txt'


If some BAM files have obvious anomalities, they file name will be printed to bad_BAM.txt.

Check the log file

grep 'M::mem_process_seqs' script.out


This message is always printed to STDERR at the end of the alignment, stating how many reads have been processed. You can compare that to the number of reads in your fastq files. If the numbers are identical, it should be fine. Alternatively, you can use samtools flagstat to see if the number of reads matched the entries in the fastq.