Question: Which indication of a failure can be found in bwa-mem or samtools' output?
0
gravatar for serpalma.v
10 weeks ago by
serpalma.v20
Germany
serpalma.v20 wrote:

Hello

I aligned 108 FASTQ pairs with bwa-mem in one go, after alignment I transformed the SAMs into BAMs with samtools. I wrote the script so that I know when a pair starts and ends.

The job was completed, because I found 108 BAMs in the output directory and the script indicates that all pairs were processed.

I want to rule out the possibility that an error occured by searching for a message that could indicate that something went wrong. I am not asking for the exact message, but some string that is common and exclusive to all error messages, either for bwa-mem or samtools.

Thanks in advance

bwa samtools • 201 views
ADD COMMENTlink modified 10 weeks ago by ATpoint8.2k • written 10 weeks ago by serpalma.v20

Did you redirect the output from your script to a log file?

ADD REPLYlink written 10 weeks ago by Sej Modha3.8k

I did this nohup script.sh &> script.out &

ADD REPLYlink written 10 weeks ago by serpalma.v20

Could you post a few lines from script.out?

ADD REPLYlink written 10 weeks ago by Sej Modha3.8k

Beginning and end for the first pair:

[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 2210544 sequences (320000246 bp)...
[M::process] read 2207480 sequences (320000211 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (82, 871436, 27, 24)
[M::mem_pestat] analyzing insert size distribution for orientation FF...

[...many lines...]

[M::mem_pestat] skip orientation FF
[M::mem_pestat] skip orientation RR
[M::mem_process_seqs] Processed 982502 reads in 754.311 CPU sec, 24.037 real sec
[main] Version: 0.7.16a-r1181
[main] CMD: bwa mem -t 32 -M path/to/ref/Mus_musculus.GRCm38.dna.primary_assembly.fa.gz path/to/fastq/R1.fastq.gz path/to/fastq/R2.fastq.gz
[main] Real time: 848.251 sec; CPU: 26138.097 sec
ADD REPLYlink written 10 weeks ago by serpalma.v20

By the way, for the future: You can prevent BWA from being so talky by simply modifying the BWA source code (bwamem_pair.c), putting if (bwa_verbose >= 3) in front of all lines where fprintf triggers these messages.

E.g. line 84:

fprintf(stderr, "[M::%s] (25, 50, 75) percentile: (%d, %d, %d)\n", __func__, p25, p50, p75);

becomes:

if (bwa_verbose >= 3) fprintf(stderr, "[M::%s] (25, 50, 75) percentile: (%d, %d, %d)\n", __func__, p25, p50, p75);

Using bwa mem -v3 will then limit the STDERR messages to a necessary minimum. If you are interested, on my Git there is a version of bwamem_pair.c with these modifications. You'll need to recompile BWA of course.

ADD REPLYlink modified 10 weeks ago • written 10 weeks ago by ATpoint8.2k

I want to rule out the possibility that an error occured by searching for a message that could indicate that something went wrong

test bwa returned 0. https://bash.cyberciti.biz/guide/The_exit_status_of_a_command

ADD REPLYlink written 10 weeks ago by Pierre Lindenbaum113k

Thanks, If I understood correctly, I will have to re-run the script including the exit status after bwa and samtools, did I get it right?

ADD REPLYlink written 10 weeks ago by serpalma.v20

I would grep for obvious words in the log file (you already have) that can indicate problems (e.g. killed, error you get the idea). If the job took an expected length of time (108 samples should have taken some time) and checks noted by @ATPoint below go through then all should be well.

ADD REPLYlink modified 10 weeks ago • written 10 weeks ago by genomax57k
0
gravatar for ATpoint
10 weeks ago by
ATpoint8.2k
Germany
ATpoint8.2k wrote:

I would do the following:

Check file sizes

ls -lh *.bam

Even though file sizes are not too informative due to compression level differences, if a BAM is only a few KB in size, something is obviously wrong.

Check if all BAM files are valid

samtools quickcheck -qvvv *.bam > bad_BAM.txt && echo 'all ok' || echo 'some files failed check, see bad_BAM.txt'

If some BAM files have obvious anomalities, they file name will be printed to bad_BAM.txt.

Check the log file

grep 'M::mem_process_seqs' script.out

This message is always printed to STDERR at the end of the alignment, stating how many reads have been processed. You can compare that to the number of reads in your fastq files. If the numbers are identical, it should be fine. Alternatively, you can use samtools flagstat to see if the number of reads matched the entries in the fastq.

ADD COMMENTlink modified 10 weeks ago • written 10 weeks ago by ATpoint8.2k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1049 users visited in the last hour