Errors in sam file after bwa SAMPE alignment.
1
0
Entering edit mode
8.2 years ago
kirannbishwa01 ★ 1.6k

I aligned my fasta files to the reference genome using bwa sampe.

I have paired end reads so first used:

bwa aln -t 2 -l 20 lyrata_genome.fa Trm_MA605_R1.fastq > Trm_MA605_R1.sai
bwa aln -t 2 -l 20 lyrata_genome.fa Trm_MA605_R2.fastq > Trm_MA605_R2.sai

Then I did the sampe alignment using:

bwa sampe \
  -n 1 \
  -N 2 \
  lyrata_genome.fa \
  Trm_MA605_R1.sai \
  Trm_MA605_R2.sai \
  Trm_MA605_R1.fastq \
  Trm_MA605_R2.fastq > SAMPEaligned_MA605readsBWA.sam

I tried to sort the alignment and convert it to bam using picard. But, getting the error message (part of the total error message):

Exception in thread "main" htsjdk.samtools.SAMFormatException: Error parsing text SAM file. MAPQ should be 0 for unmapped read.; File SAMPEaligned_MA605readsBWA.sam; Line 673702
Line: HWI-ST588:81:C080WACXX:7:1102:6980:187157    69    4    23328260    6    84M    6    963719    0    GGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGCATAAA    +2AAEEEIIFFIFIEI<CEDCEDDDDDIIIBDDEIDDBDDEIIDCCDIIII=ADDDDD?@DAAAAA>?AAAA>??AAA35(54>    RG:Z:MA605_C080WACXX_7    XT:A:U    NM:i:3    SM:i:6    AM:i:6    X0:i:1    X1:i:55    XM:i:3    XO:i:0    XG:i:0    MD:Z:28C49A1A3
    at htsjdk.samtools.SAMLineParser.reportErrorParsingLine(SAMLineParser.java:439)
    at htsjdk.samtools.SAMLineParser.parseLine(SAMLineParser.java:341)
    at htsjdk.samtools.SAMTextReader$RecordIterator.parseLine(SAMTextReader.java:248)

I tried to validate the sam file using picard again:

java -jar /home/everestial007/picard-tools-1.139/picard.jar ValidateSamFile \
  I=SAMPEaligned_MA605readsBWA.sam \
  Mode=VERBOSE \
  MAX_OUTPUT=15000 \
  O=validatebamMA605forGATK

But I'm getting the following error message in the output file.

ERROR: Record 46, Read name HWI-ST588:81:C080WACXX:7:1101:3935:2321, Mate negative strand flag does not match read negative strand flag of mate
ERROR: Record 155, Read name HWI-ST588:81:C080WACXX:7:1101:9306:2386, Mate negative strand flag does not match read negative strand flag of mate
ERROR: Record 352, Read name HWI-ST588:81:C080WACXX:7:1101:20820:2318, Mate negative strand flag does not match read negative strand flag of mate
ERROR: Record 382, Read name HWI-ST588:81:C080WACXX:7:1101:2880:2595, Mate negative strand flag does not match read negative strand flag of mate
ERROR: Record 562, Read name HWI-ST588:81:C080WACXX:7:1101:10717:2694, Mate negative strand flag does not match read negative strand flag of mate
ERROR: Record 603, Read name HWI-ST588:81:C080WACXX:7:1101:12062:2750, Mate negative strand flag does not match read negative strand flag of mate

I followed the bwa manual to letter but getting this kind of errors. I can not figure why is my alignment not working.

Thanks in advance.

bam sam bwa picard • 3.8k views
ADD COMMENT
1
Entering edit mode
8.2 years ago

Try adding VALIDATION_STRINGENCY=LENIENT to your picard commands. Picard takes a very strict view of what a proper SAM file should be (including enforcing rules that aren't in the specification).

ADD COMMENT
0
Entering edit mode

Thanks for the info. But, should not we be considering the error message. I don't understand what "Mate negative strand flag does not match read negative strand flag of mate" means, and if I should be concerned that my alignments are working properly or not. Btw, sam flagstat reports that 86 % of the reads are aligned - so it is something cocerning.

Thanks,

ADD REPLY
0
Entering edit mode

It's possible that your mates are out of sync (i.e., the nth read in Trm_MA605_R1.fastq doesn't match the nth read in Trm_MA605_R2.fastq), so you might check that (you can use reformat from BBMap to do this).

ADD REPLY
0
Entering edit mode

Thanks for the info. I a little hesistant in trying to learn a new tool again (just takes too much of my time to set it up to getting it right), but will do it as a last resort. Do you think picard can do it? Also, if if have 86% alignment, should I be worried about it? - just wanting to weigh the pros and cons of the extra step, but I will appreciate any suggestions.

Thanks,

ADD REPLY
1
Entering edit mode

If the overwhelming majority are marked as "proper pairs" then I wouldn't bother with further processing, the results are likely fine.

ADD REPLY
1
Entering edit mode

Thanks for enlighting. Here is my samtools flagstat report,

22193026 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
19848288 + 0 mapped (89.43% : N/A)
22193026 + 0 paired in sequencing
11096513 + 0 read1
11096513 + 0 read2
18519235 + 0 properly paired (83.45% : N/A)
19450790 + 0 with itself and mate mapped
397498 + 0 singletons (1.79% : N/A)
366420 + 0 with mate mapped to a different chr
122636 + 0 with mate mapped to a different chr (mapQ>=5)

Since, 83.45% are properly paired so I think I am good to go, but just want to make sure. Also, I used picard's "CleanSam" to remove reads that mapped out of reference sequence.

ADD REPLY
1
Entering edit mode

That seems reasonable, you should be good to go.

ADD REPLY

Login before adding your answer.

Traffic: 1409 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6