Question: BWA not storing sequence
tanya.copley wrote (2.9 years ago):

Hi, I am having an issue with BWA genome alignment where sequence information is not being stored (i.e. the sequence field in the BAM file is a *). Can anyone tell me why this is happening and how to retrieve my sequences? I need them for downstream SNP analysis. My code for alignment, SAM-to-BAM conversion and sorting:

    bwa mem -t $cpu -r 1 -a -M -R $rg $genome $l | \
    sambamba view -t $cpu -f bam -h -S /dev/stdin | \
    sambamba sort -t $cpu -o $aligned/$flow/$lane/${nameNoExt}_RG_sorted.bam /dev/stdin

where $rg is my read group tag, $genome is the path to my genome file and $l is the loop variable holding the current input file.

Here is a portion of the output when using:

    samtools view /path/to/file.bam | head -n 20

The first two records have their sequences, while the remaining three have a * in place of the sequence. How can I tell bwa to keep these sequences?

HWI-ST521:68:C07EDACXX:2:2106:17357:162867  0   Gm01    30991   60  93M *   0   0   CAGCGTGAAGGGAACGAGTATTATTATTTATAACCAGCCACGTCATCAAGAGAAGAATATGGAAGGTGGATCGGGCCTTGCTATCCACACCCC   FFFHFHHDHIIGJIJIJIEFGIJJJIIJJCIHGIGIIJJIJIGHGGFGIECHIIIIEHHHHEDFF<;@CACD>BBBBBC>ACDDCC>?><<@5   NM:i:1  MD:Z:45G47  AS:i:88 XS:i:0  RG:Z:PI438460
HWI-ST521:68:C07EDACXX:2:1103:2382:91229    0   Gm01    33652   60  49M *   0   0   CAGCAATTCTGATAACAGTACCATTTTTTTTTTTGGAAGGCAAAATTAG   FFFHHHGHEGIBHIJIJJHIIGGIIJJJJJJJIGBHA?EC2?DECA;;;   NM:i:0  MD:Z:49 AS:i:49 XS:i:26 RG:Z:PI438460
HWI-ST521:68:C07EDACXX:2:2106:5869:22021    256 Gm01    33999   0   60M34H  *   0   0   *   *   NM:i:1  MD:Z:45T14  AS:i:55 RG:Z:PI438460
HWI-ST521:68:C07EDACXX:2:1101:3451:56376    256 Gm01    33999   0   59M35H  *   0   0   *   *   NM:i:2  MD:Z:45T5A7 AS:i:49 RG:Z:PI438460
HWI-ST521:68:C07EDACXX:2:2106:19619:31878   256 Gm01    33999   0   59M35H  *   0   0   *   *   NM:i:2  MD:Z:45T5A7 AS:i:49 RG:Z:PI438460

Hello tanya.copley!

It appears that your post has been cross-posted to another site: http://seqanswers.com/forums/showthread.php?p=200869#post200869

This is typically not recommended as it runs the risk of annoying people in both communities.

Comment written 2.9 years ago by WouterDeCoster
piet wrote (2.9 years ago):

You have sorted your BAM file. Therefore, hard-clipped alignments now appear before the soft-clipped alignment of the same read. See also this thread: bwa mem hard clipping
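To make the hard clipping concrete, here is a minimal sketch that splits a CIGAR string into the bases left after hard clipping and the bases removed by it. The function name cigar_lengths is made up for illustration, and the awk parsing is just one way to do it, not something bwa or samtools provides.

```shell
#!/bin/sh
# cigar_lengths is a hypothetical helper, not part of bwa or samtools.
# Prints three numbers for a CIGAR string:
#   bases kept after hard clipping, hard-clipped bases, original read length
cigar_lengths() {
    printf '%s\n' "$1" | awk '{
        kept = 0; hard = 0; cigar = $0
        # Consume one "<length><operation>" token at a time from the left.
        while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
            len = substr(cigar, 1, RLENGTH - 1) + 0
            op  = substr(cigar, RLENGTH, 1)
            if (op ~ /[MIS=X]/) kept += len      # operations that consume read bases
            else if (op == "H") hard += len      # hard-clipped bases are dropped from SEQ
            cigar = substr(cigar, RLENGTH + 1)
        }
        print kept, hard, kept + hard
    }'
}

cigar_lengths 60M34H   # prints: 60 34 94
cigar_lengths 93M      # prints: 93 0 93
```

For the 60M34H record in the question, 34 of the read's 94 bases were hard-clipped, so the full read can only be recovered from another record of the same read.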

Santosh Anand wrote (2.9 years ago):

The FLAG field of all three records with a missing query sequence is 256, which marks them as secondary (not primary) alignments. BWA usually writes the query sequence only for the primary alignment, to save space and avoid duplicating information. Since you have sorted the BAM file, the primary alignment is no longer the first record listed for each read (see piet's answer and link for more detail). Grep for any one of the read names to see all of its alignments; one of them should contain the full sequence.
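The lookup described above can be sketched roughly as follows. The three SAM records are toy data invented for this example (they are not from the poster's file), and alignments_of is a hypothetical helper name. The check on bit 256 of FLAG uses plain awk arithmetic, so it does not depend on gawk.

```shell
#!/bin/sh
# Toy SAM records (hypothetical, NOT the poster's data): readA has a primary
# alignment (FLAG 0) and a secondary alignment (FLAG 256).
sam_file=$(mktemp)
trap 'rm -f "$sam_file"' EXIT
printf 'readA\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\n'  >  "$sam_file"
printf 'readA\t256\tchr1\t500\t0\t6M4H\t*\t0\t0\t*\t*\n'                  >> "$sam_file"
printf 'readB\t16\tchr1\t900\t60\t10M\t*\t0\t0\tTTTTGGGGCC\tIIIIIIIIII\n' >> "$sam_file"

# alignments_of READ_NAME SAM_FILE (hypothetical helper)
# Prints "FLAG<TAB>primary|secondary<TAB>SEQ" for every record of one read.
# Bit 0x100 (256) of FLAG marks a secondary alignment; int(flag / 256) % 2
# extracts that bit.
alignments_of() {
    awk -F '\t' -v name="$1" '
        $1 == name {
            kind = (int($2 / 256) % 2) ? "secondary" : "primary"
            print $2 "\t" kind "\t" $10
        }' "$2"
}

alignments_of readA "$sam_file"
```

On a real file, the same function could presumably be fed from samtools, e.g. samtools view /path/to/file.bam | alignments_of READ_NAME /dev/stdin. If the secondary records are not needed for SNP calling, samtools view -F 256 excludes every record that has that flag bit set.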
