Question: bwa mem not aligning
0
gravatar for senowinski
2.5 years ago by
senowinski30
European Union
senowinski30 wrote:

I am running the following command:

bwa mem -t 4 /users/person/resources/reference/hg19/genome/ucsc.hg19.fasta 160095-T_S2_L003_R1_001.fastq.gz 160095-t_s2_l003_r2_001.fastq.gz > 160095-T_S2_L003_R1_001.fastq.gz.bwa.sam

With the following bwa messages where all four orientations are being skipped:-

[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 546760 sequences (40000002 bp)...
[M::process] read 546534 sequences (40000140 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (1, 1, 0, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] skip orientation FR as there are not enough pairs
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[mem_sam_pe] paired reads have different names: "NS500784:187:HHYNGAFXX:4:11401:17815:1049", "NS500784:187:HHYNGAFXX:1:11101:13503:1043"

[mem_sam_pe] paired reads have different names: "NS500784:187:HHYNGAFXX:4:11401:8106:1064", "NS500784:187:HHYNGAFXX:1:11101:4077:1043"

When I open the sam file it looks like this:

@SQ SN:chrM LN:16571        
@SQ SN:chr1 LN:249250621        
@SQ SN:chr2 LN:243199373        
@SQ SN:chr3 LN:198022430        
@SQ SN:chr4 LN:191154276        
@SQ SN:chr5 LN:180915260        
@SQ SN:chr6 LN:171115067        
@SQ SN:chr7 LN:159138663        
@SQ SN:chr8 LN:146364022        
@SQ SN:chr9 LN:141213431        
@SQ SN:chr10    LN:135534747        
@SQ SN:chr11    LN:135006516        
@SQ SN:chr12    LN:133851895        
@SQ SN:chr13    LN:115169878        
@SQ SN:chr14    LN:107349540        
@SQ SN:chr15    LN:102531392        
@SQ SN:chr16    LN:90354753     
@SQ SN:chr17    LN:81195210     
@SQ SN:chr18    LN:78077248     
@SQ SN:chr19    LN:59128983     
@SQ SN:chr20    LN:63025520     
@SQ SN:chr21    LN:48129895     
@SQ SN:chr22    LN:51304566     
@SQ SN:chrX LN:155270560        
@SQ SN:chrY LN:59373566     
@SQ SN:chr1_gl000191_random LN:106433       
@SQ SN:chr1_gl000192_random LN:547496       
@SQ SN:chr4_ctg9_hap1   LN:590426       
@SQ SN:chr4_gl000193_random LN:189789       
@SQ SN:chr4_gl000194_random LN:191469       
@SQ SN:chr6_apd_hap1    LN:4622290      
@SQ SN:chr6_cox_hap2    LN:4795371      
@SQ SN:chr6_dbb_hap3    LN:4610396      
@SQ SN:chr6_mann_hap4   LN:4683263      
@SQ SN:chr6_mcf_hap5    LN:4833398      
@SQ SN:chr6_qbl_hap6    LN:4611984      
@SQ SN:chr6_ssto_hap7   LN:4928567      
@SQ SN:chr7_gl000195_random LN:182896       
@SQ SN:chr8_gl000196_random LN:38914        
@SQ SN:chr8_gl000197_random LN:37175        
@SQ SN:chr9_gl000198_random LN:90085        
@SQ SN:chr9_gl000199_random LN:169874       
@SQ SN:chr9_gl000200_random LN:187035       
@SQ SN:chr9_gl000201_random LN:36148        
@SQ SN:chr11_gl000202_random    LN:40103        
@SQ SN:chr17_ctg5_hap1  LN:1680828      
@SQ SN:chr17_gl000203_random    LN:37498        
@SQ SN:chr17_gl000204_random    LN:81310        
@SQ SN:chr17_gl000205_random    LN:174588       
@SQ SN:chr17_gl000206_random    LN:41001        
@SQ SN:chr18_gl000207_random    LN:4262     
@SQ SN:chr19_gl000208_random    LN:92689        
@SQ SN:chr19_gl000209_random    LN:159169       
@SQ SN:chr21_gl000210_random    LN:27682        
@SQ SN:chrUn_gl000211   LN:166566       
@SQ SN:chrUn_gl000212   LN:186858       
@SQ SN:chrUn_gl000213   LN:164239       
@SQ SN:chrUn_gl000214   LN:137718       
@SQ SN:chrUn_gl000215   LN:172545       
@SQ SN:chrUn_gl000216   LN:172294       
@SQ SN:chrUn_gl000217   LN:172149       
@SQ SN:chrUn_gl000218   LN:161147       
@SQ SN:chrUn_gl000219   LN:179198       
@SQ SN:chrUn_gl000220   LN:161802       
@SQ SN:chrUn_gl000221   LN:155397       
@SQ SN:chrUn_gl000222   LN:186861       
@SQ SN:chrUn_gl000223   LN:180455       
@SQ SN:chrUn_gl000224   LN:179693       
@SQ SN:chrUn_gl000225   LN:211173       
@SQ SN:chrUn_gl000226   LN:15008        
@SQ SN:chrUn_gl000227   LN:128374       
@SQ SN:chrUn_gl000228   LN:129120       
@SQ SN:chrUn_gl000229   LN:19913        
@SQ SN:chrUn_gl000230   LN:43691        
@SQ SN:chrUn_gl000231   LN:27386        
@SQ SN:chrUn_gl000232   LN:40652        
@SQ SN:chrUn_gl000233   LN:45941        
@SQ SN:chrUn_gl000234   LN:40531        
@SQ SN:chrUn_gl000235   LN:34474        
@SQ SN:chrUn_gl000236   LN:41934        
@SQ SN:chrUn_gl000237   LN:45867        
@SQ SN:chrUn_gl000238   LN:39939        
@SQ SN:chrUn_gl000239   LN:33824        
@SQ SN:chrUn_gl000240   LN:41933        
@SQ SN:chrUn_gl000241   LN:42152        
@SQ SN:chrUn_gl000242   LN:43523        
@SQ SN:chrUn_gl000243   LN:43341        
@SQ SN:chrUn_gl000244   LN:39929        
@SQ SN:chrUn_gl000245   LN:36651        
@SQ SN:chrUn_gl000246   LN:38154        
@SQ SN:chrUn_gl000247   LN:36422        
@SQ SN:chrUn_gl000248   LN:39786        
@SQ SN:chrUn_gl000249   LN:38502        
@PG ID:bwa  PN:bwa  VN:0.7.15-r1140 CL:bwa/users/person/resources/reference/hg19/genome/ucsc.hg19.fasta 160095-T_S2_L004_R1_001.fastq.gz 160095-t_s2_l003_r2_001.fastq.gz

BUT when I use samtools to convert it to a bam file it's empty!

Can anyone advise?

alignment • 1.7k views
ADD COMMENTlink modified 2.5 years ago • written 2.5 years ago by senowinski30

My guess is that your paired files are not properly paired. The names of your reads don't seem to match between your pairs.

ADD REPLYlink written 2.5 years ago by Benn7.7k

I get this error from read1 and 2 from lane 3 and 4 of this sample. all my other samples were fine. so I also thought something similar - so I tried all alignment combinations of read1 and 2 from the different lanes with the same problem - I also tried concatenation read1s from lane 3 and 4 and aligned this with read2s from lane 3 and 4.

is there a way to fix the different names of the reads?

ADD REPLYlink written 2.5 years ago by senowinski30

How to fix it? Find the right pairs.

ADD REPLYlink written 2.5 years ago by Benn7.7k

What does fastqc / multiqc look like? Any major flags? Is it possible that your forward / reverse reads are randomly sorted?

ADD REPLYlink written 2.5 years ago by andrew.j.skelton735.8k

fastqc was fine - no flags. this is a problem with read 1 and read 2 from lane 3 and 4 of this sample. I have 29, and this is the only one with this problem

ADD REPLYlink written 2.5 years ago by senowinski30
1
gravatar for Pierre Lindenbaum
2.5 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum122k wrote:

It's not a problem with bwa but it's a problem with your pair of fastq. You're mapping two fastqs that come from different lanes (L004 and L003)

  • 160095-T_S2_L004_R1_001.fastq.gz should be mapped with 160095-T_S2_L004_R2_001.fastq.gz (R1 and R2)

  • 160095-t_s2_l003_r1_001.fastq.gz should be mapped with 160095-t_s2_l003_r2_001.fastq.gz (R1 and R2)

ADD COMMENTlink written 2.5 years ago by Pierre Lindenbaum122k

Hey, yeah, that was a typo - nothing was working so I tried different combinations of lanes just incase.

The above is from read1 and read2 from Lane 3.

I also tried:

read1 lane3 with read2 lane4

and I also concatenated R1 from both lane 3 and lane 4, and then tried to align with R2 from lane 3 and 4 with the same problem.

ADD REPLYlink written 2.5 years ago by senowinski30

Thank you for noticing I didn't properly go through my question! I have tried all possible combinations of read1 and 2 from lane 3 and 4 from this sample because it is not aligning. and all have the same outcome - skipped all 4 orientations. I have 28 samples that aligned well, but not the case for this one...

do you have any other suggestions as to why this would be? is it the case that it has just come from different samples? do you think there was a sequencing error? library? labelling? pair names? and are you able to advise on what to do next - how to fix this problem?

thanks

ADD REPLYlink written 2.5 years ago by senowinski30

there is something wrong with your fastqs. You should have the same read name in the R1 and R2 fastqs.

check for the same read names:

paste <(gunzip -c file.R1.fastq.gz | paste - - - - | cut -f 1)  paste <(gunzip -c file.R2.fastq.gz | paste - - - - | cut -f 1)  | grep -m 1 -F "NS500784:187:HHYNGAFXX:4:11401:17815:1049"

you should get two columns with the same read name R1/R2.

How to fix this ? I wouldn't trust this kind of data anyway. There is something wrong in your data.

ADD REPLYlink modified 2.5 years ago • written 2.5 years ago by Pierre Lindenbaum122k

When you notice data not aligning as expected the first thing you should check is to take a few reads and blast them at NCBI to make sure you are looking at the right genome/sample. It won't be the first time a sample has been contaminated (somewhere along the line) and the data is not correct to begin with.

Have you scanned and trimmed your data for presence of adapters?

If all that checks out then try repair.sh tool from BBMap suite that can fix read order issues in PE data files. You would use it like repair.sh in1=r1.fq.gz in2=r2.fq.gz out1=fixed1.fq.gz out2=fixed2.fq.gz outsingle=singletons.fq.gz

ADD REPLYlink modified 2.5 years ago • written 2.5 years ago by genomax71k

It appears that the names of the file do not match the data. From your example:

"NS500784:187:HHYNGAFXX:4:11401:17815:1049", "NS500784:187:HHYNGAFXX:1:11101:13503:1043"

The first read is from lane 4 (the number after the flow cell identifier HHYNGAFXX), while the second is from lane 1.

ADD REPLYlink written 2.5 years ago by harold.smith.tarheel4.4k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2149 users visited in the last hour