Question

Paired Reads have different names - but names are vastly different

4.4 years ago

ahlawrence • 0

Hello,

I am trying to use bwa to align paired-end reads to a reference genome. It writes about 10 GB of alignment output and then stops with this error:

[mem_sam_pe] paired reads have different names: "SEQCORE-1795804:227:H2LK3ADXX:1:2101:11817:76611", "SEQCORE-1795804:227:H2LK3ADXX:1:2103:8082:76846"

I've found fixes for cases where the paired names differ by one character, but I am unsure whether these reads are even supposed to be paired, as I am quite new to this. I've tried using bbmap to repair the files but haven't had any success. Any ideas as to what I should do?

alignment bwa paired-reads • 2.1k views

"I've tried using bbmap to repair the files but haven't had any success. Any ideas as to what I should do?"

You have tried the correct solution. So you have no other information about where the files come from? As it stands, we can't tell whether these reads should be paired, since the names are missing the critical part that follows the read name in a FASTQ header and tells us whether a read is from Read 1 or Read 2 (1:N:0:CTTCCT or 2:N:0:CTTCCT).

4.4 years ago by GenoMax 141k

Thank you for your response. Here are the full headers from the raw file: @SEQCORE-1795804:227:H2LK3ADXX:1:1101:1140:2089 1:N:0: from the R1 and @SEQCORE-1795804:227:H2LK3ADXX:1:2213:13633:47457 2:N:0 from the R2. So they definitely are paired.

4.4 years ago by ahlawrence • 0

So running repair.sh completes without any changes and tells you reads are properly paired?
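Independently of repair.sh, the pairing can be checked by comparing read names record by record. The following is only a rough sketch: the function names `read_names` and `first_name_mismatch` are made up for illustration, and it assumes gzipped FASTQ files with exactly four lines per record. It writes the R2 names to a temporary file, so it needs some free disk space for large inputs.

```shell
# Print the read name of every record in a gzipped FASTQ file:
# the header up to the first space, minus any trailing /1 or /2.
read_names() {
    gzip -cd "$1" | awk 'NR % 4 == 1 { name = $1; sub(/\/[12]$/, "", name); print name }'
}

# Report the first record at which the R1 and R2 names disagree,
# or confirm that every name matches.
first_name_mismatch() {
    tmp=$(mktemp)
    read_names "$2" > "$tmp"
    read_names "$1" | paste - "$tmp" |
        awk -F'\t' '$1 != $2 { print "record " NR ": " $1 " vs " $2; bad = 1; exit }
                    END { if (!bad) print "all names match" }'
    rm -f "$tmp"
}

# Usage with the merged files from this thread:
# first_name_mismatch REM.R1.fastq.gz REM.R2.fastq.gz
```

If the two files have different numbers of records, the first unmatched record is reported with an empty name on the shorter side.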

4.4 years ago by GenoMax 141k

I think I solved the issue. I was concatenating my raw reads from lane 1 and lane 2 together, so I had two files corresponding to R1 and R2, each containing both lanes. bwa was stopping right at the transition from lane 1 to lane 2. I am now running the lanes separately (four files: L1_R1, L1_R2, L2_R1, L2_R2) and it is working.

4.4 years ago by ahlawrence • 0

Can you tell us what OS/flavor of unix you are running the cat on? What version of bwa are you using?

This issue is reported to affect some bioinformatics programs, but I did not know bwa was affected (see http://seqanswers.com/forums/showthread.php?t=51395 post #4).

4.4 years ago by GenoMax 141k

I am running it on RHEL version 7.7. I am using bwa version 0.7.17.

The command I used that led to the error (both lanes grouped together) was:

zcat *R2*| gzip -> REM.R2.fastq.gz
zcat *R1*| gzip -> REM.R1.fastq.gz

The command that worked (running bwa separately for each lane) was:

zcat REM_L001_R1_*.fastq.gz | gzip -> REM.L1.R1.fastq.gz
zcat REM_L001_R2_*.fastq.gz | gzip -> REM.L1.R2.fastq.gz

zcat REM_L002_R1_*.fastq.gz | gzip -> REM.L2.R1.fastq.gz
zcat REM_L002_R2_*.fastq.gz | gzip -> REM.L2.R2.fastq.gz

4.4 years ago by ahlawrence • 0

You should have just done a plain cat and it would have worked fine. bwa can handle those files. Instead of using *, you should be explicit about the order of the files when you use that command: cat File1_R1 File2_R1 File3_R1 > total_R1.
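As a rough illustration of that advice, a small wrapper can join gzipped files in exactly the order given. Concatenated .gz files form a multi-member gzip stream, which gzip -cd decompresses as one file. The helper name `concat_gz` and the `_001` file suffixes in the usage comment are hypothetical. The overwrite check is an extra precaution so that the output can never also be picked up as one of the inputs.

```shell
# concat_gz OUT IN1 [IN2 ...]: join gzipped files in exactly the order given.
concat_gz() {
    out=$1
    shift
    # Refuse to overwrite an existing file, so OUT is never one of the inputs.
    if [ -e "$out" ]; then
        echo "refusing to overwrite $out" >&2
        return 1
    fi
    # Plain cat: no decompression or recompression is needed.
    cat "$@" > "$out"
}

# Hypothetical usage, listing R1 and R2 inputs in the same lane order:
# concat_gz REM.R1.fastq.gz REM_L001_R1_001.fastq.gz REM_L002_R1_001.fastq.gz
# concat_gz REM.R2.fastq.gz REM_L001_R2_001.fastq.gz REM_L002_R2_001.fastq.gz
```

Listing the inputs by hand keeps the R1 and R2 files in the same order, which is what the aligner relies on when it reads the two files in step.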

4.4 years ago by GenoMax 141k