Question: Paired Reads have different names - but names are vastly different
ahlawrence wrote (4 months ago):

Hello,

I am trying to use bwa to align paired-end reads to a reference genome. It outputs 10 GB of alignments and then stops with the error '[mem_sam_pe] paired reads have different names: "SEQCORE-1795804:227:H2LK3ADXX:1:2101:11817:76611", "SEQCORE-1795804:227:H2LK3ADXX:1:2103:8082:76846"'. I've found fixes for cases where the paired names differ by one character, but I am unsure whether these reads are even supposed to be paired, as I am quite new to this. I've tried using bbmap to repair the files but haven't had any success. Any ideas as to what I should do?

bwa paired-reads alignment • 148 views
written 4 months ago by ahlawrence

"I've tried using bbmap to repair the files but haven't had any success. Any ideas as to what I should do?"

You have tried the correct solution. So you have no other information about where the files came from? As it stands, we can't tell whether these reads should be paired, because the names are missing the critical part that follows the read name in the FASTQ header and tells us whether a read is from Read 1 or Read 2 (1:N:0:CTTCCT or 2:N:0:CTTCCT).
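
As a rough illustration of that field, here is a minimal shell sketch that pulls the read number out of a header line. The function name read_number is made up for this example, and it assumes the Illumina (CASAVA 1.8+) layout, where the read number is the first colon-separated item after the space.

```sh
# Hypothetical helper: print the read number (1 or 2) found in the comment
# part of a FASTQ header such as "@name 1:N:0:CTTCCT".
# Prints nothing when the header has no such field.
read_number() {
    printf '%s\n' "$1" | awk '{
        split($2, f, ":")                      # $2 is the part after the space
        if (f[1] == "1" || f[1] == "2") print f[1]
    }'
}

# Example call:
#   read_number '@SEQCORE-1795804:227:H2LK3ADXX:1:2101:11817:76611 1:N:0:CTTCCT'
```

A header that stops at the read name, like the two in the error message, yields no output, which is why pairing cannot be judged from the error text alone.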

modified 4 months ago • written 4 months ago by genomax

Thank you for your response. Here are the full headers from the raw file: @SEQCORE-1795804:227:H2LK3ADXX:1:1101:1140:2089 1:N:0: from the R1 and @SEQCORE-1795804:227:H2LK3ADXX:1:2213:13633:47457 2:N:0 from the R2. So they definitely are paired.

written 4 months ago by ahlawrence

So running repair.sh completes without any changes and tells you reads are properly paired?
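
For a quick independent look at where the names first diverge, something like the sketch below may help. It is not what repair.sh does internally; first_name_mismatch is a hypothetical helper that walks two gzipped FASTQ files in step and reports the first record whose names differ, assuming plain four-line records.

```sh
# Hypothetical helper: compare read names of two gzipped FASTQ files record
# by record. Exits 0 if all names agree, 1 at the first disagreement.
# Assumes 4-line FASTQ records; a trailing /1 or /2 on the name is ignored.
first_name_mismatch() {
    r1=$1
    r2=$2
    gzip -cd "$r1" | awk -v r2cmd="gzip -cd \"$r2\"" '
        {
            # Fetch the matching line from the R2 stream.
            if ((r2cmd | getline other) <= 0) {
                print "R2 ends early at line " NR
                bad = 1
                exit
            }
        }
        NR % 4 == 1 {
            # Header line: keep only the name before the first space or tab.
            split($0, a, /[ \t]/)
            split(other, b, /[ \t]/)
            n1 = a[1]; n2 = b[1]
            sub(/\/[12]$/, "", n1)
            sub(/\/[12]$/, "", n2)
            if (n1 != n2) {
                printf "record %d: %s vs %s\n", (NR + 3) / 4, n1, n2
                bad = 1
                exit
            }
        }
        END {
            if (!bad && (r2cmd | getline other) > 0) {
                print "R2 has extra reads"
                bad = 1
            }
            close(r2cmd)
            exit bad
        }
    '
}

# Example call (file names are placeholders):
#   first_name_mismatch sample_R1.fastq.gz sample_R2.fastq.gz
```

The record number it prints can then be compared with where each input file starts and ends in the combined file.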

written 4 months ago by genomax

I think I solved the issue. I was concatenating my raw reads from lane 1 and lane 2 together, so I had two files corresponding to R1 and R2, but each contained both lane 1 and lane 2. bwa was stopping right at the transition from lane 1 to lane 2. I am now running the lanes separately (so four files: L1_R1, L1_R2, L2_R1, L2_R2) and it is working.
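
A per-lane run could look roughly like the dry run below. It only prints the commands rather than running them; ref.fa, the .fastq.gz extensions and the .sam output names are placeholders chosen for the sketch, and per_lane_commands is a made-up helper.

```sh
# Dry run: print one paired-end bwa mem command per lane.
# ref.fa and all file names are placeholders for illustration.
per_lane_commands() {
    ref=$1
    shift
    for lane in "$@"; do
        printf 'bwa mem %s %s_R1.fastq.gz %s_R2.fastq.gz > %s.sam\n' \
            "$ref" "$lane" "$lane" "$lane"
    done
}

per_lane_commands ref.fa L1 L2
# prints:
#   bwa mem ref.fa L1_R1.fastq.gz L1_R2.fastq.gz > L1.sam
#   bwa mem ref.fa L2_R1.fastq.gz L2_R2.fastq.gz > L2.sam
```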

written 4 months ago by ahlawrence

Can you tell us what OS/flavor of unix you are running the cat on? What version of bwa are you using?

This issue is reported to affect some bioinformatics programs, but I did not know bwa was affected (see http://seqanswers.com/forums/showthread.php?t=51395 post #4).

modified 4 months ago • written 4 months ago by genomax

I am running it on RHEL version 7.7. I am using bwa version 0.7.17.

The command I used that led to the errors (with both lanes grouped together) was:

zcat *R2* | gzip - > REM.R2.fastq.gz
zcat *R1* | gzip - > REM.R1.fastq.gz

The command I used that was successful (running bwa separately for each lane) was:

zcat REM_L001_R1_*.fastq.gz | gzip - > REM.L1.R1.fastq.gz
zcat REM_L001_R2_*.fastq.gz | gzip - > REM.L1.R2.fastq.gz

zcat REM_L002_R1_*.fastq.gz | gzip - > REM.L2.R1.fastq.gz
zcat REM_L002_R2_*.fastq.gz | gzip - > REM.L2.R2.fastq.gz
modified 4 months ago • written 4 months ago by ahlawrence

You should have just done a plain cat and it would have worked fine; bwa can handle files concatenated that way. Instead of using *, be explicit about the order of the files when you run that command: cat File1_R1 File2_R1 File3_R1 > total_R1.
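
A minimal sketch of that advice follows. Both function names are hypothetical. cat_in_order relies on the fact that gzip files joined with cat still form a valid gzip stream, and lists_match is a simple sanity check that assumes the R2 file names are the R1 names with _R1 replaced by _R2.

```sh
# Hypothetical helper: concatenate gzipped FASTQ files in exactly the order
# given. First argument is the output file, the rest are inputs.
cat_in_order() {
    out=$1
    shift
    cat "$@" > "$out"
}

# Hypothetical check: succeed only if the R2 list is the R1 list with
# _R1 replaced by _R2, in the same order. Lists are space-separated strings.
lists_match() {
    r1_list=$1
    r2_list=$2
    [ "$(printf '%s\n' "$r1_list" | sed 's/_R1/_R2/g')" = "$r2_list" ]
}

# Example usage (file names are placeholders):
#   r1='REM_L001_R1_001.fastq.gz REM_L002_R1_001.fastq.gz'
#   r2='REM_L001_R2_001.fastq.gz REM_L002_R2_001.fastq.gz'
#   lists_match "$r1" "$r2" && cat_in_order REM.R1.fastq.gz $r1 \
#                           && cat_in_order REM.R2.fastq.gz $r2
```

Listing the files explicitly makes the R1 and R2 outputs follow the same lane order, which is what bwa needs for the reads to stay paired.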

modified 4 months ago • written 4 months ago by genomax