Paired-end reads: sequence length differs between forward and reverse reads
7.8 years ago
Ric ▴ 430

Hi, FastUniq failed with an error on one of my datasets. I checked the files and discovered that the sequence lengths differ between the forward and reverse reads. Here is an example:

  MGRF_NGS_FATIMA_LIFAT-30373344/F-35905947> head L001_R1.fastq L001_R2.fastq
  ==> L001_R1.fastq <==
  @NS500334:63:HF2WTBGXY:1:11101:15449:1054 1:N:0:GATCAG
  TAAGTNAAACCCAAACGAAATTACCNTACCTTGNCCTAGCANGTCGATAAAAGGTGGATGGCATTGTAGGGTCGCTCTCTTCGNTTCGNNNTCGAANNNNNGNNNNNNNTNNNNNNANCNNNNNC
  +
  AAAAA#EEEEEEEE6EEEEEEEEEE#EEEEEEE#EEEEEAE#EEEEEAEEEEEEEEEEEEE<EEEEEEEEEEEEEEEEEEEEA#EEEE###EEE<E#####E#######E######/#E#####A
  @NS500334:63:HF2WTBGXY:1:11101:10110:1054 1:N:0:GATCAG
  CTACANATCATAATGAATACAACATNAGTTTAANGAAACAGNCACAAGTTTAAAAAAAACTGAAATAACTATAAAATAACATGNCCAANNNCACTANNNNNTNNNNNNNANNNNNNANGNNNNNCCNNNNNNNNNNNNNNNNNNNNNNNNN
  +
  AAAAA#EEEAEEAEEEEEEEAEE/E#EEEEEEE#E/EA6EE#AEAEEEEEEEEEEEEEEEEEEEEEEE<EEEEEEEEEEEAEE#EEA/###AEEE/#####E#######A######E#E#####E6#########################
  @NS500334:63:HF2WTBGXY:1:11101:20814:1054 1:N:0:GATCAG
  CATGCNATGAGAAGATTTCATTTGCNAGGGTCCNTGTTGAANTGGATGCTGCCTATCCACTTCCTGATGAATTGGAGATTGATNCCCCNNNTGGCTNNNNNCNNNNNNNANNNNNNTNCNNNN

  ==> L001_R2.fastq <==
  @NS500334:63:HF2WTBGXY:1:11101:15449:1054 2:N:0:GATCAG
  NTATNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNATNNNNATTNNNNAAANNNNACTGNNNNATCNNNNAAAANNGCGAGNNTATCCTGTCTTANNTTAGTANCCACACGCACTGGATAATTTATGAACAAT
  +
  #AAA#################################################EE####EEE####EEA####EEEE####EEA####EEEE##A<EEA##EEEEEE<EEEE/##EAEE/E#EAAE/AA/AEEEE/EEEEEEEEEEE<AAE
  @NS500334:63:HF2WTBGXY:1:11101:10110:1054 2:N:0:GATCAG
  NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
  +
  ###################################
  @NS500334:63:HF2WTBGXY:1:11101:20814:1054 2:N:0:GATCAG
  NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN

Should the forward and reverse reads have the same length? What would be the best way to fix this?

Thank you in advance.

Mic

7.8 years ago
GenoMax 142k

They need not be the same length, especially if they have been trimmed (or masked, as appears to be the case here, given the runs of NNNN).

Both files should still contain properly matched pairs. If you suspect that the pairing is broken, it can be fixed with repair.sh from the BBMap suite.
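
To check whether the pairing is actually intact before reaching for repair.sh, here is a minimal Python sketch (not part of BBMap; the function names are illustrative) that walks R1 and R2 in parallel, compares read names, and counts pairs whose mates differ in length. It assumes uncompressed 4-line FASTQ records with headers like the ones above, where the read name is followed by a space and a 1:N:0:... or 2:N:0:... comment; a trailing /1 or /2 on older-style names is also stripped.

```python
from itertools import zip_longest
from typing import Iterator, Optional, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from uncompressed 4-line FASTQ."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\r\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\r\n")
        yield header, seq, qual


def read_id(header: str) -> str:
    """Read name without '@', the ' 1:N:0:...' comment, or a '/1' '/2' suffix."""
    name = header[1:].split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def check_pairs(r1: TextIO, r2: TextIO) -> Tuple[int, int, Optional[str]]:
    """Return (pairs checked, pairs whose mates differ in length, problem or None)."""
    n_pairs = 0
    n_len_diff = 0
    for rec1, rec2 in zip_longest(read_fastq(r1), read_fastq(r2)):
        if rec1 is None or rec2 is None:
            return n_pairs, n_len_diff, "files have different numbers of records"
        id1, id2 = read_id(rec1[0]), read_id(rec2[0])
        if id1 != id2:
            return n_pairs, n_len_diff, f"ID mismatch: {id1} vs {id2}"
        n_pairs += 1
        if len(rec1[1]) != len(rec2[1]):
            n_len_diff += 1  # allowed; this alone is not a pairing error
    return n_pairs, n_len_diff, None
```

To use it, open L001_R1.fastq and L001_R2.fastq and pass both handles to check_pairs. A nonzero second value is expected for data like this and is not an error by itself; only an ID mismatch or a differing record count points to broken pairing.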

These reads have not been trimmed. I always thought that forward and reverse reads must have the same length.

Not necessarily. There is no requirement that a run be set up symmetrically: the number of cycles for each read can be configured in any combination during run setup.

7.8 years ago
chen ★ 2.5k

This situation is very common.

A read will be incomplete if the quality is too low to complete the remaining cycles for that spot, which is why you see so many Ns.
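
This can be seen directly by decoding the quality strings. The sketch below assumes Phred+33 encoding, which matches the characters in these files ('#' decodes to Q2, 'E' to Q36); the function names are illustrative, not from any particular library.

```python
from typing import Dict, List


def phred_scores(qual: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string; assumes Phred+33 ('#' -> 2, 'E' -> 36)."""
    return [ord(c) - offset for c in qual]


def summarize_read(seq: str, qual: str) -> Dict[str, float]:
    """Length, fraction of N calls, and mean Phred quality for one read."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    if not seq:
        return {"length": 0, "n_fraction": 0.0, "mean_q": 0.0}
    scores = phred_scores(qual)
    return {
        "length": len(seq),
        "n_fraction": seq.upper().count("N") / len(seq),
        "mean_q": sum(scores) / len(scores),
    }
```

Applied to the second R2 record in the question (all N, all '#'), it reports an N fraction of 1.0 and a mean quality of 2.0.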

Downstream software should be able to handle this situation.
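
If a particular tool cannot cope with such reads, one option is to drop pairs in which either mate is too short or mostly N, writing the surviving mates to both outputs in the same order so the files stay paired. Below is a minimal sketch of that idea. The thresholds (min_length=20, max_n_fraction=0.5) are arbitrary assumptions, the function names are hypothetical, and it assumes the two inputs are already in matching order (repair.sh can restore that if they are not).

```python
from typing import Iterator, TextIO, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from uncompressed 4-line FASTQ."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return
        seq = handle.readline().rstrip("\r\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\r\n")
        yield header, seq, qual


def write_fastq(handle: TextIO, rec: Record) -> None:
    header, seq, qual = rec
    handle.write(f"{header}\n{seq}\n+\n{qual}\n")


def usable(seq: str, min_length: int, max_n_fraction: float) -> bool:
    """True if the read is long enough and not dominated by N calls."""
    if not seq or len(seq) < min_length:
        return False
    return seq.upper().count("N") / len(seq) <= max_n_fraction


def filter_pairs(r1_in: TextIO, r2_in: TextIO, r1_out: TextIO, r2_out: TextIO,
                 min_length: int = 20, max_n_fraction: float = 0.5) -> Tuple[int, int]:
    """Keep a pair only if both mates pass; return (kept, dropped)."""
    kept = dropped = 0
    for rec1, rec2 in zip(read_fastq(r1_in), read_fastq(r2_in)):
        if (usable(rec1[1], min_length, max_n_fraction)
                and usable(rec2[1], min_length, max_n_fraction)):
            write_fastq(r1_out, rec1)  # both mates written together,
            write_fastq(r2_out, rec2)  # so output order stays in sync
            kept += 1
        else:
            dropped += 1
    return kept, dropped
```

This discards the whole pair when one mate fails; an alternative design would keep the good mate in a separate singletons file, which this sketch does not do. Dedicated trimming and filtering tools with a paired-end mode handle these cases more thoroughly.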

But why don't the forward and reverse reads have the same length?

What kind of mechanism (besides clipping) would guarantee the same length?
