Different total number of sequences between the two PE fastq files
2.4 years ago
Sz • 0

Hi all,

I am analyzing RNA-Seq data from paired-end NGS sequencing. At the outset I noticed that 2 of the 9 samples have different read counts in the forward and reverse FASTQ files.

For example:

  • XXXX_1.fq.gz - Total Sequences -> 50592434; Total Bases -> 7.5 Gbp
  • XXXX_2.fq.gz - Total Sequences -> 38414256; Total Bases -> 5.7 Gbp

Because data are scarce, I can't discard these samples, even though I know they were of poor quality at sequencing.

Because the library was prepared with the SMARTer Stranded RNA-Seq kit, I need to trim three nucleotides from the reads before genome assembly. I tried cutadapt in paired-end mode, but these 2 samples failed with an error like this one:

Reads are improperly paired. Read name 'ST-E00144:1102:H7W5CCCX2:7:1101:7446:1520 1:N:0:NAGATCAT+NGATCTCG' in file 1 does not match 'ST-E00144:1102:H7W5CCCX2:8:1101:9110:1520 2:N:0:NAGATCAT+NGATCTCG' in file 2.

What should I do in this situation? I expect exactly the same problem when I start the genome assembly, so I need to resolve it now.

Should I run these two samples as single-end reads (and if so, how do I handle them later, given that the rest of the samples are paired-end)? Maybe you know a better solution for this kind of issue.

Thank you in advance for any help!

NGS RNA-Seq Paired-End • 1.9k views
2.4 years ago
GenoMax 154k

How did you end up in this situation? Did you trim the reads independently? You should always scan/trim paired-end data together.

You can use repair.sh from the BBMap suite to "re-pair" the reads; it moves the singletons to a separate file.

Guide for repair.sh: https://jgi.doe.gov/data-and-tools/software-tools/bbtools/bb-tools-user-guide/repair-guide/
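
To make the idea of "re-pairing" concrete, here is a minimal Python sketch of the logic: match reads by name, keep the matched pairs, and set the singletons aside. It is only an illustration and not how repair.sh is implemented. The function names (`read_fastq`, `read_key`, `repair`) are made up for this example, it assumes plain 4-line FASTQ records with Illumina-style headers, and it holds all of file 2 in memory, so for files with tens of millions of reads the real tool is the better choice.

```python
import gzip
from typing import Iterator, List, Tuple

Record = Tuple[str, str, str, str]  # header, sequence, separator, quality


def open_text(path: str):
    """Open a FASTQ file as text, handling .gz files transparently."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


def read_fastq(path: str) -> Iterator[Record]:
    """Yield 4-line FASTQ records (assumes sequences are not line-wrapped)."""
    with open_text(path) as handle:
        while True:
            header = handle.readline().rstrip("\n")
            if not header:
                return  # end of file
            seq = handle.readline().rstrip("\n")
            sep = handle.readline().rstrip("\n")
            qual = handle.readline().rstrip("\n")
            yield header, seq, sep, qual


def read_key(header: str) -> str:
    """Pairing key: the read name before the first space, minus any /1 or /2."""
    name = header[1:].split()[0]
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name


def repair(path1: str, path2: str):
    """Return (pairs, singletons_from_file1, singletons_from_file2)."""
    # Hypothetical in-memory approach: index every read of file 2 by name.
    mates = {read_key(rec[0]): rec for rec in read_fastq(path2)}
    pairs: List[Tuple[Record, Record]] = []
    single1: List[Record] = []
    for rec in read_fastq(path1):
        mate = mates.pop(read_key(rec[0]), None)
        if mate is None:
            single1.append(rec)        # no partner in file 2
        else:
            pairs.append((rec, mate))  # properly paired, in file 1 order
    # Whatever is left in the index never found a partner in file 1.
    return pairs, single1, list(mates.values())
```

Writing `pairs` back out as two files in the same order gives a pair of FASTQ files that paired-end tools should accept; the singletons can be kept separately or discarded.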

Someone before me trimmed the adapters from the library, and now I need to trim the first 3 nucleotides (this kind of library requires it); I am doing this for both paired-end files together. Oddly, only 2 samples don't match and the other 7 are fine, so I suspect the sequencing of these two went poorly (the plant material was of poor quality).
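
For illustration, trimming a fixed number of leading bases from both mates in one pass can be sketched in Python as below. This is a toy example, not a replacement for cutadapt, which offers the same operation through its `-u` (first read) and `-U` (second read) options in paired-end mode. The names `trim_start` and `trim_pair` and the parameters `n1` and `n2` are hypothetical; which mate actually carries the three extra nucleotides depends on the kit version, so that is left as a parameter here and should be checked against the kit manual.

```python
from typing import Tuple

Record = Tuple[str, str, str, str]  # header, sequence, separator, quality


def trim_start(record: Record, n: int) -> Record:
    """Remove the first n bases and the matching n quality characters."""
    header, seq, sep, qual = record
    return header, seq[n:], sep, qual[n:]


def trim_pair(read1: Record, read2: Record, n1: int = 0, n2: int = 0):
    """Trim both mates together so the pair is never split up.

    n1 and n2 are the numbers of leading bases to drop from read 1 and
    read 2 (assumption: set the one your kit requires to 3, the other to 0).
    """
    return trim_start(read1, n1), trim_start(read2, n2)


# Usage with made-up records: drop 3 leading bases from read 1 only.
r1 = ("@read1 1:N:0:X", "GGGACGTACGT", "+", "IIIIIIIIIII")
r2 = ("@read1 2:N:0:X", "TTTTCCCCAAAA", "+", "IIIIIIIIIIII")
t1, t2 = trim_pair(r1, r2, n1=3, n2=0)
```

Because every pair goes in and comes out together, the two output files keep the same number of reads in the same order, which is the property the paired-end tools rely on.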

"because only 2 samples don't match"

Mismatched read files should have nothing to do with the sequencing itself. For paired-end sequencing, the sequencer always produces the same number of reads in both files. It is likely that the two files were trimmed independently. Please check the remaining samples with repair.sh as well, just to be safe.
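
If you want a quick sanity check before (or in addition to) running repair.sh, something like the following Python sketch walks both files in lockstep, counts the reads in each, and reports the first position where the read names disagree. The function names (`read_names`, `check_pairing`) are made up for this example, and it assumes 4-line FASTQ records whose headers look like the ones in the error message (read name, then a space, then the `1:N:0:...` / `2:N:0:...` part).

```python
import gzip
from itertools import zip_longest


def read_names(path):
    """Yield the read name (text before the first space) of every record."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            # Every 4th line, starting at line 0, is a header; skip blank lines.
            if i % 4 == 0 and line.strip():
                yield line[1:].split()[0]


def check_pairing(path1, path2):
    """Return (reads_in_file1, reads_in_file2, first_mismatch).

    first_mismatch is None when the files are properly paired, otherwise
    a tuple (record_index, name_in_file1, name_in_file2); a name is None
    when that file ended early.
    """
    n1 = n2 = 0
    first_mismatch = None
    pairs = zip_longest(read_names(path1), read_names(path2))
    for index, (a, b) in enumerate(pairs):
        if a is not None:
            n1 += 1
        if b is not None:
            n2 += 1
        if first_mismatch is None and a != b:
            first_mismatch = (index, a, b)
    return n1, n2, first_mismatch
```

A sample is fine when both counts are equal and `first_mismatch` is `None`. Equal counts alone are not enough, since the files could still be out of order.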

OK, thank you very much for your help. I'll try repair.sh. Thanks also for the suggestion to check the remaining samples; that's very helpful in this situation.
