Question

Different total number of sequences in the two PE FASTQ files

2.4 years ago

Sz • 0

Hi all,

I'm currently analyzing RNA-Seq data from paired-end NGS sequencing. Right at the start I realized that 2 of the 9 samples have different read counts in the forward and reverse FASTQ files.

For example:

XXXX_1.fq.gz - Total Sequences -> 50592434; Total Bases -> 7.5 Gbp
XXXX_2.fq.gz - Total Sequences -> 38414256; Total Bases -> 5.7 Gbp
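A quick way to reproduce these totals and confirm the mismatch is to count the records in each file. This is a minimal sketch that assumes plain 4-line FASTQ records (no wrapped sequence lines); `fastq_stats` and `compare_pair` are hypothetical helper names, not part of any tool.

```python
import gzip


def fastq_stats(path):
    """Count records and total bases in a (possibly gzipped) FASTQ file.

    Assumes the standard 4-line layout: header, sequence, '+', quality.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    n_reads = 0
    n_bases = 0
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # the sequence line of each record
                n_reads += 1
                n_bases += len(line.rstrip("\r\n"))
    return n_reads, n_bases


def compare_pair(r1_path, r2_path):
    """Return (reads, bases) for each mate and whether the read counts agree."""
    s1 = fastq_stats(r1_path)
    s2 = fastq_stats(r2_path)
    return s1, s2, s1[0] == s2[0]
```

Calling `compare_pair("XXXX_1.fq.gz", "XXXX_2.fq.gz")` returns the read and base totals for each mate plus a flag saying whether the read counts agree; for properly paired files that flag is always `True`.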

Data are limited, so I can't throw these samples away, even though I know their quality was poor when they were sequenced.

Since the libraries were made with the SMARTer Stranded RNA-Seq kit, I need to trim the first three nucleotides before genome assembly. I tried cutadapt in paired-end mode, but for these 2 samples it fails with an error like this one:

Reads are improperly paired. Read name 'ST-E00144:1102:H7W5CCCX2:7:1101:7446:1520 1:N:0:NAGATCAT+NGATCTCG' in file 1 does not match 'ST-E00144:1102:H7W5CCCX2:8:1101:9110:1520 2:N:0:NAGATCAT+NGATCTCG' in file 2.
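cutadapt in paired-end mode walks through both files in lockstep and stops as soon as record N in file 1 and record N in file 2 carry different read names; in the message above even the lane field differs (7 vs 8). The sketch below mimics that check while trimming a fixed number of leading bases from each mate. It is only an illustration: the helper names are hypothetical, it assumes 4-line FASTQ records, and `trim1`/`trim2` default to 3 for both mates purely as a placeholder, so check the kit's manual for which read actually needs trimming. In cutadapt itself, leading bases are removed with `-u` (read 1) and `-U` (read 2).

```python
import gzip
from itertools import zip_longest


def read_fastq(path):
    """Yield (header, seq, plus, qual) tuples from a 4-line FASTQ file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            header = handle.readline().rstrip("\r\n")
            if not header:
                return
            seq = handle.readline().rstrip("\r\n")
            plus = handle.readline().rstrip("\r\n")
            qual = handle.readline().rstrip("\r\n")
            yield header, seq, plus, qual


def read_id(header):
    """Fragment ID shared by both mates: first token, minus '@' and any /1 or /2."""
    name = header.split()[0].lstrip("@")
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def trim_pairs(r1_records, r2_records, trim1=3, trim2=3):
    """Trim leading bases from both mates in lockstep (hypothetical helper).

    Raises ValueError as soon as the two inputs stop describing the same
    fragments, or when one input ends before the other.
    """
    for n, (rec1, rec2) in enumerate(zip_longest(r1_records, r2_records), start=1):
        if rec1 is None or rec2 is None:
            raise ValueError(f"record {n}: one file ended before the other")
        if read_id(rec1[0]) != read_id(rec2[0]):
            raise ValueError(f"record {n}: '{rec1[0]}' does not match '{rec2[0]}'")
        h1, s1, p1, q1 = rec1
        h2, s2, p2, q2 = rec2
        # Sequence and quality are cut together so they stay the same length.
        yield ((h1, s1[trim1:], p1, q1[trim1:]),
               (h2, s2[trim2:], p2, q2[trim2:]))
```

For example, `trim_pairs(read_fastq("XXXX_1.fq.gz"), read_fastq("XXXX_2.fq.gz"))` would stop at the same kind of mismatch cutadapt reports, which is why the files need to be re-paired before any trimming.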

What should I do in this situation? I know I will hit exactly the same issue when I start genome assembly, so I need to figure out how to handle it now.

Should I process these two samples as single-end reads? (But then what, given that the rest of the samples are paired-end?) Maybe you know a better solution for this kind of issue.

Thank you in advance for any help!

NGS RNA-Seq Paired-End

score 1 · Answer 1 · 2023-05-16

2.4 years ago

GenoMax 154k

How did you end up in this situation? Did you trim the reads independently? You should always scan/trim paired-end data together.

You can use repair.sh from the BBMap suite to "re-pair" the sequences; reads whose mate is missing (singletons) are moved to a separate file.

Guide for repair.sh: https://jgi.doe.gov/data-and-tools/software-tools/bbtools/bb-tools-user-guide/repair-guide/
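The idea behind re-pairing can be sketched in a few lines: index one file by read ID, walk the other, and keep only IDs present in both. This is a simplified illustration, not how repair.sh is implemented; `repair_pairs` is a hypothetical name, records are assumed to be `(header, seq, plus, qual)` tuples, and all R2 records are held in memory, so for full-size files like these repair.sh itself is the practical choice.

```python
def read_id(header):
    """Fragment ID shared by both mates: first token, minus '@' and any /1 or /2."""
    name = header.split()[0].lstrip("@")
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def repair_pairs(r1_records, r2_records):
    """Re-pair mates by read ID (hypothetical helper).

    Returns (pairs, singletons): pairs are (rec1, rec2) tuples in R1 order,
    singletons are records whose mate is missing from the other file.
    """
    r2_by_id = {read_id(rec[0]): rec for rec in r2_records}
    pairs, singletons = [], []
    for rec1 in r1_records:
        rec2 = r2_by_id.pop(read_id(rec1[0]), None)
        if rec2 is None:
            singletons.append(rec1)  # R1 read with no mate in R2
        else:
            pairs.append((rec1, rec2))
    singletons.extend(r2_by_id.values())  # R2 reads never matched by an R1 read
    return pairs, singletons
```

Writing `pairs` back out as two files gives R1/R2 files with identical read order, and the singletons can be kept separately (for example, to be used as single-end reads) rather than discarded.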

Someone else trimmed the adapters before I got the data. Now I need to trim the first 3 nucleotides (this library type requires it), and I am doing that on the paired-end files together. Oddly, only 2 samples don't match while the other 7 are fine, so I think the sequencing went poorly for these two (the plant material was of poor quality).

2.4 years ago by Sz

because only 2 samples don't match

Non-matching read files should have nothing to do with sequencing: for PE sequencing the sequencer always produces the same number of reads in both files. Most likely the files were trimmed independently. Please check the remaining samples with repair.sh too, just to be safe.
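The effect of independent trimming can be illustrated with a toy example: if a length filter is applied to R1 and R2 in separate runs, a read dropped from one file does not take its mate with it, so the files can end up with different counts or, as in the toy data used here, the same count but different fragments at the same position. Filtering each pair as a unit keeps the files in sync. The functions, the `(header, seq)` record layout and the `min_len` threshold are all made up for illustration.

```python
def filter_independently(r1, r2, min_len):
    """Each file filtered on its own, as when R1 and R2 are trimmed in separate runs."""
    keep1 = [rec for rec in r1 if len(rec[1]) >= min_len]
    keep2 = [rec for rec in r2 if len(rec[1]) >= min_len]
    return keep1, keep2


def filter_jointly(r1, r2, min_len):
    """A pair is kept only if both mates pass, so the files stay in sync."""
    kept = [(a, b) for a, b in zip(r1, r2)
            if len(a[1]) >= min_len and len(b[1]) >= min_len]
    return [a for a, _ in kept], [b for _, b in kept]


def in_sync(r1, r2):
    """True if both files have the same length and the same fragment at each position."""
    return len(r1) == len(r2) and all(
        a[0].split()[0] == b[0].split()[0] for a, b in zip(r1, r2))


# Toy data: fragment f2 is short in R1, fragment f3 is short in R2.
R1 = [("@f1 1:N", "ACGTACGT"), ("@f2 1:N", "AC"), ("@f3 1:N", "ACGTACGT")]
R2 = [("@f1 2:N", "TTTTTTTT"), ("@f2 2:N", "TTTTTTTT"), ("@f3 2:N", "TT")]
```

With `min_len=5`, `filter_independently(R1, R2, 5)` keeps f1 and f3 from R1 but f1 and f2 from R2, so the second records no longer belong to the same fragment, while `filter_jointly(R1, R2, 5)` keeps only f1 in both files.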

2.4 years ago by GenoMax

OK, thank you very much for your help. I'll try repair.sh. Thanks also for the suggestion to check the remaining samples; that's very helpful in this situation.

2.4 years ago by Sz