Question

How to merge quality trimmed interlaced reads?

1

6.1 years ago

O.rka ▴ 710

I have paired-end reads from an Illumina NextSeq. I've removed adapters/primers, quality trimmed, and interlaced the forward and reverse reads. I now have a single FASTA (not FASTQ) file with the following structure:

>NS500647:208:HYKFFBGX2:1:11101:9580:1154 1:N:0:GATCAG
GTGGTCAGCAGACGTTTAGCTTCGTCAACCAGGTCAGCTTCGTACAGGGATTTACAGATTTCAGGG
>NS500647:208:HYKFFBGX2:1:11101:9580:1154 2:N:0:GATCAG
ACTGGNNNTTCTGGAAAANNNNNGNNTCANC
>NS500647:208:HYKFFBGX2:1:11101:24675:1156 1:N:0:GATCAG
AGCCGATATTCACTACCTGCTCGCCTTTAACGTTCGCAATGTCTTTTAGTTGCGGCACCGCATTAACCAGA
>NS500647:208:HYKFFBGX2:1:11101:24675:1156 2:N:0:GATCAG
CGCACNNNATATGGGTTTTANTGGNGCAGCAT

Is it possible to merge the R1 and R2 reads into a single fragment? Or is that not how paired-end reads work?

RNA-Seq • 3.3k views

ADD COMMENT • link updated 5.4 years ago by Biostar 20 • written 6.1 years ago by O.rka ▴ 710

3

You don't want to merge them together. Paired-end reads are the sequences at the two ends of a DNA fragment. When you have a 400 bp DNA fragment, your reads cover only 75 or 150 bases (NextSeq) at each end of it. So if you merge the PE reads into one, you will lose the internal sequence. Instead, you can map the reads onto a reference genome or transcriptome using a tool such as HISAT2, bwa, or bowtie2.
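To make the arithmetic above concrete, here is a minimal Python sketch. It assumes both reads of a pair have the same length and have not been shortened by trimming; the function name `inner_distance` is made up for illustration.

```python
def inner_distance(fragment_len: int, read_len: int) -> int:
    """Bases between the two reads of a pair.

    Positive: that many bases in the middle of the fragment are never read.
    Negative: the two reads overlap by that many bases.
    """
    return fragment_len - 2 * read_len


print(inner_distance(400, 75))   # 250 unsequenced bases in the middle
print(inner_distance(400, 150))  # 100 unsequenced bases in the middle
print(inner_distance(250, 150))  # -50, i.e. the reads overlap by 50 bases
```

Only in the last case is there anything to merge.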

ADD REPLY • link 6.1 years ago by mbk0asis ▴ 680

0

Thank you. I think I may have misunderstood something fundamental about paired-end sequencing. I was under the impression that the two reads overlap, so that with 75 bp reads you would end up with ~150 bp merged reads. It seems that is not the case. So for paired-end reads, do mappers count each read of the pair individually?

ADD REPLY • link 6.1 years ago by O.rka ▴ 710

2

It is possible for paired-end reads to overlap in the middle (if the combined length of the two reads is longer than the fragment being sequenced; 16S amplicons, for example, are designed like that), but that will not always be the case. You can easily find out whether the reads in your case can be merged into a longer sequence by using bbmerge.sh, from the link given by @Charles.
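As a rough illustration of what "merging" means, here is a naive Python sketch. It is not how bbmerge.sh works: it assumes R2 is reported as the reverse complement of the fragment's far end, and it only accepts an exact match between the end of R1 and the start of the reverse-complemented R2. The names `merge_pair` and `min_overlap` are invented for this example.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(r1: str, r2: str, min_overlap: int = 10) -> Optional[str]:
    """Merge R1 and R2 if the end of R1 exactly matches the start of revcomp(R2).

    Returns the merged fragment, or None if no overlap of at least
    min_overlap bases is found.
    """
    r2_rc = reverse_complement(r2)
    # Try the longest possible overlap first, then shorter ones.
    for k in range(min(len(r1), len(r2_rc)), min_overlap - 1, -1):
        if r1[-k:] == r2_rc[:k]:
            return r1 + r2_rc[k:]
    return None


# Toy 30 bp fragment read as 2 x 20 bp, so the reads overlap by 10 bp.
fragment = "ACGTTGCAAGGCTTACCGATAGGCTTAACG"
r1 = fragment[:20]
r2 = reverse_complement(fragment[-20:])
print(merge_pair(r1, r2) == fragment)  # True
```

Reads like the R2 examples in the question, which contain many N calls, would fail this exact-match test; dedicated mergers are designed to tolerate mismatches.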

Counting programs need to be told that you have paired-end reads (so that they are not double counted), since each read pair comes from one fragment of DNA. Mappers don't do the counting, but they do take the pairing of the reads into account when aligning them.
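The double-counting point can be seen on the interlaced FASTA from the question. The sketch below is only an illustration, assuming the header layout shown there (fragment ID, a space, then the read number); `count_reads_and_fragments` is a made-up helper, not part of any counting tool.

```python
from collections import Counter


def count_reads_and_fragments(fasta_lines):
    """Return (number of reads, number of distinct fragments)."""
    reads = 0
    per_fragment = Counter()
    for line in fasta_lines:
        if line.startswith(">"):
            reads += 1
            # Everything before the first space identifies the fragment;
            # R1 and R2 of a pair share it.
            fragment_id = line[1:].split()[0]
            per_fragment[fragment_id] += 1
    return reads, len(per_fragment)


example = [
    ">NS500647:208:HYKFFBGX2:1:11101:9580:1154 1:N:0:GATCAG",
    "GTGGTCAGCAGACGTTTAGCTTCGTCAACCAGGTCAGCTTCGTACAGGGATTTACAGATTTCAGGG",
    ">NS500647:208:HYKFFBGX2:1:11101:9580:1154 2:N:0:GATCAG",
    "ACTGGNNNTTCTGGAAAANNNNNGNNTCANC",
    ">NS500647:208:HYKFFBGX2:1:11101:24675:1156 1:N:0:GATCAG",
    "AGCCGATATTCACTACCTGCTCGCCTTTAACGTTCGCAATGTCTTTTAGTTGCGGCACCGCATTAACCAGA",
    ">NS500647:208:HYKFFBGX2:1:11101:24675:1156 2:N:0:GATCAG",
    "CGCACNNNATATGGGTTTTANTGGNGCAGCAT",
]
print(count_reads_and_fragments(example))  # (4, 2)
```

Four reads, but only two fragments: counting per read would report twice the real number.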

ADD REPLY • link 6.1 years ago by GenoMax 141k

0

You don't map them individually. With the tools suggested above by mbk0asis you can map them as paired reads, meaning the tool knows they are paired and takes that into account when mapping.
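Aligners are commonly given the two mates as separate files, so an interlaced file like the one in the question may need splitting first. A minimal Python sketch, assuming the read number is the first character group of the second header field as in the question's headers; `deinterleave` is a hypothetical helper, and it does not check that every R1 still has its R2 after trimming.

```python
def deinterleave(fasta_lines):
    """Split interlaced FASTA lines into (r1_lines, r2_lines)."""
    r1, r2 = [], []
    current = None
    for line in fasta_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith(">"):
            fields = line.split()
            # Second field looks like "1:N:0:GATCAG" or "2:N:0:GATCAG".
            mate = fields[1].split(":")[0] if len(fields) > 1 else ""
            if mate == "1":
                current = r1
            elif mate == "2":
                current = r2
            else:
                raise ValueError(f"cannot tell read number from header: {line}")
        elif current is None:
            raise ValueError("sequence line found before any header")
        current.append(line)
    return r1, r2


interlaced = [
    ">NS500647:208:HYKFFBGX2:1:11101:9580:1154 1:N:0:GATCAG",
    "GTGGTCAGCAGACGTTTAGCTTCGTCAACCAGGTCAGCTTCGTACAGGGATTTACAGATTTCAGGG",
    ">NS500647:208:HYKFFBGX2:1:11101:9580:1154 2:N:0:GATCAG",
    "ACTGGNNNTTCTGGAAAANNNNNGNNTCANC",
]
r1_lines, r2_lines = deinterleave(interlaced)
print(len(r1_lines), len(r2_lines))  # 2 2
```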

ADD REPLY • link 5.4 years ago by Kristoffer Vitting-Seerup ★ 4.0k

1

Have you looked at the software listed in the post "Tools to merge overlapping paired-end reads"?

ADD REPLY • link 6.1 years ago by Charles Plessy ★ 2.9k