Question

Purpose of concatenation

4.5 years ago

roshan.vaid • 0

What is the purpose of concatenating paired-end reads? How does it influence downstream analysis such as mapping? Do you use concatenated files as input for mapping?

sequencing ChIP-Seq • 956 views

updated 4.5 years ago by Brice Sarver ★ 3.8k • written 4.5 years ago by roshan.vaid • 0

score 5 · Answer 1 · 2019-10-11

The answer to this question will depend on what you mean by 'concatenating' paired-end reads.

FASTQ files that have been split can be combined by concatenating them, e.g., cat fq1_R1_001.fq fq2_R1_002.fq > all_R1.fq. Files from the same sample sequenced in different runs can be combined the same way, though in some cases you may be better off analyzing them separately. For paired-end data, concatenate the R1 files and the R2 files separately and in the same order, so that mates stay in sync. There is plenty of advice on Biostars about what to do in these situations.
Paired-end FASTQs can be combined into a single, interleaved FASTQ in which each read is immediately followed by its mate. When mapping, you may need to pass a specific flag (e.g., -p for bwa mem) or supply the file in a particular way.
Overlapping reads (the fragment is shorter than the two read lengths combined) can be merged into a single read, with various options for handling discordant bases. Recall that both reads come from the same molecule, so merging turns two PE reads into one longer SE read.
I can't think of a good reason (others, let me know if there is one) to combine a forward and reverse read that don't overlap, unless you were writing your own base-composition analysis script or something similar. Even then...
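Interleaving and overlap merging can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not how any particular tool works: plain, unwrapped 4-line FASTQ records; mate names that match apart from an optional /1 or /2 suffix; and a naive merger that takes the longest acceptable overlap and keeps the higher-quality base at discordant positions. All names here (read_fastq, interleave, merge_pair, etc.) are hypothetical. The sync check in interleave also shows why R1 and R2 files must be concatenated in the same order.

```python
from itertools import zip_longest
from typing import Iterable, Iterator, List, NamedTuple, Optional


class FastqRecord(NamedTuple):
    name: str  # header without the leading '@'
    seq: str
    qual: str  # Phred+33 qualities, one character per base


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from plain 4-lines-per-record FASTQ text (e.g. an open file)."""
    it = (line.rstrip("\n") for line in lines)
    for header in it:
        seq = next(it)
        next(it)  # skip the '+' separator line
        qual = next(it)
        yield FastqRecord(header[1:], seq, qual)


def format_fastq(records: Iterable[FastqRecord]) -> str:
    return "".join(f"@{r.name}\n{r.seq}\n+\n{r.qual}\n" for r in records)


def mate_id(name: str) -> str:
    """ID shared by both mates: drop the header comment and any /1 or /2 suffix."""
    core = name.split()[0]
    return core[:-2] if core.endswith(("/1", "/2")) else core


def interleave(r1: Iterable[FastqRecord], r2: Iterable[FastqRecord]) -> List[FastqRecord]:
    """Alternate R1/R2 records, failing loudly if the two inputs are out of sync."""
    out: List[FastqRecord] = []
    for a, b in zip_longest(r1, r2):
        if a is None or b is None:
            raise ValueError("R1 and R2 contain different numbers of reads")
        if mate_id(a.name) != mate_id(b.name):
            raise ValueError(f"mates out of sync: {a.name} vs {b.name}")
        out.extend((a, b))
    return out


_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(r1: FastqRecord, r2: FastqRecord, min_overlap: int = 10,
               max_mismatch_frac: float = 0.1) -> Optional[FastqRecord]:
    """Naive overlap merge; returns None if no acceptable overlap is found.

    R2 is reverse-complemented onto R1's strand, then the longest suffix of R1
    matching a prefix of R2 (within the mismatch budget) is used. Assumes the
    fragment is at least as long as each read (no read-through into adapters).
    """
    s2, q2 = revcomp(r2.seq), r2.qual[::-1]
    for ov in range(min(len(r1.seq), len(s2)), min_overlap - 1, -1):
        start = len(r1.seq) - ov
        mismatches = sum(x != y for x, y in zip(r1.seq[start:], s2[:ov]))
        if mismatches <= max_mismatch_frac * ov:
            break
    else:
        return None  # no overlap length passed the mismatch threshold
    seq, qual = [r1.seq[:start]], [r1.qual[:start]]
    for i in range(ov):
        b1, c1 = r1.seq[start + i], r1.qual[start + i]
        b2, c2 = s2[i], q2[i]
        # Same base if they agree; otherwise keep the higher-quality call.
        seq.append(b1 if c1 >= c2 else b2)
        # Phred+33 characters sort in the same order as their scores.
        qual.append(max(c1, c2))
    seq.append(s2[ov:])
    qual.append(q2[ov:])
    return FastqRecord(mate_id(r1.name), "".join(seq), "".join(qual))
```

Dedicated read mergers such as FLASH, PEAR or BBMerge score overlaps and combine qualities far more carefully; the sketch only conveys the idea that one fragment read from both ends becomes one longer read.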

Each read or read pair is mapped against the reference you provide. BAMs can even be merged after the initial mapping, provided they come from the same sample and were mapped to the same reference. There is no real reason not to give the mapper all the reads at once unless, say, you are dealing with a huge number of reads and running into memory issues.
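Merging BAMs after mapping (in practice done with a tool such as samtools merge on sorted inputs) is essentially a k-way merge of coordinate-sorted records. The sketch below shows only that idea: plain Python tuples stand in for BAM records, there is no BAM I/O, and Alignment, SortedAlignmentFile and merge_sorted are hypothetical names. It refuses to merge inputs whose reference dictionaries differ, which is the "same reference" condition from the answer, and carries a read_group field, loosely modeled on the SAM RG tag, so reads from different runs stay distinguishable after merging.

```python
import heapq
from typing import List, NamedTuple, Sequence, Tuple


class Alignment(NamedTuple):
    ref_id: int      # index into the shared reference dictionary
    pos: int         # 0-based leftmost mapping position
    read_name: str
    read_group: str  # hypothetical run/library label, loosely like the SAM RG tag


class SortedAlignmentFile(NamedTuple):
    references: Tuple[Tuple[str, int], ...]  # (contig name, length) pairs, like @SQ lines
    records: Sequence[Alignment]             # assumed already coordinate-sorted


def merge_sorted(files: List[SortedAlignmentFile]) -> SortedAlignmentFile:
    """k-way merge of coordinate-sorted inputs mapped to the same reference."""
    if not files:
        raise ValueError("nothing to merge")
    refs = files[0].references
    for f in files[1:]:
        if f.references != refs:
            # ref_id values would point at different contigs, so the merge would be meaningless
            raise ValueError("inputs were mapped against different references")
    merged = heapq.merge(*(f.records for f in files), key=lambda a: (a.ref_id, a.pos))
    return SortedAlignmentFile(refs, list(merged))
```

Because each input is already sorted, the merge streams through them once instead of re-sorting everything, which is also why sorted inputs are expected.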