Question

Merging sam files for HiC analysis downstream

1

6.4 years ago

ashwinkelkar ▴ 10

Hello everyone, I am using a dataset for HiC analysis. The raw data is split into 41 pairs of paired-end FASTQ files. After alignment I want to merge the 41 (x2) individual SAM files, but without sorting the reads and while keeping the header intact. Since this is HiC, the sequencing data is paired-end, and I need to align each mate (_1 and _2) to the genome separately because the other end of the pair can map anywhere else in the genome. I am using HiCExplorer to generate the contact matrices, and it requires running bowtie2 with the --reorder option.

If I use samtools merge, the output is written in sorted order, which destroys the pairing information (in this case carried only by the order of reads in each input file). Is there another way to achieve this?

I could merge the FASTQ files before alignment by concatenating all the _1 files and all the _2 files separately. Using cat for this should keep the reads in the same order in the merged _1 and _2 files, provided the input files are listed in the same order both times, and I could then proceed with the analysis. But I would like to know whether the same can be achieved after alignment, where I can be sure that the --reorder option was applied equally to both ends of every read pair. Thanks.
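For reference, the pre-alignment route described above could be sketched as follows. This is a minimal sketch, not a tested pipeline: the `<run>_1.fastq` / `<run>_2.fastq` naming scheme and the function names are assumptions made for illustration. The point it shows is that the _2 file list is derived from the _1 list, so both merged files are guaranteed to use the same file order.

```python
import shutil
from pathlib import Path


def concat_files(paths, out_path):
    """Append the raw bytes of each input file to out_path, in the order given."""
    with open(out_path, "wb") as out:
        for path in paths:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


def merge_fastq_pairs(fastq_dir, out_prefix):
    """Concatenate all *_1 and *_2 FASTQ files using one shared file order."""
    # Hypothetical naming scheme: <run>_1.fastq and <run>_2.fastq.
    r1_paths = sorted(Path(fastq_dir).glob("*_1.fastq"))
    # Derive each mate path from its _1 partner so both lists share one order.
    r2_paths = [
        p.with_name(p.name[: -len("_1.fastq")] + "_2.fastq") for p in r1_paths
    ]
    missing = [str(p) for p in r2_paths if not p.exists()]
    if missing:
        raise FileNotFoundError(f"missing mate files: {missing}")
    concat_files(r1_paths, f"{out_prefix}_1.fastq")
    concat_files(r2_paths, f"{out_prefix}_2.fastq")
    return r1_paths, r2_paths
```

Write the merged files to a different directory than the inputs (or use a prefix that does not match the glob), otherwise a second run would pick up the merged file as an input. Because the copy is byte-for-byte, the same approach should also work for gzip-compressed FASTQ files if the suffix in the sketch is adjusted, since concatenated gzip streams form a valid gzip file.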

HiC sam align • 2.2k views

score 1 · Answer 1 · 2017-11-13

1

6.4 years ago

Sean Davis 26k

First, a comment. You might consider a tool specifically designed for HiC data that does incremental or split-read alignments (based on cut site). See here, for example: https://omictools.com/read-alignment-1-category

To answer your question, though: SAM files can also be merged using cat. A SAM file should have a single header block at the top, so you'll need to be creative about how to combine the headers (this could even be done in a text editor), but the read order will be preserved.
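As a rough illustration of the cat-style merge, here is a minimal Python sketch. It assumes every input was aligned against the same reference with the same settings, so the header of the first file can stand in for all of them; the function name `concat_sam` is made up for this example. Header lines are recognised by their leading `@`, which the SAM format reserves for the header.

```python
def concat_sam(sam_paths, out_path):
    """Concatenate SAM files in the given order, keeping only the first header."""
    with open(out_path, "w") as out:
        for index, sam_path in enumerate(sam_paths):
            with open(sam_path) as sam:
                for line in sam:
                    # Guard against a final line that lacks a trailing newline,
                    # which would otherwise fuse with the next file's first line.
                    if not line.endswith("\n"):
                        line += "\n"
                    if line.startswith("@"):
                        # Header line: copy it only from the first input file
                        # (assumption: all inputs share an equivalent header).
                        if index == 0:
                            out.write(line)
                    else:
                        # Alignment record: always copied, in input order.
                        out.write(line)
```

To keep mates aligned line by line, the _1 and _2 SAM files would each be merged with a separate call, passing the input files in the same order both times. If the per-file headers differ in ways that matter (for example read group lines), they would need to be combined by hand instead of taking the first one.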

0

Hello Sean

Thanks for the prompt reply. I will go through these and try the one that is most appropriate. I chose HiCExplorer because the lab that released the tool also performed the experiment behind the dataset I want to use. Thanks, Ashwin

Reply • 6.4 years ago by ashwinkelkar ▴ 10