Question: Merging sam files for HiC analysis downstream
2.9 years ago by
IISER Pune India
ashwinkelkar wrote:

Hello everyone, I am using a dataset for HiC analysis. The raw data is split into 41 pairs of paired-end fastq files. After alignment, I want to merge the 41 (x2) individual sam files without sorting the reads and while keeping the header intact. Since this is HiC, I need to align each end of a pair (_1 and _2) separately to the genome, because the other end of the pair can be located anywhere else. I am using the tool HiCExplorer to generate the contact matrices, and it requires that bowtie2 be run with the --reorder option.

If I use samtools merge, it will sort the input by default, and this destroys the pair information (which in this case is maintained by the order of reads in the input file). Are there other ways to achieve this?

I could merge the fastq files before alignment by concatenating all the _1 files and all the _2 files respectively. Using the cat command would presumably keep the reads in the same order in both the _1 and _2 files, and I could then proceed with the analysis. But I want to find out whether this can be achieved after alignment, once I am sure that the --reorder option has actually been enforced during alignment on both ends of every pair of reads. Thanks.
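
For illustration, the pre-alignment merge described above could be sketched roughly as follows in Python. This is only a sketch under assumptions: the function names (`concat_in_order`, `merge_mates`) and the `_1.fastq` / `_2.fastq` suffixes are hypothetical and not part of any tool mentioned here. The point it shows is that a single shared sample order is used for both mates, so read N of the merged _1 file still corresponds to read N of the merged _2 file.

```python
import shutil
from pathlib import Path


def concat_in_order(inputs, output):
    """Append each input file to `output`, byte for byte, in the order given."""
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


def merge_mates(directory, out_prefix, suffixes=("_1.fastq", "_2.fastq")):
    """Merge all mate-1 and all mate-2 files using one shared sample order.

    The suffixes are an assumed naming scheme; adjust them to the real files.
    Write the outputs outside `directory` so they are not picked up as inputs.
    """
    directory = Path(directory)
    first, second = suffixes
    # Derive the sample list once, from the mate-1 files only.
    samples = sorted(p.name[: -len(first)] for p in directory.glob(f"*{first}"))
    # Refuse to continue if any sample lacks its mate-2 file.
    for sample in samples:
        mate2 = directory / f"{sample}{second}"
        if not mate2.exists():
            raise FileNotFoundError(f"missing mate file: {mate2}")
    outputs = []
    for suffix in suffixes:
        out = Path(f"{out_prefix}{suffix}")
        # The same `samples` order is reused for both mates.
        concat_in_order([directory / f"{s}{suffix}" for s in samples], out)
        outputs.append(out)
    return samples, outputs
```

Calling `merge_mates("raw", "merged/all")` would then write `merged/all_1.fastq` and `merged/all_2.fastq`, assuming the `merged` directory already exists.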

2.9 years ago by
Sean Davis
National Institutes of Health, Bethesda, MD
Sean Davis wrote:

First, a comment. You might consider a tool specifically designed for HiC data that does incremental or split-read alignments (based on cut site). See here, for example:

To answer your question, though: SAM files can also be merged using cat. You'll need to be creative about how to combine the headers (this could even be done in a text editor), but the read order can be preserved.
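
As a rough sketch of what that could look like, the Python below keeps the header of the first SAM file, drops the headers of the others, and appends the alignment records in their original order. The function names (`read_header`, `merge_sam_unsorted`) are hypothetical. It assumes plain-text SAM input, and that every file was aligned against the same reference, so the @SQ lines are identical. It checks that assumption and stops if it does not hold. The @PG lines of the later files are simply discarded.

```python
def read_header(handle):
    """Consume the leading '@' lines of a SAM file.

    Returns (header_lines, first_record), where first_record is '' if the
    file contains no alignment records.
    """
    header = []
    for line in handle:
        if not line.startswith("@"):
            return header, line
        header.append(line)
    return header, ""


def merge_sam_unsorted(inputs, output):
    """Concatenate SAM files in the given order without sorting any reads."""
    reference_sq = None
    with open(output, "w") as out:
        for path in inputs:
            with open(path) as src:
                header, first_record = read_header(src)
                sq_lines = [h for h in header if h.startswith("@SQ")]
                if reference_sq is None:
                    # First file: its header becomes the merged header.
                    reference_sq = sq_lines
                    out.writelines(header)
                elif sq_lines != reference_sq:
                    raise ValueError(f"@SQ lines differ in {path}")
                # Copy the records untouched, in their original order.
                last = first_record
                out.write(first_record)
                for line in src:
                    out.write(line)
                    last = line
                # Guard against a missing final newline fusing two records.
                if last and not last.endswith("\n"):
                    out.write("\n")
```

For the two-ended case in the question, the mate-1 SAM files and the mate-2 SAM files would each be merged with their inputs listed in the same order, so that record N of one merged file still pairs with record N of the other.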


Hello Sean

Thanks for the prompt reply. I will go through these and try the one that is most appropriate. I chose HiCExplorer because the lab that released this tool also performed the experiment for the dataset that I want to use. Thanks, Ashwin
