Question: Merging SAM files for downstream HiC analysis
2.0 years ago by
ashwinkelkar10
IISER Pune India
ashwinkelkar10 wrote:

Hello everyone, I am using a dataset for HiC analysis. The raw data are split into 41 individual pairs of paired-end fastq files. I want to merge the 41 (x2) individual SAM files after alignment, but without sorting the reads and while keeping the header intact. Since this is HiC, the sequencing data are paired-end and I need to align each mate (_1 and _2) separately to the genome, because the other end of the pair can map anywhere else. I am using the tool HiCExplorer to generate the contact matrices, and in this case the --reorder option of bowtie2 is mandatory.

If I use samtools merge, it will sort the input by default, which destroys the pairing information (carried in this case by the order of reads in the input files). Is there another way to achieve this?

I could instead merge the fastq files before alignment, concatenating all the _1 files and all the _2 files separately. Using the cat command should keep the reads in the same order in both the merged _1 and _2 files, and I could then proceed with the analysis. But I would like to know whether the same can be achieved after alignment, once I am sure that the --reorder option has been applied identically to both ends of every read pair. Thanks.
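
For illustration, the pre-alignment route could be sketched as follows. This is a minimal sketch and not part of the original post: the `<sample>_1.fastq` / `<sample>_2.fastq` naming scheme and the helper names `concat_files` and `merge_mates` are assumptions. The mate-2 list is derived from the mate-1 list, so both merged files are guaranteed to follow the same file order.

```python
import shutil
from pathlib import Path


def concat_files(inputs, output):
    """Append each input file to `output` in the given order, like `cat`."""
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


def merge_mates(fastq_dir, out_prefix):
    """Concatenate all _1 files and all _2 files in one shared order.

    Assumes the hypothetical naming scheme <sample>_1.fastq / <sample>_2.fastq.
    Write the output outside `fastq_dir` so a rerun does not pick it up.
    """
    suffix1, suffix2 = "_1.fastq", "_2.fastq"
    mate1 = sorted(Path(fastq_dir).glob("*" + suffix1))
    # Derive each mate-2 path from its mate-1 partner instead of globbing
    # again, so the two lists cannot end up in different orders.
    mate2 = [p.with_name(p.name[: -len(suffix1)] + suffix2) for p in mate1]

    missing = [str(p) for p in mate2 if not p.exists()]
    if missing:
        raise FileNotFoundError("missing mate-2 files: " + ", ".join(missing))

    concat_files(mate1, f"{out_prefix}{suffix1}")
    concat_files(mate2, f"{out_prefix}{suffix2}")
    return mate1, mate2
```

Calling `merge_mates("raw_fastq", "merged/all")` (both paths are placeholders) would write `merged/all_1.fastq` and `merged/all_2.fastq`, which can then be aligned separately as described above.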

2.0 years ago by
Sean Davis25k
National Institutes of Health, Bethesda, MD
Sean Davis25k wrote:

First, a comment. You might consider a tool specifically designed for HiC data that does incremental or split-read alignments (based on cut site). See here, for example: https://omictools.com/read-alignment-1-category

To answer your question, though: SAM files can also be merged using cat. You'll need to be creative about how to combine the headers (it could even be done in a text editor), but the read order can be preserved.
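
One possible way to handle the headers, as a minimal sketch rather than a tested pipeline: keep the header block of the first file only and append the alignment records of every file in the order given. The function name `concat_sam` is made up for this example. It assumes all files were aligned against the same reference, so their @SQ lines are identical, and it does not merge differing @PG or @RG lines.

```python
def concat_sam(inputs, output):
    """Concatenate SAM files in the given order without sorting.

    The header (leading lines starting with '@') is taken from the first
    file only. Assumes every input was aligned to the same reference.
    """
    with open(output, "w") as out:
        for index, path in enumerate(inputs):
            with open(path) as sam:
                in_header = True
                for line in sam:
                    # Make sure a file without a final newline cannot glue
                    # its last record onto the next file's first record.
                    if not line.endswith("\n"):
                        line += "\n"
                    if in_header and line.startswith("@"):
                        if index == 0:
                            out.write(line)  # keep the first header only
                        continue
                    in_header = False
                    out.write(line)
```

For the case in the question, this would be run twice, once on the 41 _1 SAM files and once on the 41 _2 SAM files, passing the inputs in the same order both times so that line N of each merged file still refers to the same read pair.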


Hello Sean

Thanks for the prompt reply. I will go through these and try the one that is most appropriate. I chose HiCExplorer because the lab that released this tool also ran the experiment that produced the dataset I want to use. Thanks, Ashwin

written 2.0 years ago by ashwinkelkar10