Query about ALLHiC tutorial data format


23 months ago

mbeavitt • 0

Hello all,

Just wondering if someone could help me with a really stupid question.

I'm working through this tutorial for the tool ALLHiC: https://github.com/tangerzhang/ALLHiC/wiki/ALLHiC:-scaffolding-an-auto-polyploid-sugarcane-genome#hi-c-sequencing-information

They give an overview of the Hi-C sequencing information:

Hi-C sequencing information

No. of Hi-C libraries: 4
Restriction enzyme sites: HindIII
Data Size: ~300 Gb
Unique mapped ratio: 11.90%
Validate rate: 88.80%
Dangling End Rate: 11.38%

They then go on to describe how it is processed in their tool's pipeline:

bwa index -a bwtsw draft.asm.fasta  
samtools faidx draft.asm.fasta  

bwa aln -t 24 draft.asm.fasta reads_R1.fastq.gz > sample_R1.sai  
bwa aln -t 24 draft.asm.fasta reads_R2.fastq.gz > sample_R2.sai  
bwa sampe draft.asm.fasta sample_R1.sai sample_R2.sai reads_R1.fastq.gz reads_R2.fastq.gz > sample.bwa_aln.sam  

PreprocessSAMs.pl sample.bwa_aln.sam draft.asm.fasta MBOI
filterBAM_forHiC.pl sample.bwa_aln.REduced.paired_only.bam sample.clean.sam  
samtools view -bt draft.asm.fasta.fai sample.clean.sam > sample.clean.bam  


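The part that confuses me is that they mention 4 libraries, but the code only handles one sample (a single reads_R1.fastq.gz / reads_R2.fastq.gz pair). Did they presumably concatenate all the forward reads and all the reverse reads from the 4 libraries into two files, e.g. merged_r1.fastq.gz and merged_r2.fastq.gz, before running this?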
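
To make the question concrete, the merge I have in mind would look roughly like the sketch below. The lib1 to lib4 file names and the merge_fastq helper are hypothetical placeholders of mine. Nothing in the tutorial confirms that the data was combined this way. The sketch relies on the fact that gzip files can be concatenated directly with cat, and it assumes the libraries are listed in the same order for R1 and R2 so that read pairs stay in sync.

```shell
# Hypothetical helper: concatenate several gzipped FASTQ files into one.
# Concatenated gzip members form a valid gzip stream, so no decompression is needed.
merge_fastq() {
    # usage: merge_fastq OUTPUT INPUT [INPUT...]
    out=$1
    shift
    cat "$@" > "$out"
}

# Example calls (file names are assumptions, not taken from the tutorial).
# The library order must be identical for R1 and R2.
# merge_fastq merged_r1.fastq.gz lib1_R1.fastq.gz lib2_R1.fastq.gz lib3_R1.fastq.gz lib4_R1.fastq.gz
# merge_fastq merged_r2.fastq.gz lib1_R2.fastq.gz lib2_R2.fastq.gz lib3_R2.fastq.gz lib4_R2.fastq.gz
```

The merged files would then take the place of reads_R1.fastq.gz and reads_R2.fastq.gz in the bwa aln commands above.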

Thanks!

allhic paired-end • 541 views
