Hello all,
Just wondering if someone could help me with a really stupid question.
In this tutorial for using the tool ALLHiC: https://github.com/tangerzhang/ALLHiC/wiki/ALLHiC:-scaffolding-an-auto-polyploid-sugarcane-genome#hi-c-sequencing-information
They give an overview of the Hi-C sequencing information:
Hi-C sequencing information
- No. of Hi-C libraries: 4
- Restriction enzyme sites: HindIII
- Data Size: ~300 Gb
- Unique mapped ratio: 11.90%
- Validate rate: 88.80%
- Dangling End Rate: 11.38%
They then describe how the data are processed in the ALLHiC pipeline:
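```bash
# Index the draft assembly for BWA and samtools
bwa index -a bwtsw draft.asm.fasta
samtools faidx draft.asm.fasta
# Align each mate separately, then pair the alignments
bwa aln -t 24 draft.asm.fasta reads_R1.fastq.gz > sample_R1.sai
bwa aln -t 24 draft.asm.fasta reads_R2.fastq.gz > sample_R2.sai
bwa sampe draft.asm.fasta sample_R1.sai sample_R2.sai reads_R1.fastq.gz reads_R2.fastq.gz > sample.bwa_aln.sam
# Keep paired reads near restriction sites (enzyme given as MBOI here, although HindIII is listed above)
PreprocessSAMs.pl sample.bwa_aln.sam draft.asm.fasta MBOI
# Further filter the Hi-C alignments, then convert the cleaned SAM to BAM
filterBAM_forHiC.pl sample.bwa_aln.REduced.paired_only.bam sample.clean.sam
samtools view -bt draft.asm.fasta.fai sample.clean.sam > sample.clean.bam
```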
The part that confuses me is that they mention 4 libraries but only one sample in the code. Is it likely that they concatenated all the forward and reverse reads into two files, merged_r1.fastq.gz and merged_r2.fastq.gz?
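If the libraries were merged, one plausible way to do it is sketched below. This is only an illustration, not necessarily what the tutorial authors did: the function names `merge_hic_libraries` and `count_reads` and the `<library>_R1.fastq.gz` / `<library>_R2.fastq.gz` naming are hypothetical. The two points that matter are that R1 and R2 files must be concatenated in the same library order so mates stay paired, and that gzipped files can be joined with plain `cat`, because a concatenation of gzip members is itself a valid gzip stream.

```bash
#!/usr/bin/env bash
# Hypothetical helper: concatenate per-library paired FASTQ files
# into one R1 file and one R2 file.
# Usage: merge_hic_libraries OUT_PREFIX LIB_PREFIX...
# Assumes each LIB_PREFIX has LIB_PREFIX_R1.fastq.gz and LIB_PREFIX_R2.fastq.gz.
merge_hic_libraries() {
  local out="$1"; shift
  local r1=() r2=() lib
  for lib in "$@"; do
    # Same library order for both mates keeps read pairs in sync.
    r1+=("${lib}_R1.fastq.gz")
    r2+=("${lib}_R2.fastq.gz")
  done
  # Concatenated gzip files form a valid gzip stream,
  # so no decompression/recompression is needed.
  cat "${r1[@]}" > "${out}_R1.fastq.gz"
  cat "${r2[@]}" > "${out}_R2.fastq.gz"
}

# Hypothetical helper: count FASTQ records (4 lines each) in a gzipped file.
count_reads() {
  echo $(( $(gzip -cd "$1" | wc -l) / 4 ))
}
```

For example, `merge_hic_libraries merged lib1 lib2 lib3 lib4` would write `merged_R1.fastq.gz` and `merged_R2.fastq.gz`; running `count_reads` on both outputs should give the same number if the pairs are intact.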
Thanks!