Query about ALLHiC tutorial data format
23 months ago
mbeavitt • 0

Hello all,

Just wondering if someone could help me with a really stupid question.

The ALLHiC tutorial at https://github.com/tangerzhang/ALLHiC/wiki/ALLHiC:-scaffolding-an-auto-polyploid-sugarcane-genome#hi-c-sequencing-information gives this overview of the Hi-C sequencing information:

Hi-C sequencing information

  • No. of Hi-C libraries: 4
  • Restriction enzyme sites: HindIII
  • Data Size: ~300 Gb
  • Unique mapped ratio: 11.90%
  • Validate rate: 88.80%
  • Dangling End Rate: 11.38%

They then go on to describe how it is processed in their tool's pipeline:

bwa index -a bwtsw draft.asm.fasta  
samtools faidx draft.asm.fasta  

bwa aln -t 24 draft.asm.fasta reads_R1.fastq.gz > sample_R1.sai  
bwa aln -t 24 draft.asm.fasta reads_R2.fastq.gz > sample_R2.sai  
bwa sampe draft.asm.fasta sample_R1.sai sample_R2.sai reads_R1.fastq.gz reads_R2.fastq.gz > sample.bwa_aln.sam  

PreprocessSAMs.pl sample.bwa_aln.sam draft.asm.fasta MBOI
filterBAM_forHiC.pl sample.bwa_aln.REduced.paired_only.bam sample.clean.sam  
samtools view -bt draft.asm.fasta.fai sample.clean.sam > sample.clean.bam  

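The part that confuses me is that they mention 4 libraries, but the code only shows one sample (a single reads_R1.fastq.gz / reads_R2.fastq.gz pair). Did they most likely concatenate the forward reads from all four libraries into one merged_r1.fastq.gz and the reverse reads into one merged_r2.fastq.gz before mapping?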
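
To make the question concrete, here is a minimal sketch of the concatenation I have in mind. I don't know that the tutorial authors actually did this. The lib1..lib4 file names and the tiny placeholder reads are made up purely so the example runs on its own; the real inputs would be the four libraries' FASTQ files.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical stand-ins for the four Hi-C libraries: one read per mate file.
for i in 1 2 3 4; do
  printf '@read%s/1\nACGT\n+\nIIII\n' "$i" | gzip > "lib${i}_R1.fastq.gz"
  printf '@read%s/2\nTGCA\n+\nIIII\n' "$i" | gzip > "lib${i}_R2.fastq.gz"
done

# Gzip files can be concatenated directly; the result is still valid gzip.
# The library order must be identical for R1 and R2 so mates stay in sync.
cat lib1_R1.fastq.gz lib2_R1.fastq.gz lib3_R1.fastq.gz lib4_R1.fastq.gz > merged_R1.fastq.gz
cat lib1_R2.fastq.gz lib2_R2.fastq.gz lib3_R2.fastq.gz lib4_R2.fastq.gz > merged_R2.fastq.gz

# Sanity check: both merged files should hold the same number of reads
# (4 lines per FASTQ record).
r1_reads=$(( $(gzip -dc merged_R1.fastq.gz | wc -l) / 4 ))
r2_reads=$(( $(gzip -dc merged_R2.fastq.gz | wc -l) / 4 ))
echo "R1 reads: $r1_reads, R2 reads: $r2_reads"
```

The other possibility I can think of is that each library was mapped separately with the commands above and the resulting BAM files were combined afterwards (for example with samtools merge), but the tutorial doesn't say either way.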

Thanks!

allhic paired-end • 541 views