Error in schicexplorer's hicbuildmatrix
1
0
Entering edit mode
5 months ago
1125195245 • 0

I use schicexplorer's hicbuildmatrix code; it complains that two sam files do not have the same reads order

enter image description here

hicbuildmatrix schic • 1.4k views
ADD COMMENT
0
Entering edit mode

It's bad practice to add the question to the title and only a screenshot as question. The error is clear. Did you try what it suggests, and if that does not work add details what you tried and how you processed the files. Please use edit and add a meaningful title.

ADD REPLY
0
Entering edit mode

I tried using bowtie2's --reorder command to output a sam file for each of the two files, and then used both sam files to build the matrix, but I still ended up getting an error saying that the two sam files don't have the same READ ORDER!

ADD REPLY
0
Entering edit mode
ADD REPLY
0
Entering edit mode

Thank you very much, but I still can't solve the problem. I don't know what the reason is. May I ask what code you use when using this software?

ADD REPLY
0
Entering edit mode

Please show all code that you used. Everything.

ADD REPLY
0
Entering edit mode

And not screenshots please. It will help streamline things if you copy and paste code and errors using the code sample box from the text formatting boxes or wrap your code in a triple backticks ```

ADD REPLY
0
Entering edit mode
(((fastq-dump --accession SRR5229019 --split-files --defline-seq '@$sn[_$rn]/$ri\' --defline-qual '+\'  --split-spot --stdout SRR5229019  > SRR5229019.fastq  #download data
scHicDemultiplex -f "SRR5229019.fastq" --srrToSampleFile samples.txt barcodeFile GSE94489_README.txt --threads 20 #demultiplexing
bwa mem -A 1 -B 4 -E 50 -L 0 -t 8 /newdisk/genome_index/GRCm38_mm10/mm10 Diploid_11_TACGCTGC_GAGCCTTA_R2.fastq.gz | samtools view -Shb - > Diploid_11_TACGCTGC_GAGCCTTA_R2.bam #comparison
bwa mem -A 1 -B 4 -E 50 -L 0 -t 8 /newdisk/genome_index/GRCm38_mm10/mm10 Diploid_11_TACGCTGC_GAGCCTTA_R1.fastq.gz | samtools view -Shb - > Diploid_11_TACGCTGC_GAGCCTTA_R1.bam #comparison
samtools index Diploid_11_TACGCTGC_GAGCCTTA_R1_sorted.bam  #Build indexes
samtools index Diploid_11_TACGCTGC_GAGCCTTA_R2_sorted.bam #Build indexes
samtools sort  -@ 10 -o Diploid_11_TACGCTGC_GAGCCTTA_R1_sorted.bam Diploid_11_TACGCTGC_GAGCCTTA_R1.bam #sort
samtools sort  -@ 10 -o Diploid_11_TACGCTGC_GAGCCTTA_R2_sorted.bam Diploid_11_TACGCTGC_GAGCCTTA_R2.bam #sort
hicBuildMatrix -s Diploid_11_TACGCTGC_GAGCCTTA_R1_sorted.bam Diploid_11_TACGCTGC_GAGCCTTA_R1_sorted.bam --binSize 1000000 --QCfolder  Diploid_11_TACGCTGC_GAGCCTTA_QC -o Diploid_11_TACGCTGC_GAGCCTTA.cool --threads 4 --restrictionCutFile ~/mm10_DpnII.txt --restrictionSequence GATC --danglingSequence GATC #Build mutual matrix)))

I keep getting errors when I run to the last step

ADD REPLY
0
Entering edit mode
4 months ago
jv ★ 1.8k

My best guess at the moment is that you need R1 and R2 bam files sorted by read name, not the default sort by leftmost coordinate. It's not uncommon for tools to require sort by read name when using paired-end data as input.

Add the -n flag to your samtools sort. Also you likely need to index the sorted bam files.

ADD COMMENT
0
Entering edit mode

looking at Bowtie2 --reorder guarantees that output SAM records are printed in an order corresponding to the order of the reads in the original input file. IIRC R1 and R2 fastq files are ordered by read name further supporting the above suggestion.

ADD REPLY
0
Entering edit mode

I added -n to samtools sort, but it still can't solve the problem. I found that the problem is that the file size of R1 and R2 needs to be the same in order to use hicbuildmatrix. Do you have any good solutions

ADD REPLY
0
Entering edit mode

I found that the problem is that the file size of R1 and R2 needs to be the same

Don't go on file sizes. They are a poor metric for anything. The order and number of reads should be the same in each file if the reads are in sync.

Looking at your command lines above you appear to have only dumped the data from SRA. I would check that step to make sure the data is not corrupt and that they are in sync. Tool to use would be repair.sh from BBMap suite to test.

You are aligning the two reads independently. Does your protocol require that? Paired-end reads are normally aligned together.

hicBuildMatrix -s Diploid_11_TACGCTGC_GAGCCTTA_R1_sorted.bam Diploid_11_TACGCTGC_GAGCCTTA_R1_sorted.bam 

You also appear to be providing the same R1 bam file twice to the command above.

ADD REPLY
0
Entering edit mode

I found that the problem is that the file size of R1 and R2 needs to be the same in order to use hicbuildmatrix.

No, that is definitely not true. Theoretically you could sequence R1 at 150bp and R2 at 100bp and R2 would be 33% smaller as R1, and the tool would still work if data are not corrupt.

ADD REPLY

Login before adding your answer.

Traffic: 2284 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6