Exception in thread "main" htsjdk.samtools.util.SequenceUtil$SequenceListsDifferException: Sequence dictionaries are not the same size (84, 195)
1
0
Entering edit mode
2.7 years ago
bill • 0

Hi everyone,

I've just started studying bioinformatics and I am struggling a hard nut to crack. I have no idea to deal with it.

I downloaded their data from an article that included ATAC-seq datas. There are hundreds of BAM files for different sample ages. These files have removed mitochondria and duplicated sequences.

First I merged a sets of BAM files.

samtools merge scATAC_joint_noDup_noMT_MERGE.bam -b bamlist.txt
samtools index scATAC_joint_noDup_noMT_MERGE.bam

Then I collect metrics (Code and explanation from the author)

java -Xmx8g -jar picard.jar CollectMultipleMetrics I=scATAC_joint_noDup_noMT_MERGE.bam O=scATAC_joint_noDup_noMT_MERGE_metrics R=/directory_to/Genomes/panTro4/whole_genome.fa

I got

Error: Unable to access jarfile picard.jar

I found the solution on the Internet. Need to give picard.jar absolute path. And the reference genome needs to be changed to my path. So I changed it to the following code.

java -Xmx8g -jar /home/CaiCB/miniconda3/envs/atac/share/picard-2.25.7-0/picard.jar CollectMultipleMetrics I=scATAC_joint_noDup_noMT_MERGE.bam O=scATAC_joint_noDup_noMT_MERGE_metrics R=/Data/CaiCB/Genomes/human_hg38/hg38.fa 

Then I got new error

[Tue Aug 03 08:11:02 EDT 2021] Executing as CaiCB@localhost.localdomain on Linux 3. 10.0-1127.10.1.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_121-b15; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.25.7

Exception in thread "main" htsjdk.samtools.util.SequenceUtil$SequenceListsDifferException: Sequence dictionaries are not the same size (84, 195) at htsjdk.samtools.util.SequenceUtil.assertSequenceListsEqual(SequenceUtil. java:259) at htsjdk.samtools.util.SequenceUtil.assertSequenceDictionariesEqual(Sequen ceUtil.java:342) at htsjdk.samtools.util.SequenceUtil.assertSequenceDictionariesEqual(Sequen ceUtil.java:328) at picard.analysis.SinglePassSamProgram.makeItSo(SinglePassSamProgram.java: 117) at picard.analysis.CollectMultipleMetrics.doWork(CollectMultipleMetrics.jav a:739) at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:3 08) at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103 ) at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)

Just pick some of them.

Can someone help me with my problem? be deeply grateful.

ATAC-seq PICARD • 2.7k views
ADD COMMENT
1
Entering edit mode
2.7 years ago

Can someone help me with my problem? be deeply grateful.

/directory_to/Genomes/panTro4/whole_genome.fa is not the reference that was used to map scATAC_joint_noDup_noMT_MERGE.bam

ADD COMMENT
0
Entering edit mode

Thank you for the answer

The script provided in the article suggests that I add my own reference genome

PICARD - v2.1.1 # user needs to add directory to reference genome (e.g. /directory_to/Genomes/panTro4/whole_genome.fa)

So I downloaded the human genome, hg38. (This article includes human and chimpanzee data, I first analyzed the human data.)

Could you tell me what kind of files I should use as a reference?

ADD REPLY
0
Entering edit mode

Could you tell me what kind of files I should use as a reference?

As I said, the reference file that was used to map the bam.

ADD REPLY
0
Entering edit mode

Thanks. I see what you mean.

I downloaded hg38.fa.gz from UCSC and there are many files on the website. I read the article carefully and found that the author ues hg19 as reference. So I downloaded hg19(chromFa.tar.gz) and uncompressed. Finally the code is running.

ADD REPLY

Login before adding your answer.

Traffic: 2861 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6