PICARD - v2.1.1 # user needs to add directory to reference genome (e.g. /directory_to/Genomes/panTro4/whole_genome.fa)

Question

Exception in thread "main" htsjdk.samtools.util.SequenceUtil$SequenceListsDifferException: Sequence dictionaries are not the same size (84, 195)

0

Entering edit mode

2.7 years ago

bill • 0

Hi everyone,

I've just started studying bioinformatics and I am struggling a hard nut to crack. I have no idea to deal with it.

I downloaded their data from an article that included ATAC-seq datas. There are hundreds of BAM files for different sample ages. These files have removed mitochondria and duplicated sequences.

First I merged a sets of BAM files.

samtools merge scATAC_joint_noDup_noMT_MERGE.bam -b bamlist.txt
samtools index scATAC_joint_noDup_noMT_MERGE.bam

Then I collect metrics (Code and explanation from the author)

java -Xmx8g -jar picard.jar CollectMultipleMetrics I=scATAC_joint_noDup_noMT_MERGE.bam O=scATAC_joint_noDup_noMT_MERGE_metrics R=/directory_to/Genomes/panTro4/whole_genome.fa

I got

Error: Unable to access jarfile picard.jar

I found the solution on the Internet. Need to give picard.jar absolute path. And the reference genome needs to be changed to my path. So I changed it to the following code.

java -Xmx8g -jar /home/CaiCB/miniconda3/envs/atac/share/picard-2.25.7-0/picard.jar CollectMultipleMetrics I=scATAC_joint_noDup_noMT_MERGE.bam O=scATAC_joint_noDup_noMT_MERGE_metrics R=/Data/CaiCB/Genomes/human_hg38/hg38.fa

Then I got new error

[Tue Aug 03 08:11:02 EDT 2021] Executing as CaiCB@localhost.localdomain on Linux 3. 10.0-1127.10.1.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_121-b15; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.25.7

Exception in thread "main" htsjdk.samtools.util.SequenceUtil$SequenceListsDifferException: Sequence dictionaries are not the same size (84, 195) at htsjdk.samtools.util.SequenceUtil.assertSequenceListsEqual(SequenceUtil. java:259) at htsjdk.samtools.util.SequenceUtil.assertSequenceDictionariesEqual(Sequen ceUtil.java:342) at htsjdk.samtools.util.SequenceUtil.assertSequenceDictionariesEqual(Sequen ceUtil.java:328) at picard.analysis.SinglePassSamProgram.makeItSo(SinglePassSamProgram.java: 117) at picard.analysis.CollectMultipleMetrics.doWork(CollectMultipleMetrics.jav a:739) at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:3 08) at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103 ) at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)

Just pick some of them.

Can someone help me with my problem? be deeply grateful.

ATAC-seq PICARD • 2.7k views

ADD COMMENT • link 2.7 years ago by bill • 0

score 1 · Answer 1 · 2021-08-03

1

Entering edit mode

2.7 years ago

Pierre Lindenbaum 161k

Can someone help me with my problem? be deeply grateful.

/directory_to/Genomes/panTro4/whole_genome.fa is not the reference that was used to map scATAC_joint_noDup_noMT_MERGE.bam

ADD COMMENT • link 2.7 years ago by Pierre Lindenbaum 161k

0

Entering edit mode

Thank you for the answer

The script provided in the article suggests that I add my own reference genome

PICARD - v2.1.1 # user needs to add directory to reference genome (e.g. /directory_to/Genomes/panTro4/whole_genome.fa)

So I downloaded the human genome, hg38. (This article includes human and chimpanzee data, I first analyzed the human data.)

Could you tell me what kind of files I should use as a reference?

ADD REPLY • link 2.7 years ago by bill • 0

0

Entering edit mode

Could you tell me what kind of files I should use as a reference?

As I said, the reference file that was used to map the bam.

ADD REPLY • link 2.7 years ago by Pierre Lindenbaum 161k

0

Entering edit mode

Thanks. I see what you mean.

I downloaded hg38.fa.gz from UCSC and there are many files on the website. I read the article carefully and found that the author ues hg19 as reference. So I downloaded hg19(chromFa.tar.gz) and uncompressed. Finally the code is running.

ADD REPLY • link 2.7 years ago by bill • 0