Hi everyone,
I've just started studying bioinformatics and I am struggling a hard nut to crack. I have no idea to deal with it.
I downloaded their data from an article that included ATAC-seq datas. There are hundreds of BAM files for different sample ages. These files have removed mitochondria and duplicated sequences.
First I merged a sets of BAM files.
samtools merge scATAC_joint_noDup_noMT_MERGE.bam -b bamlist.txt
samtools index scATAC_joint_noDup_noMT_MERGE.bam
Then I collect metrics (Code and explanation from the author)
java -Xmx8g -jar picard.jar CollectMultipleMetrics I=scATAC_joint_noDup_noMT_MERGE.bam O=scATAC_joint_noDup_noMT_MERGE_metrics R=/directory_to/Genomes/panTro4/whole_genome.fa
I got
Error: Unable to access jarfile picard.jar
I found the solution on the Internet. Need to give picard.jar absolute path. And the reference genome needs to be changed to my path. So I changed it to the following code.
java -Xmx8g -jar /home/CaiCB/miniconda3/envs/atac/share/picard-2.25.7-0/picard.jar CollectMultipleMetrics I=scATAC_joint_noDup_noMT_MERGE.bam O=scATAC_joint_noDup_noMT_MERGE_metrics R=/Data/CaiCB/Genomes/human_hg38/hg38.fa
Then I got new error
[Tue Aug 03 08:11:02 EDT 2021] Executing as CaiCB@localhost.localdomain on Linux 3. 10.0-1127.10.1.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_121-b15; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.25.7
Exception in thread "main" htsjdk.samtools.util.SequenceUtil$SequenceListsDifferException: Sequence dictionaries are not the same size (84, 195) at htsjdk.samtools.util.SequenceUtil.assertSequenceListsEqual(SequenceUtil. java:259) at htsjdk.samtools.util.SequenceUtil.assertSequenceDictionariesEqual(Sequen ceUtil.java:342) at htsjdk.samtools.util.SequenceUtil.assertSequenceDictionariesEqual(Sequen ceUtil.java:328) at picard.analysis.SinglePassSamProgram.makeItSo(SinglePassSamProgram.java: 117) at picard.analysis.CollectMultipleMetrics.doWork(CollectMultipleMetrics.jav a:739) at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:3 08) at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103 ) at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)
Just pick some of them.
Can someone help me with my problem? be deeply grateful.
Thank you for the answer
The script provided in the article suggests that I add my own reference genome
So I downloaded the human genome, hg38. (This article includes human and chimpanzee data, I first analyzed the human data.)
Could you tell me what kind of files I should use as a reference?
As I said, the reference file that was used to map the bam.
Thanks. I see what you mean.
I downloaded hg38.fa.gz from UCSC and there are many files on the website. I read the article carefully and found that the author ues hg19 as reference. So I downloaded hg19(chromFa.tar.gz) and uncompressed. Finally the code is running.