4.1 years ago
Kai_Qi
Hi All:
I am doing a CLIP-seq analysis following the PureCLIP manual, and I have two problems that I have not been able to solve so far:
1. Getting the reference genome:
(base) [caiqi@midway2-login2 genome_fa]$ gunzip ref.GRCh38.fa.gz
gzip: ref.GRCh38.fa.gz: unexpected end of file
(base) [caiqi@midway2-login2 genome_fa]$ zcat ref.GRCh38.fa.gz > ref.GRCh38.fa
gzip: ref.GRCh38.fa.gz: unexpected end of file
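An "unexpected end of file" message from gzip usually means the archive is truncated, for example because the download was interrupted. A minimal sketch for checking this before decompressing, assuming a standard `gzip` is on the PATH; the function name `check_gzip` is made up for illustration:

```shell
# Hypothetical helper: succeeds (exit 0) if the gzip archive is intact,
# fails (non-zero exit) if it is truncated or otherwise corrupt.
check_gzip() {
    # gzip -t tests archive integrity without writing any output file.
    gzip -t "$1" 2>/dev/null
}

# Usage with the file name from the post:
# check_gzip ref.GRCh38.fa.gz && echo "archive OK" || echo "archive damaged, download it again"
```

If the check fails, downloading the file again is likely the simplest fix.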
So I got the reference genome from the iGenomes website instead and used it for the STAR mapping and the other preprocessing steps. Everything went well until I tried to run PureCLIP itself:
2. Using the .fa file for analysis:
(base) [caiqi@midway2-login2 PUM2_CLIP]$ pureclip -i aligned.f.duplRm.pooled.R2.bam -bai aligned.f.duplRm.pooled.R2.bam.bai -g ref.GRCh38.fa -iv 'chr1;chr2;chr3;' -nt 10 -o PureCLIP.crosslink_sites.bed
Protein-RNA crosslink site detection
===============
Created look-up table for values from -2000 to 0 with step size 0.00333333 (size: 600000).
Loading reference ...
ERROR: Can't load reference sequence from file 'ref.GRCh38.fa': Unexpected character 'M' found.
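The error suggests that the reference contains a character the loader does not accept. 'M' is an IUPAC ambiguity code (A or C), so one possible cause, which is only an assumption here, is that this FASTA file carries ambiguity codes in its sequence lines. A rough sketch for listing such characters, using only `grep`, `sort` and `uniq`; the function name `list_unexpected_bases` is made up for illustration:

```shell
# Hypothetical helper: for every character in the sequence lines (headers excluded)
# that is not A, C, G, T or N in either case, print how often it occurs.
list_unexpected_bases() {
    grep -v '^>' "$1" | grep -o '[^ACGTNacgtn]' | sort | uniq -c
}

# Usage with the file name from the post:
# list_unexpected_bases ref.GRCh38.fa
```

An empty result would mean the sequence lines contain only A, C, G, T and N, in which case the 'M' must come from somewhere else in the file.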
Can anyone help me with these two problems?
Thanks a lot,
Cai
Let me answer my own question after several attempts:
I downloaded the hg38 files instead of GRCh38, and PureCLIP ran through. I do not know yet whether the results are satisfactory, but this way the command completes without the error.
Thanks,