Using PureCLIP to analyze CLIP-seq data
4.1 years ago
Kai_Qi ▴ 130

Hi All:

I am doing CLIP-seq analysis following the PureCLIP manual, and I have two problems I have not been able to solve so far:

1. Getting the reference genome:

(base) [caiqi@midway2-login2 genome_fa]$ gunzip ref.GRCh38.fa.gz  

gzip: ref.GRCh38.fa.gz: unexpected end of file
(base) [caiqi@midway2-login2 genome_fa]$ zcat ref.GRCh38.fa.gz > ref.GRCh38.fa

gzip: ref.GRCh38.fa.gz: unexpected end of file
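When both `gunzip` and `zcat` report `unexpected end of file`, the `.gz` file is usually incomplete, for example because the download was interrupted. Re-downloading it is the first thing to try. The command-line check is `gzip -t ref.GRCh38.fa.gz`. The sketch below runs the same test in Python using only the standard library: it decompresses the whole stream and reports whether it ends cleanly. The function name `gzip_is_complete` is hypothetical.

```python
import gzip
import sys
import zlib


def gzip_is_complete(path: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if the gzip file at `path` decompresses to the end without error.

    A truncated file typically raises EOFError ("Compressed file ended before the
    end-of-stream marker was reached"), the Python counterpart of
    `gzip: ...: unexpected end of file` on the command line.
    """
    try:
        with gzip.open(path, "rb") as fh:
            # Read and discard the decompressed data in chunks to keep memory flat.
            while fh.read(chunk_size):
                pass
    except (EOFError, OSError, zlib.error):
        # EOFError: truncated stream; OSError covers BadGzipFile (bad header/CRC);
        # zlib.error: corrupt deflate data.
        return False
    return True


if __name__ == "__main__":
    for path in sys.argv[1:]:
        status = "OK" if gzip_is_complete(path) else "TRUNCATED or CORRUPT"
        print(f"{path}: {status}")
```

Usage: `python check_gz.py ref.GRCh38.fa.gz`. If it reports a problem, download the file again (and compare its checksum, if the provider publishes one) before decompressing.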

So I got the reference genome from the iGenomes website instead and used it for the STAR mapping and the other preprocessing steps. Everything went well until I tried to run PureCLIP itself:

2. Using the .fa file for analysis:

(base) [caiqi@midway2-login2 PUM2_CLIP]$ pureclip -i aligned.f.duplRm.pooled.R2.bam -bai aligned.f.duplRm.pooled.R2.bam.bai -g ref.GRCh38.fa -iv 'chr1;chr2;chr3;' -nt 10 -o PureCLIP.crosslink_sites.bed

Protein-RNA crosslink site detection 
===============

Created look-up table for values from -2000 to 0 with step size 0.00333333 (size: 600000).
Loading reference ... 
ERROR: Can't load reference sequence from file 'ref.GRCh38.fa': Unexpected character 'M' found.
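PureCLIP stops while loading the reference because the FASTA contains the character `M`. `M` is an IUPAC ambiguity code (A or C), so a likely cause is that the reference loader rejects ambiguity codes. A quick way to check is to list every character outside A, C, G, T and N for each sequence. The sketch below assumes an uncompressed FASTA and treats lowercase (soft-masked) bases as valid. The allowed character set and the function name `scan_fasta` are assumptions for illustration, not part of PureCLIP.

```python
import sys
from collections import Counter

# Characters assumed to be safe for the reference loader: nucleotides plus N,
# in upper and lower case (soft-masked repeats are lowercase).
ALLOWED = set("ACGTNacgtn")


def scan_fasta(path: str) -> dict[str, Counter]:
    """Count characters outside ALLOWED in each sequence of a FASTA file."""
    offenders: dict[str, Counter] = {}
    name = None  # sequence name from the most recent header line
    with open(path) as fh:
        for line in fh:
            line = line.rstrip("\r\n")
            if line.startswith(">"):
                # Keep only the first word of the header as the sequence name.
                name = (line[1:].split() or [""])[0]
                continue
            bad = Counter(c for c in line if c not in ALLOWED)
            if bad:
                offenders.setdefault(name, Counter()).update(bad)
    return offenders


if __name__ == "__main__":
    for path in sys.argv[1:]:
        for seq_name, counts in scan_fasta(path).items():
            print(f"{path}\t{seq_name}\t{dict(counts)}")
```

An empty result means the file contains only A, C, G, T and N, so the `Unexpected character` error would have to come from somewhere else.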

Can anyone help me with these two problems?

Thanks a lot,

Cai


Let me answer my own question after several attempts:

I downloaded the hg38 files instead of GRCh38, and that worked. I don't know yet whether the results are satisfactory, but with this reference the script runs to completion.
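One plausible reason the switch to hg38 worked is that the UCSC hg38 FASTA appears to contain only A, C, G, T and N (upper- or lowercase), while some GRCh38 distributions keep IUPAC ambiguity codes such as `M`. This explanation is an assumption and has not been confirmed in the thread. If you would rather keep the GRCh38 FASTA that was used for the STAR mapping, an alternative is to replace the ambiguity codes with N. The sketch below does that in a single pass and preserves the case of each replaced base. The function name `clean_fasta` and the choice of N as the replacement are illustrative assumptions.

```python
import re
import sys

# Anything other than A, C, G, T, N (either case) is treated as an ambiguity
# code and replaced with N (or n, to keep soft-masking). Assumes the loader
# accepts only these characters.
NON_ACGTN = re.compile(r"[^ACGTNacgtn]")


def clean_fasta(src: str, dst: str) -> int:
    """Copy FASTA `src` to `dst`, replacing non-ACGTN sequence characters with N.

    Header lines are copied unchanged. Returns the number of characters replaced.
    """
    replaced = 0
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            if line.startswith(">"):
                fout.write(line)
                continue
            seq = line.rstrip("\r\n")
            # Lowercase ambiguity codes become 'n', uppercase ones become 'N'.
            cleaned, n = NON_ACGTN.subn(
                lambda m: "n" if m.group().islower() else "N", seq
            )
            replaced += n
            fout.write(cleaned + "\n")
    return replaced


if __name__ == "__main__" and len(sys.argv) == 3:
    count = clean_fasta(sys.argv[1], sys.argv[2])
    print(f"replaced {count} characters", file=sys.stderr)
```

Because every replacement is one character for one character, sequence lengths and coordinates stay the same, so alignments made against the original FASTA should still line up with the cleaned file. Recreate any `.fai` index for the new file before passing it to PureCLIP with `-g`.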

Thanks,
