
Question

Using PureCLIP to analyze CLIP-seq


4.1 years ago

Kai_Qi ▴ 130

Hi All:

I am doing CLIP-seq analysis following the PureCLIP manual, and I have two problems that I have not been able to solve so far:

1: Getting the reference genome:

(base) [caiqi@midway2-login2 genome_fa]$ gunzip ref.GRCh38.fa.gz  

gzip: ref.GRCh38.fa.gz: unexpected end of file
(base) [caiqi@midway2-login2 genome_fa]$ zcat ref.GRCh38.fa.gz > ref.GRCh38.fa

gzip: ref.GRCh38.fa.gz: unexpected end of file
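For what it's worth, `unexpected end of file` from gunzip usually means the `.gz` file itself is incomplete, for example because the download was interrupted. Here is a minimal Python sketch for checking that (the function name `gzip_is_complete` is my own, not part of PureCLIP or any other tool). It streams through the archive and reports whether it decompresses all the way to the end. If it returns False, re-downloading the file is probably the first thing to try.

```python
import gzip
import zlib


def gzip_is_complete(path: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if the whole gzip stream decompresses without error.

    A truncated file makes Python's gzip module raise EOFError before the
    end-of-stream marker, which corresponds to gunzip's
    'unexpected end of file'. Corrupt data raises OSError or zlib.error.
    """
    try:
        with gzip.open(path, "rb") as handle:
            # Read in chunks so a multi-GB genome is never held in memory.
            while handle.read(chunk_size):
                pass
    except (EOFError, OSError, zlib.error):
        return False
    return True
```

Usage: `gzip_is_complete("ref.GRCh38.fa.gz")`. From the shell, `gzip -t ref.GRCh38.fa.gz` performs a similar integrity test without writing any output.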

So I downloaded the reference genome from the iGenomes website instead and used it for STAR mapping and the other preprocessing steps. Everything went well until I tried to run PureCLIP itself:

2: Using the .fa file for analysis:

(base) [caiqi@midway2-login2 PUM2_CLIP]$ pureclip -i aligned.f.duplRm.pooled.R2.bam -bai aligned.f.duplRm.pooled.R2.bam.bai -g ref.GRCh38.fa -iv 'chr1;chr2;chr3;' -nt 10 -o PureCLIP.crosslink_sites.bed

Protein-RNA crosslink site detection 
===============

Created look-up table for values from -2000 to 0 with step size 0.00333333 (size: 600000).
Loading reference ... 
ERROR: Can't load reference sequence from file 'ref.GRCh38.fa': Unexpected character 'M' found.
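The error is raised while PureCLIP loads the reference. A plain FASTA file is expected to start with a `>` header line, so one plausible cause is that `ref.GRCh38.fa` is not plain uncompressed FASTA where the reader stops (wrong file, leftover archive data, or a header missing its `>`). Note that `M` is itself a valid IUPAC nucleotide code, so the character alone does not tell you much. Below is a minimal Python sketch, assuming an uncompressed nucleotide FASTA. The helper `first_fasta_problem` is hypothetical, not a PureCLIP function. It reports the first line that looks like neither a header nor IUPAC sequence.

```python
from typing import Optional

# Assumption: sequence lines contain only IUPAC nucleotide codes
# (compared case-insensitively); anything else is reported.
IUPAC = set("ACGTURYSWKMBDHVN")


def first_fasta_problem(path: str) -> Optional[str]:
    """Return a description of the first suspicious line, or None if none found."""
    with open(path, "rb") as handle:
        for lineno, raw in enumerate(handle, start=1):
            line = raw.rstrip(b"\n")
            # A FASTA file should begin with a '>' header line.
            if lineno == 1 and not line.startswith(b">"):
                return f"line 1: expected '>' header but found {line[:20]!r}"
            # Skip headers and blank lines.
            if line.startswith(b">") or not line:
                continue
            # Non-ASCII bytes become U+FFFD and are reported as unexpected.
            text = line.decode("ascii", errors="replace").upper()
            bad = set(text) - IUPAC
            if bad:
                return f"line {lineno}: unexpected character(s) {sorted(bad)}"
    return None
```

Usage: `print(first_fasta_problem("ref.GRCh38.fa"))`. Reading a whole human genome line by line in Python takes a while, but it only has to be done once.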

Can anyone help me with these two problems?

Thanks a lot,

Cai


Updated 4.1 years ago by GenoMax • written 4.1 years ago by Kai_Qi ▴ 130


Let me answer my own question after several attempts:

I downloaded the hg38 files instead of GRCh38, and with those the run goes through. I don't know yet whether the results are satisfactory, but this way the script completes.
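One thing worth checking when switching between GRCh38 and hg38 is chromosome naming. UCSC hg38 names chromosomes `chr1`, `chr2`, ..., whereas Ensembl-style GRCh38 FASTA files use `1`, `2`, .... As a result, names like those passed to `-iv 'chr1;chr2;chr3;'` may not exist in a GRCh38 reference. This is a guess, not a confirmed diagnosis of the error above. Below is a minimal Python sketch that lists the FASTA header names and reports which `-iv` names are missing. Both helper names are my own.

```python
from typing import List


def fasta_sequence_names(path: str) -> List[str]:
    """Return each sequence name: the header text after '>' up to the first whitespace."""
    names = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                names.append(fields[0] if fields else "")
    return names


def missing_intervals(iv: str, names: List[str]) -> List[str]:
    """Return names from a ';'-separated list (e.g. 'chr1;chr2;chr3;') absent from names."""
    known = set(names)
    return [n for n in iv.split(";") if n and n not in known]
```

Usage: `missing_intervals("chr1;chr2;chr3;", fasta_sequence_names("ref.GRCh38.fa"))`. An empty list means all three names were found in the headers. The header names can also be listed from the shell with `grep '^>' ref.GRCh38.fa`.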

Thanks,

4.1 years ago by Kai_Qi ▴ 130