I am trying to run a PureCLIP analysis using GRCh37.fa as the reference genome. After running the command
pureclip -i aligned.f.duplRm.pooled.R2.bam -bai aligned.f.duplRm.pooled.R2.bam.bai -g GRCh37.fa -iv "1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18;19;20;21;22;X;Y;" -nt 10 -o PureCLIP.crosslink_sites.bed
I received an error:
ERROR: Can't load reference sequence from file 'GRCh37.fa': Unexpected character 'M' found.
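'M' is the IUPAC ambiguity code for "A or C", so the FASTA file contains at least one position that is not a plain base. Below is a minimal Python sketch that lists every such character and how often it occurs. It assumes an uncompressed FASTA whose header lines start with '>', and it treats lowercase (soft-masked) acgtn as valid, which is an assumption about what the library accepts. The function name count_unexpected is hypothetical.

```python
import sys
from collections import Counter

# Characters assumed to be accepted by the reference loader (either case).
ALLOWED = set("ACGTNacgtn")


def count_unexpected(lines) -> Counter:
    """Count sequence characters other than A/C/G/T/N in FASTA lines."""
    counts = Counter()
    for line in lines:
        if line.startswith(">"):
            continue  # skip FASTA header lines such as ">1 dna:chromosome"
        for ch in line.strip():
            if ch not in ALLOWED:
                counts[ch] += 1
    return counts


if __name__ == "__main__":
    if len(sys.argv) == 2:
        with open(sys.argv[1]) as fh:
            # Print one "character<TAB>count" line per unexpected character.
            for ch, n in sorted(count_unexpected(fh).items()):
                print(f"{ch}\t{n}")
    else:
        print("usage: python count_unexpected.py GRCh37.fa", file=sys.stderr)
```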
The developer gave me this advice:
The problem is coming from an external library which is used and which expects the reference sequence to contain only the letters 'A', 'C', 'G', 'T' or 'N'. I know it is not ideal, but if you convert all non-ACGTs to Ns, the problem should be solved.
Can anyone show me how to convert all non-ACGT characters in the FASTA file to Ns, so that I can give it a try?
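A minimal Python sketch of the conversion the developer describes. It assumes an uncompressed FASTA, leaves header lines ('>') untouched, keeps whitespace and line endings, and keeps lowercase acgtn (soft-masked bases) as they are; whether the library accepts lowercase is an assumption. The script name sanitize_fasta.py, the function names and the output file GRCh37.ACGTN.fa are hypothetical.

```python
import re
import sys

# Anything that is not A/C/G/T/N (either case) or whitespace gets replaced.
NON_ACGTN = re.compile(r"[^ACGTNacgtn\s]")


def sanitize_fasta_line(line: str) -> str:
    """Replace non-ACGTN sequence characters with 'N'; keep headers as-is."""
    if line.startswith(">"):
        return line
    return NON_ACGTN.sub("N", line)


def sanitize_fasta(in_path: str, out_path: str) -> None:
    """Stream the FASTA line by line so the whole genome is never in memory."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(sanitize_fasta_line(line))


if __name__ == "__main__":
    if len(sys.argv) == 3:
        sanitize_fasta(sys.argv[1], sys.argv[2])
    else:
        print("usage: python sanitize_fasta.py GRCh37.fa GRCh37.ACGTN.fa",
              file=sys.stderr)
```

Run it as `python sanitize_fasta.py GRCh37.fa GRCh37.ACGTN.fa` and then pass `-g GRCh37.ACGTN.fa` to pureclip. Because each offending character is replaced by exactly one N, chromosome lengths and coordinates are unchanged, so the existing BAM alignments still line up with the modified reference.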