I am following this guide (bedGraphToBigWig Tutorial and Report) to convert my bedGraph files to bigWig, but I keep running into errors like this:
GL456211.1 is not found in chromosome sizes file
After looking into UCSC's goldenPath, I found the corresponding names to convert to, such as:
chr1_GL456210_random 0 169725 GL456210.1
chr1_GL456211_random 0 241735 GL456211.1
chr1_GL456212_random 0 153618 GL456212.1
chr1_GL456213_random 0 39340 GL456213.1
chr1_GL456221_random 0 206961 GL456221.1
GL456211.1 will be converted to
However, since the bedGraph file is 1.8GB, I'm wondering if there is a way to scan through the entire file to convert many different names at the same time? I've used sed twice, but it seems like this file might have many iterations that need to be converted.
Thank you for the reply! This makes sense to me, but my main issue is that my bedGraph file contains lines with other chromosome names, such as GL456211.1, and I'd need to either convert or remove those.
So even if I quickly converted the genome_size.txt file, there would still be mislabeled coordinates in the bedgraph file.
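For the "get rid of those" option, one possibility is to keep only the bedGraph lines whose chromosome appears in the chromosome sizes file. The following is a minimal sketch, not a tested pipeline: the function names are made up for illustration, and it assumes the sizes file has the chromosome name in its first column and that the bedGraph has no header lines that need to be preserved.

```python
def load_chrom_names(sizes_lines):
    """Collect chromosome names from the first column of a chrom sizes file."""
    names = set()
    for line in sizes_lines:
        fields = line.split()
        if fields:  # skip blank lines
            names.add(fields[0])
    return names


def drop_unknown_chroms(bedgraph_lines, known):
    """Yield only the bedGraph lines whose first column is a known chromosome.

    Lines are streamed one at a time, so a large file is never held in memory.
    Anything else (unknown names, blank lines, header lines) is dropped.
    """
    for line in bedgraph_lines:
        fields = line.split(None, 1)
        if fields and fields[0] in known:
            yield line


# Hypothetical usage (file names are placeholders):
# with open("genome_size.txt") as fh:
#     known = load_chrom_names(fh)
# with open("sample.bedGraph") as fin, open("sample.filtered.bedGraph", "w") as fout:
#     fout.writelines(drop_unknown_chroms(fin, known))
```

Note that this discards the signal on those scaffolds entirely; renaming them instead keeps the data.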
Why don't you replace the mislabeled chr names with the ones you want?
Is there a way to streamline this? I would need to do this to around 20 unique chr names per file, and have around 100 files to do this for. That's the main issue.
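One way to streamline this might be to load the whole name table once and rename every chromosome in a single pass per file, instead of running sed once per name. The sketch below is only an illustration under some assumptions: the table has the layout shown earlier (desired name in column 1, current name in column 4), the bedGraph files are tab-delimited, and all function names, file names and directory names are hypothetical.

```python
import glob
import os


def load_alias_map(table_lines):
    """Build {current name (column 4): desired name (column 1)} from the table."""
    mapping = {}
    for line in table_lines:
        fields = line.split()
        if len(fields) >= 4:
            mapping[fields[3]] = fields[0]
    return mapping


def rename_chroms(lines, mapping):
    """Yield each line with its first tab-separated field renamed if it is in mapping."""
    for line in lines:
        chrom, sep, rest = line.partition("\t")
        # Names not in the mapping (e.g. chr1) pass through unchanged.
        yield mapping.get(chrom, chrom) + sep + rest


def convert_file(src, dst, mapping):
    """Stream src to dst line by line, so a 1.8GB file is not loaded into memory."""
    with open(src) as fin, open(dst, "w") as fout:
        fout.writelines(rename_chroms(fin, mapping))


def convert_all(pattern, out_dir, mapping):
    """Convert every file matching pattern; out_dir must differ from the input directory."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for src in sorted(glob.glob(pattern)):
        dst = os.path.join(out_dir, os.path.basename(src))
        convert_file(src, dst, mapping)
        written.append(dst)
    return written


# Hypothetical usage (paths are placeholders):
# with open("alias_table.txt") as fh:
#     mapping = load_alias_map(fh)
# convert_all("bedgraphs/*.bedGraph", "renamed", mapping)
```

Because the lookup is a dictionary keyed on the first column, the cost per line stays the same whether there are 20 names to convert or several hundred, and the same mapping is reused for all files.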