Question: wig files to bigwig using UCSC kent
7 months ago, varsha619 wrote:

Hello! I am trying to convert wig files to bigwig using UCSC kent module -

grep -hv 'track' in.wig > 1.wig

sed '1d' 1.wig > 2.wig

wigToBigWig 2.wig -clip chrom.sizes 2.bw

I get the error - hashMustFindVal: 'chr2CEN' not found

I don't think this is a genome version error: I tried the most current version and a previous one and still get the error. I have already looked for answers in:

https://biostar.usegalaxy.org/p/11115/

bedGraphToBigWig conversion, xxx is not found in chromosome files

Has anyone else faced this? Please let me know, thank you for your help!

wigtobigwig • 537 views
ADD COMMENTlink modified 7 months ago by genecats.ucsc510 • written 7 months ago by varsha61970
3

Not sure, but could this be a contig (like chrUN) in the wiggle file that has no match in the reference? If so, I would just delete all occurrences of it with awk/sed.

ADD REPLYlink written 7 months ago by YaGalbi1.3k

It seems it is not just chr2CEN: when I delete the occurrences of it, I get the same error with a different chromosome. If it is not the genome version, could it be a difference between UCSC and EMBL annotations?

ADD REPLYlink written 7 months ago by varsha61970

It could be, but the chr prefix suggests that this is likely the UCSC version. Is that what was used for the original alignments? You can't mix and match these files.

ADD REPLYlink written 7 months ago by genomax54k

The analysis was done by someone else and I am just trying to use their published wig files for some analysis. I can verify with the authors the genome build they used, thanks!

ADD REPLYlink written 7 months ago by varsha61970
1

Yeah, I usually just delete all of those from the file. Basically, loop over the file with a test like this (chrINT stands for chr followed by a number):

if (the line does NOT contain chrINT|chrX|chrY)
{
print the line to contig.file
delete the line from the wiggle file
}

You are not going to be able to use those contigs anyway.

Just check the printed contig.file to make sure you are not deleting anything important.

There will only be a few lines (<30, I expect).
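
A minimal sketch of that idea in awk, under some assumptions. In a wiggle file only the declaration lines (variableStep/fixedStep) carry the chromosome name, so the test has to be applied per block rather than per line. The sketch therefore remembers whether the current block's chromosome is listed in the chrom.sizes file and routes every line of the block accordingly. The file names (demo.wig, demo.chrom.sizes, kept.wig, dropped.wig) and the toy data are made up for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir"

# Toy chrom.sizes (name <TAB> length); the values are illustrative only.
printf 'chr2L\t23513712\nchr3L\t28110227\n' > demo.chrom.sizes

# Toy wiggle file with one block (chr2CEN) that is absent from demo.chrom.sizes.
cat > demo.wig <<'EOF'
track type=wiggle_0
variableStep chrom=chr2L
100 1.5
200 2.5
variableStep chrom=chr2CEN
10 9.0
20 8.0
variableStep chrom=chr3L span=5
300 0.5
EOF

awk '
  BEGIN { keep = 1 }                       # lines before the first declaration are kept
  NR == FNR { known[$1] = 1; next }        # first file: remember the known chromosome names
  /^(variableStep|fixedStep)/ {            # a declaration line starts a new block
      name = ""
      for (i = 2; i <= NF; i++)
          if ($i ~ /^chrom=/) name = substr($i, 7)
      keep = (name in known)
  }
  { print > (keep ? "kept.wig" : "dropped.wig") }   # route the whole block
' demo.chrom.sizes demo.wig

echo "== kept.wig ==";    cat kept.wig
echo "== dropped.wig =="; cat dropped.wig
```

Inspecting dropped.wig before discarding it plays the same role as checking contig.file above. The kept file could then be passed to wigToBigWig together with the same chrom.sizes file.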

ADD REPLYlink modified 7 months ago • written 7 months ago by YaGalbi1.3k

Hello @kennethcondon2007, thank you I will try that. Would you happen to know the reason for this issue with wigs?

ADD REPLYlink written 7 months ago by varsha61970

There is no issue at all, and this isn't a mistake in the files. As the years go by, the reference genomes are updated. This updating involves changing the coordinates of many genes, such as their start or end positions. As I understand it, one consequence is that some new or old parts of the genome do not have enough evidence to be placed on the main chromosomes (INT/X/Y/M), so rather than being deleted from the reference they are added as an additional "contig" with its own scaffold name, e.g. chrUN and other variations.

As I said, this is as I understand it - I'm sure genomax would have a better explanation.

ADD REPLYlink modified 7 months ago • written 7 months ago by YaGalbi1.3k

Thank you for the explanation, that makes a lot of sense. I am still trying to figure out the genome build of the files so I can convert them to the new build, instead of deleting the corresponding lines from the files.

ADD REPLYlink written 6 months ago by varsha61970
7 months ago, genecats.ucsc wrote:

You need to make sure that all of the chromosome names in your wiggle file are accounted for in the chrom.sizes file.

For instance, if my wiggle file looks like this:

variableStep chrom=chr2CEN
3003560 0

And my chrom.sizes file looks like this:

chr2CEN 242193529

Then I can still run wigToBigWig just fine:

wigToBigWig test1.wig test.chrom.sizes out1.bw

Now whether that wiggle will actually display in the genome browser is a different story, but it seems to me that your wiggle just has incorrect chromosome names and needs to be fixed.

If you have further questions about UCSC data or tools feel free to send your question to one of the below mailing lists:

  • General questions: genome@soe.ucsc.edu
  • Questions involving private data: genome-www@soe.ucsc.edu
  • Questions involving mirror sites: genome-mirror@soe.ucsc.edu

ChrisL from the UCSC Genome Browser

ADD COMMENTlink written 7 months ago by genecats.ucsc510
1

Hi Chris, I am not sure if that is the issue in my case. This is how my wig file format looks -

0

track type=wiggle_0

variableStep chrom=chr2L

I removed the 1st line and track line before running wigToBigWig and I used fetchChromSizes to get the chrom.sizes file. Am I missing something here? Thanks for your help!

ADD REPLYlink written 7 months ago by varsha61970
1

Yes, but what are the other chrom lines in the wiggle file like? Do all the chromosome names correspond to what is in the chrom.sizes file? Try grepping for 'chr2' in your wiggle file and see what shows up. Or use something like this to get only the chromosome names:

$ grep chrom userWig2.wig  | cut -d'=' -f2

You can also try something like this to find chromosomes in the wiggle file that aren't in the chrom.sizes file, since you mentioned that it fails on a different chromosome name if you remove a particular one:

$ grep -v -Fwf <(cut -f1 dm6.chrom.sizes) <(grep chrom userWig2.wig  | cut -d'=' -f2 | sort -u )

If that doesn't output anything, then it would help if you could share a link to the wiggle file you are trying to convert. If the file is private, messages sent to the genome-www address I mentioned in my previous response will only be seen by UCSC Genome Browser staff.

ADD REPLYlink written 7 months ago by genecats.ucsc510

Hello @genecats.ucsc, I was able to grep out the chrom values that did not match the chrom.sizes file using -

grep -vE '(track|chr2CEN|chr3CEN|...|chrU)'

Now when I run wigToBigWig, I get the error - Overlap on chr3. Please remove overlaps and try again.

This makes me worry a little since I am not sure if removing the redundant chr location lines is a good idea. Please advise.
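
One thing that may be worth checking, shown here as a small sketch on made-up data: a line-based grep -v removes only the declaration line of an unwanted block, because the data lines underneath it do not contain the chromosome name. Those data lines then follow the previous declaration, which is one plausible way to end up with overlapping or out-of-order positions on a chromosome that was fine before. The file names and values below are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"

# Toy wiggle file: one wanted block (chr3L) followed by one unwanted block (chr3CEN).
cat > toy.wig <<'EOF'
track type=wiggle_0
variableStep chrom=chr3L
500 1.0
600 2.0
variableStep chrom=chr3CEN
100 9.0
200 8.0
EOF

# Line-based filter in the style used above: only lines that mention the name are removed.
grep -vE '(track|chr3CEN)' toy.wig > filtered.wig

# The chr3CEN data lines (100 and 200) are still present and now sit under chr3L.
cat filtered.wig
```

If the filtered file looks like this, filtering whole blocks (declaration line plus its data lines) instead of single lines would avoid the orphaned values.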

ADD REPLYlink written 7 months ago by varsha61970

This error happens when you have wiggle lines like the following:

variableStep chrom=chromName span=5
1000 0.56
1001 0.55

In this case, positions 1000-1004 are supposed to have the value 0.56, but on the next line positions 1001-1005 are supposed to have the value 0.55. Since a single position (in this example, coordinates 1001-1004) can't have more than one value, you get an error.

You will have to decide for yourself whether it is a good idea to remove these redundant lines. Getting in contact with whoever made the file and figuring out how it was made is probably the best option, especially so you can also figure out how the strange chromosome names got into the file.
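
To see where such overlaps are before deciding what to do with them, something like the following awk sketch could help. It is not part of the kent tools. It assumes variableStep data only, with positions in ascending order inside each block, and it checks each block on its own; fixedStep blocks are skipped. The file names and data are made up for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"

# Toy data reproducing the example above: span=5, then starts at 1000 and 1001.
cat > demo.wig <<'EOF'
variableStep chrom=chr3 span=5
1000 0.56
1001 0.55
2000 0.10
EOF

awk '
  /^variableStep/ {                         # new block: read chrom and span, reset state
      span = 1; chrom = ""; prev_end = 0
      for (i = 2; i <= NF; i++) {
          if ($i ~ /^chrom=/) chrom = substr($i, 7)
          if ($i ~ /^span=/)  span  = substr($i, 6) + 0
      }
      in_var = 1; next
  }
  /^(fixedStep|track|#)/ { in_var = 0; next }   # not handled by this sketch
  in_var && NF == 2 {
      start = $1 + 0
      if (start <= prev_end)                # starts inside the previous interval
          printf "%s\t%d\toverlaps previous interval ending at %d\n", chrom, start, prev_end
      prev_end = start + span - 1
  }
' demo.wig > overlaps.txt

cat overlaps.txt
```

Each reported line gives the chromosome, the start position of the offending data line, and the end of the interval it collides with, which should make it easier to judge whether the overlapping values are true duplicates or a sign of a deeper problem in the file.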

ADD REPLYlink written 7 months ago by genecats.ucsc510

I will get on that, thank you again for your help!

ADD REPLYlink written 7 months ago by varsha61970