How Do I Get The Gene Names From The Chromosome Locations In A Wig File
11.4 years ago
pixie@bioinfo ★ 1.4k

Hi, I would like to get the list of genes that have histone modifications in their TSS or coding regions. For this I have a wig file containing a huge number of chromosome locations. I uploaded the file to the UCSC Genome Browser as a custom track, but finding the names of the genes lying in those regions that way seems very difficult. Is there some other way to do this? I have pasted some entries from the wig file below.

track type=wiggle_0 name="GMAT_H3K9K14ac2" description="GMAT_H3K9K14ac2_hg18" visibility=full autoScale=off viewLimits=0.0:15.0 color=0,0,0
variableStep chrom=chr1 span=21
554973    1
554990    3
556480    1
559247    2
559431    2
559448    2

wiggle ucsc coordinates galaxy • 13k views
11.4 years ago

You can do this using Galaxy.

First, go to Galaxy and upload your wig file. To do so, click 'Get Data' > 'Upload a file', select wig as the file type, and upload it. Remember to set the organism and the genome build.

Second, it is better to convert the wig file to 'Interval' format. To do so, click the pencil icon (Edit attributes) next to your wig dataset in the 'History' panel in Galaxy and convert it with 'Wiggle to Interval'.
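If it helps to see what that conversion does, here is a minimal Python sketch of it, not Galaxy's actual implementation. It assumes a variableStep wig like the one in the question, and the function name `wig_to_intervals` is just made up for illustration. Wiggle positions are 1-based, so the start is shifted by one to give 0-based, half-open intervals.

```python
def wig_to_intervals(lines):
    """Yield (chrom, start, end, value) from variableStep wiggle lines.

    Starts are converted from 1-based wiggle positions to 0-based,
    half-open interval coordinates.
    """
    chrom, span = None, 1
    for line in lines:
        line = line.strip()
        # Skip blank lines and track/browser/comment lines.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        if line.startswith("variableStep"):
            # e.g. "variableStep chrom=chr1 span=21"
            fields = dict(f.split("=", 1) for f in line.split()[1:])
            chrom = fields["chrom"]
            span = int(fields.get("span", 1))
            continue
        if line.startswith("fixedStep"):
            raise ValueError("fixedStep sections are not handled in this sketch")
        pos, value = line.split()
        start = int(pos) - 1
        yield chrom, start, start + span, float(value)


sample = """track type=wiggle_0 name="GMAT_H3K9K14ac2"
variableStep chrom=chr1 span=21
554973    1
554990    3
""".splitlines()

for chrom, start, end, value in wig_to_intervals(sample):
    print(chrom, start, end, value, sep="\t")
```

For the first two data lines of the question's file this gives chr1 554972-554993 and chr1 554989-555010.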

Third, get the table of RefSeq genes from UCSC. To do so, go to 'Get Data' in Galaxy and select UCSC; before submitting, make sure the output is set to be sent to Galaxy instead of to a file.

Last, click 'Operate on Genomic Intervals' in Galaxy and select the Intersect tool. If you run into more trouble, you can ask here.
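For anyone who wants to check the result outside Galaxy, the intersect step can be sketched in a few lines of Python. This is only an illustration of the overlap test, not the Galaxy tool itself; the function name and the two gene records (GENE_A, GENE_B) are invented for the example, and all coordinates are assumed to be 0-based, half-open.

```python
def genes_overlapping(intervals, genes):
    """Return the sorted names of genes overlapped by at least one interval.

    intervals: iterable of (chrom, start, end)
    genes:     iterable of (chrom, start, end, name)
    """
    # Group genes by chromosome so each interval is only compared
    # against genes on the same chromosome.
    by_chrom = {}
    for chrom, start, end, name in genes:
        by_chrom.setdefault(chrom, []).append((start, end, name))

    hits = set()
    for chrom, start, end in intervals:
        for g_start, g_end, name in by_chrom.get(chrom, []):
            # Two half-open intervals overlap when each starts
            # before the other ends.
            if start < g_end and g_start < end:
                hits.add(name)
    return sorted(hits)


# Intervals taken from the wig excerpt; the gene records are made up.
intervals = [("chr1", 554972, 554993), ("chr1", 556479, 556500)]
genes = [
    ("chr1", 554000, 555000, "GENE_A"),  # hypothetical gene
    ("chr1", 600000, 601000, "GENE_B"),  # hypothetical gene
]
print(genes_overlapping(intervals, genes))
```

The nested loop is fine for a quick check; for whole-genome files a sorted sweep or an interval tree would be the usual choice.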

10.4 years ago

Here are two more options, if you are really interested in obtaining a gene name/symbol. When obtaining a reference gene track to compare against, try one of the following:

From the RefSeq Genes track, pull the entire refGene table over to Galaxy, not just the default BED version (output format = all fields from selected table). Once in Galaxy, delete the first column (bin) and set the file type to interval. Cut/merge the columns of the file to produce a BED4 format file (chrom, start, stop, name), using c12 as the "name" attribute, and Join (inner, probably) to preserve the name in the output. c12 is name2, which is the gene name/symbol in the UCSC table description, whereas c1 is name, which is just the RefSeq ID and is what a simple BED file output directly from the UCSC Table Browser would contain.
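The same column shuffle can be sketched in Python if you have the refGene table downloaded as tab-separated text. This is a rough sketch that assumes the standard refGene column order with the bin column still present (so name2 is at index 12 rather than c12); the function name and the example row are invented for illustration.

```python
def refgene_to_bed4(lines):
    """Yield (chrom, txStart, txEnd, name2) from tab-separated refGene rows.

    Assumes the bin column is still present, so the 0-based indexes are:
    0 bin, 1 name, 2 chrom, 3 strand, 4 txStart, 5 txEnd, ..., 12 name2.
    """
    for line in lines:
        # Skip the "#bin ..." header line and any blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        yield fields[2], int(fields[4]), int(fields[5]), fields[12]


# A made-up row in refGene layout (not a real transcript).
rows = [
    "#bin\tname\tchrom\tstrand\ttxStart\ttxEnd\tcdsStart\tcdsEnd\t"
    "exonCount\texonStarts\texonEnds\tscore\tname2\tcdsStartStat\t"
    "cdsEndStat\texonFrames",
    "0\tNM_000000\tchr1\t+\t1000\t5000\t1200\t4800\t2\t"
    "1000,3000,\t2000,5000,\t0\tGENE_A\tcmpl\tcmpl\t0,1,",
]

for chrom, start, end, symbol in refgene_to_bed4(rows):
    print(chrom, start, end, symbol, sep="\t")
```

If the bin column has already been deleted, every index shifts down by one.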

Another data option is the UCSC Genes track, which would also have to be modified a bit to obtain a gene name/symbol. To do this, select the UCSC Genes track, set output format = "selected fields from primary and related tables", submit, link in the identifier(s) of your choice from kgXref, submit again, and send to Galaxy. Set the file type to interval. Cut/merge columns to create a BED4 format file, using that alternate identifier as the "name" interval attribute during the Join ("name" from a simple BED output for this track would be UCSC's internal transcript identifier).
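The linking step amounts to an inner join between transcript records and an ID-to-symbol lookup. Here is a small Python sketch of that idea under simplifying assumptions: the transcripts are already reduced to (id, chrom, start, end) and the cross-reference is a plain dict. The function name, the transcript IDs, and the symbols are all invented for the example.

```python
def attach_symbols(transcripts, xref):
    """Inner join: yield (chrom, start, end, symbol) for transcripts
    whose ID has a symbol in xref; transcripts without one are dropped."""
    for tx_id, chrom, start, end in transcripts:
        symbol = xref.get(tx_id)
        if symbol:
            yield chrom, start, end, symbol


# Made-up transcript IDs and symbols, standing in for the
# transcript table and the ID-to-symbol cross-reference.
transcripts = [
    ("tx0001", "chr1", 1000, 5000),
    ("tx0002", "chr1", 8000, 9000),
]
xref = {"tx0001": "GENE_A"}

for row in attach_symbols(transcripts, xref):
    print(*row, sep="\t")
```

An inner join drops transcripts that have no symbol; keep them with the transcript ID as a fallback name if you would rather not lose those intervals.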

After the Join, cut columns out to produce a simple BED file, change file type to BED, create a custom track line (or not), and display at UCSC or directly in Galaxy using Trackster ("Visualization").

Hopefully this helps,

Jen

11.4 years ago
pixie@bioinfo ★ 1.4k

Thanks for the info. I did as you said. After converting wiggle to interval, I clicked 'display data on browser' and the data came through fine as intervals. However, when I did the third step (UCSC) and tried to display the result, only the first two intervals showed up:

1.Chrom    2.Start    3.End

chr1    554972    554993
chr1    554989    555010


Do I need to change any attribute at this step? Also, how do I see the names of the genes lying in, say, chromosome 1?


Did you open the output file with a text editor? In any case, it may be that your input file contains only two regions overlapping with genes. Run a test with a wig file for which you already know how many regions overlap a gene.


Thanks for your help. I have fixed the error and I am getting the intervals now.

5.3 years ago
Whoknows ▴ 880

Hi friends

The conversion procedure works fine.

I need a tool to get the proportion of read counts across gene regions, in the following way.

How can I get the average read-count proportion over all genes? To make this concrete, I want to build an averaged list from the read counts at all gene locations, expressed as proportions per region:

• Location = Upstream 0-200 200-400 400-600 ... Downstream


Hi pcsam,

You should ask this question as a new thread; you'll get more responses.