How Do I Get The Gene Names From The Chromosome Locations In A Wig File
13.9 years ago
pixie@bioinfo ★ 1.5k

Hi. I would like to get the list of genes that have histone modifications in their TSS or coding regions. For this I have a wig file containing a large number of chromosome locations. I uploaded the file to the UCSC Genome Browser as a custom track, but finding the names of the genes that lie in those regions by browsing is very difficult. Is there another way to do this? I have pasted some entries from the wig file below.

track type=wiggle_0 name="GMAT_H3K9K14ac2" description="GMAT_H3K9K14ac2_hg18" visibility=full autoScale=off viewLimits=0.0:15.0 color=0,0,0
variableStep chrom=chr1 span=21
554973    1
554990    3
556480    1
559247    2
559431    2
559448    2
wiggle ucsc coordinates galaxy • 15k views
13.9 years ago

You can do this with Galaxy.

First, go to Galaxy and upload your wig file: click 'Get Data' > 'Upload a file', select wig as the file format, and upload it. Remember to set the organism and the genome build (hg18, going by the description in your track line).

Second, it is better to convert the wig file to 'Interval' format. To do so, click the pencil icon (Edit attributes) next to the dataset in the 'History' panel in Galaxy and convert it with 'Wiggle to Interval'.
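For anyone who wants to see what the wiggle-to-interval conversion amounts to, here is a minimal Python sketch. It is not Galaxy's own code; the function name `wig_to_intervals` is made up for illustration, and it only handles `variableStep` blocks like the one in the question. It assumes the usual conventions: wiggle positions are 1-based, and intervals are 0-based, half-open.

```python
def wig_to_intervals(lines):
    """Convert variableStep wiggle lines to (chrom, start, end, value) tuples.

    Wiggle positions are 1-based; the returned intervals are 0-based,
    half-open (BED/interval style). Hypothetical helper, not a Galaxy API.
    """
    chrom, span = None, 1
    intervals = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and track/browser/comment lines.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        if line.startswith("variableStep"):
            # e.g. "variableStep chrom=chr1 span=21"
            fields = dict(f.split("=", 1) for f in line.split()[1:])
            chrom = fields["chrom"]
            span = int(fields.get("span", 1))
            continue
        if line.startswith("fixedStep"):
            raise ValueError("fixedStep blocks are not handled in this sketch")
        pos, value = line.split()
        start = int(pos) - 1  # 1-based position -> 0-based start
        intervals.append((chrom, start, start + span, float(value)))
    return intervals


wig_excerpt = [
    'track type=wiggle_0 name="GMAT_H3K9K14ac2"',
    "variableStep chrom=chr1 span=21",
    "554973    1",
    "554990    3",
]
for row in wig_to_intervals(wig_excerpt):
    print(*row, sep="\t")
```

With span=21, the position 554973 becomes the interval chr1:554972-554993, which is the kind of row the converted dataset should contain.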

Third, you need the table of RefSeq genes from UCSC. Go to 'Get Data' in Galaxy, select UCSC, choose the RefSeq Genes table, and make sure the output is sent to Galaxy instead of to a file.

Last, click 'Operate on Genomic Intervals' in Galaxy and select the Intersect tool to find the genes that overlap your intervals. If you run into more trouble, you can ask here.
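The idea behind the intersect step can be sketched in a few lines of Python. This is only an illustration of the overlap test, not what Galaxy runs internally; the function name `overlapping_genes` and the gene names and coordinates in the example are made up. It assumes both inputs use 0-based, half-open coordinates.

```python
def overlapping_genes(intervals, genes):
    """Return the sorted names of genes that overlap at least one interval.

    intervals: iterable of (chrom, start, end)
    genes:     iterable of (chrom, start, end, name)
    Coordinates are assumed to be 0-based, half-open.
    """
    # Group genes by chromosome so each interval is only compared
    # against genes on the same chromosome.
    by_chrom = {}
    for chrom, start, end, name in genes:
        by_chrom.setdefault(chrom, []).append((start, end, name))

    hits = set()
    for chrom, start, end in intervals:
        for g_start, g_end, name in by_chrom.get(chrom, []):
            # Two half-open ranges overlap when each starts before the other ends.
            if start < g_end and g_start < end:
                hits.add(name)
    return sorted(hits)


# Made-up gene coordinates, for illustration only.
genes = [
    ("chr1", 554000, 555000, "GENE_A"),
    ("chr1", 600000, 601000, "GENE_B"),
]
intervals = [("chr1", 554972, 554993), ("chr1", 559246, 559267)]
print(overlapping_genes(intervals, genes))
```

This simple double loop is fine for small files; for whole-genome data a sorted sweep or an interval tree would scale better.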

12.9 years ago

Giovanni has a good answer!

Here are two more options, if you are really interested in obtaining a gene name/symbol. When obtaining a reference gene track to compare against, try one of the following:

From the RefSeq Genes track, pull the entire refGene table over to Galaxy, not just the default BED version (output format = all fields from selected table). Once in Galaxy, delete the first column (bin) and set the file type to interval. Cut/merge the columns of the file to produce a BED4 format file (chrom, start, stop, name), using c12 as the "name" attribute, and Join (inner, probably) to preserve the name in the output. c12 is name2, the gene name/symbol in the UCSC table description; c1 is name, which is just the RefSeq ID and is what a simple BED file output directly from the UCSC Table Browser would contain.
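The same column shuffling can be sketched outside Galaxy in Python. This assumes a tab-separated "all fields" export of the refGene table with the standard column order (bin, name, chrom, strand, txStart, txEnd, ..., name2, ...); the function name `refgene_to_bed4` and the example row are made up for illustration.

```python
def refgene_to_bed4(lines):
    """Turn refGene 'all fields' rows into (chrom, start, end, symbol) tuples.

    Assumes tab-separated rows in the standard refGene column order,
    with bin as the first column. Hypothetical helper for illustration.
    """
    bed = []
    for line in lines:
        # Skip blank lines and the "#bin ..." header line.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        fields = fields[1:]  # drop the bin column, as described above
        # After dropping bin: c1=name, c2=chrom, c4=txStart, c5=txEnd, c12=name2
        chrom = fields[1]
        tx_start = int(fields[3])
        tx_end = int(fields[4])
        symbol = fields[11]
        bed.append((chrom, tx_start, tx_end, symbol))
    return bed


# Made-up row in refGene layout, for illustration only.
row = "\t".join([
    "585", "NM_000000", "chr1", "+", "1000", "5000", "1200", "4800",
    "2", "1000,3000,", "2000,5000,", "0", "GENE_A", "cmpl", "cmpl", "0,1,",
])
for record in refgene_to_bed4([row]):
    print(*record, sep="\t")
```

Note that this uses the transcript start and end, so a gene with several RefSeq transcripts will produce several rows with the same symbol.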

Another data option is the UCSC Genes track, which also would have to be modified a bit to obtain a gene name/symbol. To do this, select the UCSC Genes track, output format = "selected fields from primary and related tables", submit, link in from kgXref the identifier(s) of choice, submit and send to Galaxy. Set file type as interval. Cut columns/merge to create a BED4 format file, using that alternate identifier as the "name" interval attribute during the Join ("name" from a simple BED output for this track would be UCSC's internal transcript identifier).

After the Join, cut columns out to produce a simple BED file, change file type to BED, create a custom track line (or not), and display at UCSC or directly in Galaxy using Trackster ("Visualization").

Hopefully this helps,

Jen

13.9 years ago
pixie@bioinfo ★ 1.5k

Thanks for the info. I did as you said. After converting wiggle to interval, I clicked 'display data on browser' and the data showed up fine as intervals. However, when I did the third step (UCSC) and tried to display the result, only the first two intervals came through:

1.Chrom   2.Start   3.End
chr1      554972    554993
chr1      554989    555010

Do I need to change any attribute at this step? Also, how do I see the names of the genes lying in, say, chromosome 1?


Please use 'Add comment' when replying to an answer instead of creating a new answer; this site doesn't work like a normal online forum.

Did you open the output file with a text editor? In any case, it may be that your input file contains only two regions overlapping with genes. Make some tests by creating a wig file in which you already know how many regions overlap with a gene.
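One way to build such a test file is sketched below in Python. The function name `make_test_wig`, the track name, and the positions are all placeholders; pick positions that you know fall inside (or outside) a gene in your build.

```python
def make_test_wig(chrom, span, positions, name="test_track"):
    """Build the text of a small variableStep wig file.

    positions are 1-based start positions; every position gets a value of 1.
    Hypothetical helper for building test input.
    """
    lines = [
        f'track type=wiggle_0 name="{name}"',
        f"variableStep chrom={chrom} span={span}",
    ]
    # Wiggle positions should be listed in ascending order.
    lines += [f"{pos}\t1" for pos in sorted(positions)]
    return "\n".join(lines) + "\n"


# Placeholder positions; replace with coordinates you know overlap a gene.
wig_text = make_test_wig("chr1", 21, [2000, 1000])
print(wig_text, end="")
```

Save the text to a file, upload it, and check that the intersect result contains exactly the number of regions you expect.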


Thanks for your help. I have fixed the error and I am now getting the intervals.

7.9 years ago
Whoknows ▴ 960

Hi friends

Thanks, Giovanni, for the nice answer.

The conversion procedure works fine.

I need a tool that summarizes read counts across genes by position.

How can I get the average read count over all genes for each region relative to the gene? To illustrate, I mean an averaged profile over all gene locations like this:

  • Location = Upstream 0-200 200-400 400-600 ... Downstream

  • Average read counts = 20 reads 15 reads 32 reads ...


Hi pcsam,

You should ask this question as a new thread, you'll get more responses.
