Question: How Do I Get The Gene Names From The Chromosome Locations In A Wig File
6
9.3 years ago by
pixie@bioinfo1.4k wrote:

Hi. I would like to get the list of genes that have histone modifications in their TSS or coding regions. I have a wig file containing a large number of chromosome locations. I uploaded it to the UCSC Genome Browser as a custom track, but finding the names of the genes that lie in those regions is very difficult this way. Is there another way to do this? I have pasted some entries from the wig file below.

track type=wiggle_0 name="GMAT_H3K9K14ac2" description="GMAT_H3K9K14ac2_hg18" visibility=full autoScale=off viewLimits=0.0:15.0 color=0,0,0
variableStep chrom=chr1 span=21
554973    1
554990    3
556480    1
559247    2
559431    2
559448    2
ucsc coordinates galaxy wiggle • 12k views
modified 11 months ago by RamRS23k • written 9.3 years ago by pixie@bioinfo1.4k
9
9.3 years ago by
London, UK
Giovanni M Dall'Olio26k wrote:

You can do this with Galaxy.

First, go to Galaxy and upload your wig file: click 'Get Data' > 'Upload a file', select wig as the file format, and upload it. Remember to set the organism and the genome build.

Second, it is better to convert the wig file to 'Interval' format. Click the pencil icon ('Edit attributes') next to the dataset in the 'History' panel in Galaxy and use the 'Wiggle to Interval' conversion.
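
For anyone who wants to see what this conversion amounts to outside Galaxy, here is a minimal Python sketch. It is not Galaxy's own converter; it only assumes the variableStep layout shown in the question (1-based positions, an optional span) and writes 0-based, half-open intervals. The function name `wig_to_intervals` is made up for illustration, and fixedStep sections are deliberately not handled.

```python
from typing import Iterable, Iterator, Tuple


def wig_to_intervals(lines: Iterable[str]) -> Iterator[Tuple[str, int, int, float]]:
    """Yield (chrom, start, end, value) as 0-based, half-open intervals."""
    chrom, span = None, 1
    for raw in lines:
        line = raw.strip()
        # Skip blank lines, track/browser lines and comments.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        if line.startswith("variableStep"):
            # e.g. "variableStep chrom=chr1 span=21"
            fields = dict(item.split("=", 1) for item in line.split()[1:])
            chrom = fields["chrom"]
            span = int(fields.get("span", 1))
            continue
        if line.startswith("fixedStep"):
            raise ValueError("fixedStep sections are not handled in this sketch")
        position, value = line.split()
        start = int(position) - 1  # wig positions are 1-based
        yield chrom, start, start + span, float(value)


# The first entries from the question's wig file.
example = """track type=wiggle_0 name="GMAT_H3K9K14ac2"
variableStep chrom=chr1 span=21
554973    1
554990    3
""".splitlines()

for chrom, start, end, value in wig_to_intervals(example):
    print(chrom, start, end, value, sep="\t")
```

With the two sample positions this gives chr1 554972-554993 and chr1 554989-555010.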

Third, get the table of RefSeq genes from UCSC. In Galaxy, go to 'Get Data' and select UCSC; before you submit, make sure the output is set to be sent to Galaxy instead of to a file.

Last, click on 'Operate on Genomic Intervals' in Galaxy and select the Intersect tool, using the converted intervals and the RefSeq genes as the two datasets. If you run into more trouble, you can ask here.
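
To make the intersect step concrete, here is a rough Python sketch of the overlap test it relies on. This is not the Galaxy tool itself, the gene coordinates and names (`GENE_A` and so on) are invented toy values, and the nested loop is only meant for small inputs. It assumes 0-based, half-open coordinates on both sides.

```python
from collections import defaultdict


def overlapping_genes(intervals, genes):
    """Return the sorted names of genes that overlap at least one interval.

    intervals: iterable of (chrom, start, end)
    genes:     iterable of (chrom, start, end, name)
    """
    genes_by_chrom = defaultdict(list)
    for chrom, start, end, name in genes:
        genes_by_chrom[chrom].append((start, end, name))

    hits = set()
    for chrom, start, end in intervals:
        for g_start, g_end, name in genes_by_chrom[chrom]:
            # Two half-open intervals overlap when each starts before the other ends.
            if start < g_end and g_start < end:
                hits.add(name)
    return sorted(hits)


# Intervals derived from the question's wig entries; genes are hypothetical.
peaks = [("chr1", 554972, 554993), ("chr1", 559246, 559267)]
toy_genes = [
    ("chr1", 554000, 555500, "GENE_A"),
    ("chr1", 600000, 610000, "GENE_B"),
    ("chr2", 554000, 555500, "GENE_C"),
]

print(overlapping_genes(peaks, toy_genes))
```

Only `GENE_A` is reported here, since it is the only toy gene on chr1 that spans one of the peaks.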

written 9.3 years ago by Giovanni M Dall'Olio26k
4
8.3 years ago by
Bay Area, CA
Jennifer Hillman Jackson390 wrote:

Giovanni has a good answer!

Here are two more options, if you are really interested in obtaining a gene name/symbol. When obtaining a reference gene track to compare against, try one of the following:

From the RefSeq Genes track, pull the entire refGene table over to Galaxy, not just the default BED version (output format = all fields from selected table). Once in Galaxy, delete the first column (bin) and set the file type as interval. Cut/merge the columns of the file to produce a BED4 format file (chrom, start, stop, name), using c12 as the "name" attribute, and Join (inner, probably) to preserve the name in the output. With the bin column removed, c12 is name2, which is the gene name/symbol in the UCSC table description; c1 is name, which is just the RefSeq ID and is what a simple BED file output directly from the UCSC Table Browser would contain.
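
As a rough illustration of that column cut, here is a Python sketch that turns "all fields" refGene rows into BED4 tuples carrying the gene symbol. It assumes the usual refGene column order (bin, name, chrom, strand, txStart, txEnd, ..., name2 as the 13th field while bin is still present); check this against the table schema for your build. The row below is a made-up toy record, not a real gene.

```python
def refgene_to_bed4(lines):
    """Yield (chrom, txStart, txEnd, name2) from tab-separated refGene rows."""
    for raw in lines:
        line = raw.rstrip("\n")
        # The Table Browser header line starts with '#'.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # 0-based indices with the bin column still present (assumed layout):
        # 2 = chrom, 4 = txStart, 5 = txEnd, 12 = name2 (gene symbol)
        yield fields[2], int(fields[4]), int(fields[5]), fields[12]


# Hypothetical row in the assumed refGene layout.
toy_rows = [
    "#bin\tname\tchrom\tstrand\ttxStart\ttxEnd\tcdsStart\tcdsEnd\texonCount"
    "\texonStarts\texonEnds\tscore\tname2\tcdsStartStat\tcdsEndStat\texonFrames",
    "585\tNM_TOY1\tchr1\t+\t554000\t555500\t554100\t555400\t1"
    "\t554000,\t555500,\t0\tGENE_A\tcmpl\tcmpl\t0,",
]

for chrom, start, end, symbol in refgene_to_bed4(toy_rows):
    print(chrom, start, end, symbol, sep="\t")
```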

Another data option is the UCSC Genes track, which also has to be modified a bit to obtain a gene name/symbol. To do this, select the UCSC Genes track, set output format = "selected fields from primary and related tables", submit, link in the identifier(s) of choice from kgXref, submit again and send to Galaxy. Set file type as interval. Cut/merge columns to create a BED4 format file, using that alternate identifier as the "name" interval attribute during the Join (the "name" in a simple BED output for this track would be UCSC's internal transcript identifier).

After the Join, cut columns out to produce a simple BED file, change file type to BED, create a custom track line (or not), and display at UCSC or directly in Galaxy using Trackster ("Visualization").

Hopefully this helps,

Jen

written 8.3 years ago by Jennifer Hillman Jackson390
0
9.3 years ago by
pixie@bioinfo1.4k wrote:

Thanks for the info. I did as you said. After converting wiggle to interval, I clicked 'display data on browser' and the data showed up fine as intervals. However, when I did the third step (UCSC) and tried to display the result, only the first two intervals showed up:

Chrom   Start    End
chr1    554972   554993
chr1    554989   555010

Do I need to change any attribute at this step? Also, how do I see the names of the genes lying in, say, chromosome 1?

modified 11 months ago by RamRS23k • written 9.3 years ago by pixie@bioinfo1.4k

Please use 'Add comment' when replying to an answer, and do not create a new answer. This site doesn't work like a normal online forum.

Did you open the output file with a text editor? In any case, it may be that your input file contains only two regions overlapping with genes. Make some tests by creating a wig file in which you already know how many regions overlap with a gene.

written 9.3 years ago by Giovanni M Dall'Olio26k

Thanks for your help. I have fixed the error and I am now getting the intervals.

written 9.3 years ago by pixie@bioinfo1.4k
0
3.2 years ago by
Tehran,Iran
Whoknows750 wrote:

Hi friends

Thanks, Giovanni, for the nice answer.

The conversion procedure works fine.

I need a tool that summarizes read counts across genes by position.

How can I get the average read count over all genes for each positional bin? To make this concrete, I mean:

building a list of average read counts over all gene locations, binned by position:

  • Location = Upstream 0-200 200-400 400-600 ... Downstream

  • Average read counts = 20 reads 15 reads 32 reads ...
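
A minimal Python sketch of this kind of binned average, for illustration only: it assumes per-position read counts and one TSS with strand per gene, sums the counts that fall into each 200 bp bin downstream of the TSS, and averages those sums over the genes. The function name, the toy numbers, and the choice to ignore upstream positions are all assumptions made for the example.

```python
def average_profile(signal, genes, bin_size=200, n_bins=3):
    """Mean per-gene read count in each bin downstream of the TSS.

    signal: list of (chrom, position, count)
    genes:  list of (chrom, tss, strand), strand being '+' or '-'
    """
    totals = [0.0] * n_bins
    for chrom, tss, strand in genes:
        for s_chrom, pos, count in signal:
            if s_chrom != chrom:
                continue
            # Distance from the TSS, measured in the direction of transcription.
            offset = pos - tss if strand == "+" else tss - pos
            if 0 <= offset < bin_size * n_bins:
                totals[offset // bin_size] += count
    return [total / len(genes) for total in totals]


# Toy data: counts at four positions and two hypothetical genes.
toy_signal = [("chr1", 100, 20), ("chr1", 350, 10), ("chr1", 360, 20), ("chr1", 900, 5)]
toy_tss = [("chr1", 50, "+"), ("chr1", 1000, "-")]

# Bins are 0-200, 200-400 and 400-600 bp from the TSS.
print(average_profile(toy_signal, toy_tss))
```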

modified 3.2 years ago • written 3.2 years ago by Whoknows750

Hi pcsam,

You should ask this question as a new thread, you'll get more responses.

written 3.2 years ago by jotan1.2k