Showing depth in table browser with bed format file
8.1 years ago
incub0x • 0

Hello:

I am using the UCSC Table Browser (https://genome.ucsc.edu/cgi-bin/hgTables) with the following settings:

}group ~Genes and Gene Predictions

}region ~defined regions

For the defined regions, I uploaded a BED file. The first column is the chromosome, the second column is the start of the sequence, the third column is the end of the sequence, and the last column is the number of times the sequence appears in my file. For that reason, the fourth column is very important to me.

}Output format ~selected fields from primary and related tables ~~~>get output.

~name ~chrom ~geneSymbol ~refseq ~description ~kgXref ~refSeqSummary ~~~>get output.

At the end of the process, the Table Browser gives me the chromosomes and genes that correspond to the coordinates in my BED file, but the fourth column does not appear next to the resulting genes. Some coordinates are not annotated in the genome browser, so they are dropped from the output, and I cannot tell which genes correspond to which value of the fourth column.

I would like to know if there is a way to get the resulting genes together with the values from the fourth column of my BED file.

I would appreciate it if someone could help me. ####

Thank you.

Alex

gene sequence next-gen genome • 1.8k views

You may have realized that using "~" and "####" in your post has made its display a little strange. BioStar's editor is probably trying to interpret some of these characters as code. Consider using other formatting tools (quotes, bold, italics etc) to improve the display.

If I understand your question correctly, you are looking for a "union" of those two BED files, so that you end up with a file that looks like chr --> chr start --> chr end --> Gene name --> # of times present. Is that correct?


Hello genomemax2:

Thank you for answering me. I will try to reformulate my question:

I have a BED file with the following columns:

*Chromosome *Chrom start *Chrom end *# of times that the sequence appears (sequencing depth)

chr16 12745162 12745366 72

chr5 73280404 73280517 72

chr10 103823794 103823884 74

chr15 26882981 26882984 75

chr22 43955226 43955283 76

chr4 83113354 83113424 76

With the Table Browser at https://genome.ucsc.edu/cgi-bin/hgTables I got the gene names for the coordinates, but the "# of times that the sequence appears (sequencing depth)" column disappears, as in this example:

hg38.knownGene.chrom hg38.kgXref.geneSymbol

chr16 CPPED1

chr5 RP11-60A8.1

chr10 SH3PXD2A

chr15 GABRB3

chr15 GABRA5

chr22 PNPLA3

Some coordinates are not annotated in the genome browser, so they are dropped from the output, and I cannot tell which genes correspond to which value of the fourth column.

What I want is to get the gene names for the coordinates and, next to each gene name, the "# of times that the sequence appears (sequencing depth)", as in the following example:

hg38.knownGene.chrom hg38.kgXref.geneSymbol # of times that the sequence appears (sequencing depth)

chr16 CPPED1 72

chr5 RP11-60A8.1 72

chr10 SH3PXD2A 74

chr15 GABRB3 75

chr15 GABRA5 ?

chr22 PNPLA3 76

If someone knows a way of doing this through https://genome.ucsc.edu or any other bioinformatics tool, I would appreciate it.

I hope this is clear.

Alex

8.0 years ago

Grab annotations:

$ mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg38 -N -e "SELECT k.chrom, kg.txStart, kg.txEnd, x.geneSymbol FROM knownCanonical k, knownGene kg, kgXref x WHERE k.transcript = x.kgID AND k.transcript = kg.name;" | sort-bed - > hg38.knownGene.bed

This looks like:

$ head hg38.knownGene.bed 
chr1    17368   17436   MIR6859-3
chr1    29553   31097   RP11-34P13.3
chr1    30365   30503   MIR1302-9
chr1    34553   36081   FAM138A
chr1    69090   70008   OR4F5
chr1    89294   120932  RP11-34P13.7
chr1    89550   91105   RP11-34P13.8
chr1    139789  140339  RP11-34P13.14
chr1    141473  149707  RP11-34P13.13
chr1    157783  157887  RNU6-1100P

Then given regions of interest:

$ more roi.unsorted.bed
chr16    12745162     12745366    72
chr5     73280404     73280517    72
chr10    103823794    103823884   74
chr15    26882981     26882984    75
chr22    43955226     43955283    76
chr4     83113354     83113424    76
...

Sort them:

$ sort-bed roi.unsorted.bed > roi.bed

Then map the regions to the gene annotations, and associate the region IDs (72, etc.) with each gene, based on single-base overlap between the region and the gene:

$ bedmap --echo --echo-map-id --delim '\t' --skip-unmapped hg38.knownGene.bed roi.bed > answer.bed
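For anyone who wants to see what the single-base overlap test amounts to, here is a rough awk sketch of the same idea, run in the other direction (one output line per region, so regions with no gene are kept and marked NA). This is not how bedmap is implemented and it is much slower on large files; the function name annotate_regions and the NA placeholder are my own choices for illustration. It assumes half-open BED coordinates and whitespace-delimited input with no spaces inside gene symbols.

```shell
# Sketch only: annotate each region with the symbols of overlapping genes,
# keeping the depth column. Brute-force loop, fine for small inputs.
# $1 = gene BED (chrom, start, end, symbol)
# $2 = region BED (chrom, start, end, depth)
annotate_regions() {
  awk 'BEGIN { OFS = "\t" }
       # First file: load every gene into memory.
       NR == FNR { n++; gc[n] = $1; gs[n] = $2; ge[n] = $3; gn[n] = $4; next }
       # Second file: test each region against every gene.
       {
         hits = ""
         # Two half-open intervals share at least one base when each one
         # starts before the other one ends.
         for (i = 1; i <= n; i++)
           if (gc[i] == $1 && gs[i] < $3 && $2 < ge[i])
             hits = (hits == "" ? gn[i] : hits ";" gn[i])
         print $1, $2, $3, $4, (hits == "" ? "NA" : hits)
       }' "$1" "$2"
}

# Example call with the files from this answer (output name is arbitrary):
#   annotate_regions hg38.knownGene.bed roi.bed > roi.annotated.tsv
```

Each output line is the original region, its depth, and a semicolon-separated list of gene symbols, so a region like the chr15 one that touches two genes stays on one line with both names.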

Munge the columns in the answer.bed file into any desired non-BED format with cut etc.
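As a minimal sketch of that last step: with the bedmap options above, each line of answer.bed should hold the four gene columns followed by one extra tab-separated field with the mapped region IDs (several IDs are joined with ";" when more than one region overlaps a gene). Assuming that five-column layout, the chromosome, gene symbol and depth can be pulled out with cut. The function name gene_depth_table and the output file name are just placeholders.

```shell
# Keep chromosome (col 1), gene symbol (col 4) and mapped depth value(s) (col 5).
# Assumes tab-separated input laid out as: chrom, start, end, symbol, IDs.
gene_depth_table() {
  cut -f1,4,5 "$1"
}

# Run on answer.bed only if it exists in the current directory.
if [ -f answer.bed ]; then
  gene_depth_table answer.bed > gene_depth.tsv
fi
```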
