Question: Protein coding mm10 refseq bed
5 months ago by
rbronste230 wrote:

I'm trying to export a BED file from the UCSC Table Browser with the gene body locations of protein-coding genes in mm10, containing the following header/columns:

chr start end NA genename NMname strand

Not sure if there is a more straightforward way to get this arrangement, thanks!

5 months ago by
arup870 wrote:

Under Output format, choose the "selected fields from primary and related tables" option and click "get output", then pick the required columns on the selection page.

Link to the Table Browser: Table Browser

Select columns: Selection Page

4 months ago by
United States
vkkodali990 wrote:

If you are interested in RefSeq data, why not download the GFF3 annotation from NCBI and parse that file? You can download the GFF3 file from the RefSeq FTP site here:

A gene can be protein-coding and yet have one or more non-coding transcript variants. Hence, you first need the list of GeneIDs that encode at least one protein. You can get it by parsing the GFF3 file as follows:

zgrep -v '^#' interim_GRCm38.p6_top_level_2017-09-26.gff3.gz | awk 'BEGIN{FS="\t";OFS="\t"}($3=="CDS"){print $9}' | grep -o 'GeneID:[0-9]*' | sort -u > ~/GRCm38.p6_protein_coding_genes.txt

Then you can grep for those GeneIDs among the GFF3 rows whose column 3 is gene to get the full range and strand of each gene. It is unclear to me whether you want the range of each gene or of each transcript variant (one of your columns is NM). Depending on exactly what you want, it is fairly easy to come up with an appropriate Unix command that parses the GFF3 file and returns a BED-style file.
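For the gene-level case, the lookup described above could be sketched roughly as follows. This is only a sketch under a few assumptions: gff3_genes_to_bed is a made-up helper name, the gene rows are assumed to carry Name=<symbol> and Dbxref=GeneID:<number> in column 9 (as NCBI GFF3 files normally do), and the output column order (seqid, start, end, symbol, GeneID, strand) is an illustrative choice rather than the exact layout asked for. Note that column 1 of the NCBI file holds RefSeq accessions such as NC_000067.6 rather than chr1, so a separate accession-to-chromosome mapping would still be needed for UCSC-style names.

```shell
#!/usr/bin/env bash
# Sketch: turn the "gene" rows of an NCBI-style GFF3 into BED-style rows,
# keeping only genes whose GeneID is in a list of protein-coding GeneIDs.
# gff3_genes_to_bed is a hypothetical helper name, not part of any tool.
#   $1: file with one "GeneID:<number>" per line (e.g. the list built above)
#   $2: uncompressed GFF3 file, or "-" to read the GFF3 from stdin
# Example (assumed file names from the post):
#   gzip -dc interim_GRCm38.p6_top_level_2017-09-26.gff3.gz |
#     gff3_genes_to_bed ~/GRCm38.p6_protein_coding_genes.txt - > genes.bed
gff3_genes_to_bed() {
  awk 'BEGIN { FS = "\t"; OFS = "\t" }
    # First input: remember every protein-coding GeneID (assumes a non-empty list).
    NR == FNR { coding[$1] = 1; next }
    # Second input: skip comment lines and anything that is not a gene feature.
    /^#/ || $3 != "gene" { next }
    {
      id = ""; name = "NA"
      n = split($9, attrs, ";")
      for (i = 1; i <= n; i++) {
        if (attrs[i] ~ /^Name=/) name = substr(attrs[i], 6)
        if (attrs[i] ~ /^Dbxref=/ && match(attrs[i], /GeneID:[0-9]+/))
          id = substr(attrs[i], RSTART, RLENGTH)
      }
      # GFF3 coordinates are 1-based inclusive; BED is 0-based half-open,
      # so only the start is shifted by one.
      if (id in coding) print $1, $4 - 1, $5, name, id, $7
    }' "$1" "$2"
}
```

Looking the IDs up as exact array keys avoids a pitfall of a plain grep, where GeneID:111 would also match GeneID:1111. For transcript-level rows (the NM column), the same idea would apply to the mRNA features instead of the gene features, but the attribute names to pull out should be checked against the actual file first.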
