Question: Protein-coding mm10 RefSeq BED
rbronste wrote (10 months ago):

I'm trying to export a BED file from the UCSC Table Browser with protein-coding gene body locations in mm10, containing the following columns:

chr start end NA genename NMname strand

Is there a straightforward way to get this arrangement? Thanks!

arup wrote (10 months ago):

Use the "selected fields from primary and related tables" option under "output format", click "get output", and then choose the required columns on the selection page.

Table Browser (link and screenshots of the main form and of the column selection page not preserved)
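If the column order of the Table Browser export doesn't match the layout in the question, a small awk step can reorder it. This is a minimal sketch, assuming the export comes from the mm10 `refGene` table (NCBI RefSeq track) with a header line starting with `#` that includes at least the `name`, `chrom`, `strand`, `txStart`, `txEnd` and `name2` fields; the function name `refgene_to_bed` is hypothetical. It keeps only `NM_` accessions (RefSeq protein-coding mRNAs; `NR_` ones are non-coding) and writes a literal `NA` placeholder in column 4, as in the question:

```shell
#!/usr/bin/env bash
# Hypothetical helper: reorder a Table Browser "selected fields" export of the
# mm10 refGene table (read from stdin) into the layout
#   chr  start  end  NA  genename  NMname  strand
# Columns are located by header name, so their order in the export does not matter.
refgene_to_bed() {
  awk 'BEGIN { FS = OFS = "\t" }
    NR == 1 {
      sub(/^#/, "")                       # header line starts with "#"
      for (i = 1; i <= NF; i++) {
        f = $i
        sub(/.*\./, "", f)                # tolerate prefixes like "refGene.name" (assumption)
        col[f] = i                        # field name -> column index
      }
      next
    }
    $(col["name"]) ~ /^NM_/ {             # NM_ = protein-coding RefSeq mRNA
      # UCSC txStart is already 0-based, so it is used as the BED start as-is.
      print $(col["chrom"]), $(col["txStart"]), $(col["txEnd"]), "NA",
            $(col["name2"]), $(col["name"]), $(col["strand"])
    }'
}
```

For example, `refgene_to_bed < refGene_mm10.txt > mm10_coding.bed` (file names are placeholders). This gives one line per transcript, so a gene with several NM_ variants appears several times.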

vkkodali (United States) wrote (10 months ago):

If you are interested in RefSeq data, why not download the GFF3 annotation from NCBI and parse that file directly? The GFF3 file can be downloaded from the RefSeq FTP site here:

A gene can be protein-coding and yet have one or more non-coding transcript variants. Hence, you first need the list of GeneIDs that encode at least one protein, i.e. genes with at least one CDS feature. You can get it by parsing the GFF3 file as follows:

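zgrep -v '^#' interim_GRCm38.p6_top_level_2017-09-26.gff3.gz |  # drop GFF3 directive/comment lines
  awk 'BEGIN{FS="\t";OFS="\t"} ($3=="CDS"){print $9}' |         # keep CDS features, print the attributes column
  grep -o 'GeneID:[0-9]*' |                                      # pull the NCBI GeneID out of Dbxref
  sort -u > ~/GRCm38.p6_protein_coding_genes.txt                 # one unique GeneID per line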

Then you can look up those GeneIDs among the GFF3 lines whose column 3 is "gene" to get the full extent and strand of each gene. It is unclear to me whether you want the range of each gene or of each transcript variant (since one of your columns is the NM accession). Depending on exactly what you want, it is fairly easy to write a Unix command that parses the GFF3 file and returns a BED-style file.
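Here is a minimal sketch of that last step with awk, not a definitive implementation. It assumes the attributes look the way I expect in NCBI's RefSeq GFF3 (`Dbxref=GeneID:NNN` on both `gene` and `mRNA` lines, the gene symbol in `Name=` on `gene` lines and in `gene=` on `mRNA` lines, the accession in `transcript_id=`), so check them against your file; the function name `gff3_to_bed` and its `gene`/`mrna` mode argument are hypothetical. It reads the GeneID list produced by the command above plus the uncompressed GFF3 on stdin, and emits either one line per protein-coding gene or one line per NM_ transcript, shifting GFF3's 1-based start to BED's 0-based start:

```shell
#!/usr/bin/env bash
# Hypothetical helper: gff3_to_bed GENE_ID_LIST MODE < uncompressed.gff3
#   GENE_ID_LIST: one "GeneID:NNN" per line (output of the command above)
#   MODE=gene -> one line per protein-coding gene ("." in the NM column)
#   MODE=mrna -> one line per NM_ transcript of those genes
# Output layout: chr  start  end  NA  genename  NMname  strand
gff3_to_bed() {
  awk -v mode="$2" '
    BEGIN { FS = OFS = "\t" }
    # Value of attribute KEY in a GFF3 column-9 string, or "" if absent.
    function attr(s, key,    t, skip) {
      t = ";" s
      skip = length(key) + 2              # length of ";KEY="
      if (match(t, ";" key "=[^;]*")) return substr(t, RSTART + skip, RLENGTH - skip)
      return ""
    }
    # First file: remember the protein-coding GeneIDs.
    FILENAME == ARGV[1] { coding[$1] = 1; next }
    /^#/ { next }                          # skip GFF3 directives and comments
    {
      if (!match($9, /GeneID:[0-9]+/)) next
      gid = substr($9, RSTART, RLENGTH)
      if (!(gid in coding)) next
      # GFF3 starts are 1-based (closed); BED starts are 0-based (half-open).
      if (mode == "gene" && $3 == "gene") {
        print $1, $4 - 1, $5, "NA", attr($9, "Name"), ".", $7
      } else if (mode == "mrna" && $3 == "mRNA") {
        tx = attr($9, "transcript_id")
        if (tx ~ /^NM_/) print $1, $4 - 1, $5, "NA", attr($9, "gene"), tx, $7
      }
    }' "$1" -
}
```

For example: `gzip -dc interim_GRCm38.p6_top_level_2017-09-26.gff3.gz | gff3_to_bed ~/GRCm38.p6_protein_coding_genes.txt gene > genes.bed` (use `mrna` instead of `gene` for per-transcript lines). Column 1 of the NCBI GFF3 holds RefSeq sequence accessions rather than UCSC-style `chr` names, so you would still need to rename them to combine the result with mm10/UCSC files.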
