Question: Is there any way to download the knownCanonical set from the NCBI RefSeq track, as is possible for the UCSC track?
lakhujanivijay (4.7k, India) wrote, 5 weeks ago:

From the UCSC Genome Browser, is there any way to download the knownCanonical set from the NCBI RefSeq track, as is possible for the UCSC track (see screenshot below)?

[Screenshot: ucsc]

knownCanonical is not available in the table dropdown when NCBI RefSeq is selected.

[Screenshot: ncbi2]


knownCanonical is a UCSC term so it won't be available for RefSeq. Specifically, what is it that you are looking for from NCBI RefSeq? Are you only interested in the 'Known RefSeqs' (aka RefSeqs with the NM/NR prefix)?

Reply written 5 weeks ago by vkkodali (1.7k)
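
As a small illustration of the 'Known RefSeqs' distinction raised in the reply above, the sketch below keeps only accessions with the NM_ or NR_ prefix and drops model accessions such as XM_ and XR_. The function name and the accession list are hypothetical and serve only to show the prefix check; they are not taken from the thread.

```python
# Prefixes of curated ("known") RefSeq transcripts: NM_ = mRNA, NR_ = non-coding RNA.
KNOWN_PREFIXES = ("NM_", "NR_")


def is_known_refseq(accession: str) -> bool:
    """Return True if the accession carries an NM_ or NR_ prefix."""
    return accession.startswith(KNOWN_PREFIXES)


# Illustrative accessions only; the XM_/XR_ entries stand in for model transcripts.
accessions = ["NM_000546.6", "XM_011523456.2", "NR_046018.2", "XR_001737835.1"]
known = [acc for acc in accessions if is_known_refseq(acc)]
print(known)
```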

I am trying to run DepthOfCoverage from GATK3 (an old version that is no longer supported), which requires a RefSeq file. However, that file contains all transcripts and not just the canonical transcripts. I was wondering how I can generate a file restricted to the canonical transcripts.

Reply written 5 weeks ago by lakhujanivijay (4.7k)

In that case, RefSeq Select is your best option. Note, RefSeq Select is only available for protein-coding loci; so of the ~54k unique GeneIDs annotated currently, 19k are protein-coding and have a RefSeq Select. Are you interested in getting these data in GFF3 format? If so, you can either filter the latest RefSeq GFF3 or download GFF3 for just the RefSeq Select transcripts from the NCBI Nucleotide portal. Go to NCBI Nucleotide and search for the term RefSeq_Select[Filter]; then use the 'Send To' link at the top right corner to download 'File' in 'GFF3' format. The latter approach returns a GFF3 file that does not include all of the information normally included in the GFF3 files on FTP but that may be sufficient for your needs.

Reply written 5 weeks ago by vkkodali (1.7k)
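
The first option in the reply above, filtering the latest RefSeq GFF3, could be sketched roughly as follows. This is a minimal sketch, not NCBI's own procedure. It assumes that RefSeq Select transcripts are marked in the attribute column with a `tag` attribute whose value includes `RefSeq Select`, and that child features (exons, CDS) point to their transcript through `Parent`; check both assumptions against the actual file before relying on it. The function names are hypothetical.

```python
from typing import Dict, Iterable, List


def parse_attributes(column9: str) -> Dict[str, str]:
    """Split a GFF3 attribute column ("key=value;key=value") into a dict."""
    attrs: Dict[str, str] = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def filter_refseq_select(lines: Iterable[str], tag: str = "RefSeq Select") -> List[str]:
    """Keep tagged transcript rows plus rows whose Parent is a tagged transcript.

    Assumption: the tag lives in a 'tag' attribute (possibly comma-separated).
    Comment/header lines and malformed rows are dropped.
    """
    parsed = []
    selected_ids = set()

    # First pass: parse rows and collect IDs of features carrying the tag.
    for line in lines:
        row = line.rstrip("\n")
        if not row.strip() or row.startswith("#"):
            continue
        cols = row.split("\t")
        if len(cols) != 9:
            continue
        attrs = parse_attributes(cols[8])
        parsed.append((row, attrs))
        if tag in attrs.get("tag", "").split(",") and "ID" in attrs:
            selected_ids.add(attrs["ID"])

    # Second pass: keep the tagged features and their direct children.
    kept = []
    for row, attrs in parsed:
        parents = attrs.get("Parent", "").split(",")
        if attrs.get("ID") in selected_ids or any(p in selected_ids for p in parents):
            kept.append(row)
    return kept
```

Usage would look something like `filter_refseq_select(open("annotation.gff"))`, where the file name is a placeholder. Gene-level rows are not kept by this sketch, and converting the result into the format a downstream tool expects is a separate step.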

You can probably use MANE instead. There is also a RefSeq RNA FASTA file for GRCh38 available from UCSC.

Reply written 5 weeks ago (modified 5 weeks ago) by genomax (78k)

MANE is still in progress, and several genes are not yet part of it. For example, only protein-coding genes are currently in the scope of MANE, and even then, not all protein-coding genes are in MANE yet. MANE also picks one representative transcript for every gene, so alternate splice variants that use a different promoter are not included in the MANE set. If splice variants are important for your downstream analyses, MANE may not be the best choice. However, if you are interested in just one representative transcript for each gene, RefSeq Select may be a better choice for you. Only protein-coding genes are in scope for RefSeq Select as well, but at least all of those genes have a RefSeq Select, and MANE is a subset of RefSeq Select.

Reply written 5 weeks ago by vkkodali (1.7k)
jnavarr5 (10) wrote, 27 days ago:

Hello,

We are happy to let you know that the RefSeq Select dataset is now available on the development server, https://genome-preview.soe.ucsc.edu/cgi-bin/hgTrackUi?db=hg38&c=chrX&g=refSeqComposite, for both hg38 and hg19.

Please note the message about how data and tools on our genome-test server are under development, have not been reviewed for quality, and are subject to change at any time. Unfortunately, it is not clear when we will do a quality check and release the RefSeq Select track to the public site. If you would like email updates about the UCSC Genome Browser, please subscribe to our Announcements List:

  • Subscribe: Email genome-announce+subscribe@soe.ucsc.edu
  • Unsubscribe: Email genome-announce+unsubscribe@soe.ucsc.edu