UCSC multiz100way
6.1 years ago
frankjieli • 0

I have some questions about the file refGene.exonNuc.fa in the alignment directory, which contains FASTA alignments for the UCSC Known Gene CDS regions of the human genome (hg38/GRCh38, Dec. 2013) aligned to the assemblies.

My first question: are these FASTA alignments stitched together from the MAF blocks in the MAF files?

My second question is about the meaning of some fields in this file. For example, here is part of the file:

NM_000299_hg38_1_14 202 0 1 chr1:201283703-201283904+

ATGAACCACTCGCCGCTCAAGACCGCCTTGGCGTACGAATGCTTCCAGGACCAGGACAACTCCACGTTGGCTTTGCCGTCGGACCAAAAGATGAAAACAGGCACGTCTGGCAGGCAGCGCGTGCAGGAGCAGGTGATGATGACCGTCAAGCGGCAGAAGTCCAAGTCTTCCCAGTCGTCCACCCTGAGCCACTCCAATCGAG

From the header line, I understand that this is the first exon of NM_000299 (14 exon alignments in total), that the sequence is 202 bp long, and that it lies on the plus strand at chr1:201283703-201283904.
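The reading of the header described above can be written down as a small parser. This is only a sketch of that interpretation, not a documented format: the field names (`transcript`, `assembly`, `exon_index`, `exon_count`, `field_a`, `field_b`) are hypothetical labels, the two fields whose meaning is unknown are deliberately left unnamed, and the optional leading `>` is an assumption about how the line appears in the actual file.

```python
import re
from typing import NamedTuple


class ExonHeader(NamedTuple):
    transcript: str   # e.g. NM_000299
    assembly: str     # e.g. hg38
    exon_index: int   # 1-based exon number (assumed)
    exon_count: int   # total number of exons (assumed)
    length: int       # sequence length in bp
    field_a: int      # first field of unknown meaning
    field_b: int      # second field of unknown meaning
    chrom: str
    start: int
    end: int
    strand: str


# Pattern for lines like:
# NM_000299_hg38_1_14 202 0 1 chr1:201283703-201283904+
HEADER_RE = re.compile(
    r"^>?(?P<name>\S+)_(?P<db>[^_\s]+)_(?P<idx>\d+)_(?P<cnt>\d+)\s+"
    r"(?P<len>\d+)\s+(?P<a>\d+)\s+(?P<b>\d+)\s+"
    r"(?P<chrom>[^:\s]+):(?P<start>\d+)-(?P<end>\d+)(?P<strand>[+-])$"
)


def parse_header(line: str) -> ExonHeader:
    """Split one header line into the fields assumed above."""
    m = HEADER_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised header line: {line!r}")
    return ExonHeader(
        m["name"], m["db"], int(m["idx"]), int(m["cnt"]),
        int(m["len"]), int(m["a"]), int(m["b"]),
        m["chrom"], int(m["start"]), int(m["end"]), m["strand"],
    )


h = parse_header("NM_000299_hg38_1_14 202 0 1 chr1:201283703-201283904+")
# The coordinates are consistent with a 1-based, inclusive range:
# 201283904 - 201283703 + 1 == 202
print(h.transcript, h.exon_index, h.exon_count, h.end - h.start + 1)
```

For the example line, the length field agrees with the coordinate span, which supports reading the range as 1-based and inclusive.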

However, what is the meaning of the two fields 0 1 that follow the length? In other header lines, these two fields may be 1 0, 0 2, 2 0, and so on. What do these fields mean?
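One guess that is easy to test is that the two numbers are reading-frame positions at the start and end of the exon. This is purely an assumption, not something the post or the file confirms. Under that guess, the second number should equal the first number plus the exon length, modulo 3. The function name `frames_consistent` below is hypothetical.

```python
def frames_consistent(length: int, first: int, second: int) -> bool:
    """Check the guess that `second` is the frame reached after reading
    `length` bases starting in frame `first` (hypothesis only)."""
    return (first + length) % 3 == second


# Example header from the post: length 202, fields 0 and 1.
print(frames_consistent(202, 0, 1))
```

Running the same check over every header in the file would show quickly whether the guess holds. A single mismatch would be enough to rule it out.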

My third question is about the relationship between refGene.exonNuc.fa and refGene.exonAA.fa. I'm not sure whether the peptide sequences in refGene.exonAA.fa are translated from the nucleotide sequences in refGene.exonNuc.fa.
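One way to check this empirically is to translate an exon from the nucleotide file and compare the result with the matching record in the peptide file. The sketch below uses the standard genetic code. The `offset` parameter (how many leading bases to skip before the first complete codon) is an assumption, and how codons split across exon boundaries are handled in refGene.exonAA.fa is not known here, so the first and last residues may legitimately differ.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(seq: str, offset: int = 0) -> str:
    """Translate complete codons of `seq`, starting `offset` bases in.

    Codons containing anything other than A, C, G, T (for example
    alignment gaps) become 'X'. A trailing partial codon is ignored.
    """
    seq = seq.upper()
    peptide = []
    for i in range(offset, len(seq) - 2, 3):
        peptide.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(peptide)


# First 12 bases of the example exon from the post.
print(translate("ATGAACCACTCG"))  # MNHS
```

If the translated peptide matches the refGene.exonAA.fa record apart from the boundary residues, that would suggest the peptide file is derived from the nucleotide alignment.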

Any suggestions will be appreciated.

alignment ucsc • 1.4k views