Question

UCSC multiz100way

6.1 years ago

frankjieli • 0

I have some questions about the file refGene.exonNuc.fa in the alignment directory, which contains FASTA alignments for the UCSC Known Gene CDS regions of the human genome (hg38/GRCh38, Dec. 2013) aligned to the other assemblies.

First, are these FASTA alignments stitched together from the MAF blocks in the MAF files?

Second, I am not sure what some of the fields in this file mean. For example, here is part of the file:

NM_000299_hg38_1_14 202 0 1 chr1:201283703-201283904+

ATGAACCACTCGCCGCTCAAGACCGCCTTGGCGTACGAATGCTTCCAGGACCAGGACAACTCCACGTTGGCTTTGCCGTCGGACCAAAAGATGAAAACAGGCACGTCTGGCAGGCAGCGCGTGCAGGAGCAGGTGATGATGACCGTCAAGCGGCAGAAGTCCAAGTCTTCCCAGTCGTCCACCCTGAGCCACTCCAATCGAG

From the title line, I gather that this is the first exon of NM_000299 (14 exon alignments in total), that the sequence is 202 bp long, and that it lies on the plus strand at chr1:201283703-201283904.
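To make this reading of the title line concrete, here is a minimal Python sketch that splits it into fields. The names `ExonHeader`, `parse_header`, `field_a` and `field_b` are hypothetical, chosen only for illustration; the two integers after the length are kept as uninterpreted values, and the layout is assumed from the single example line above rather than from any format specification.

```python
import re
from typing import NamedTuple


class ExonHeader(NamedTuple):
    transcript: str   # e.g. NM_000299
    assembly: str     # e.g. hg38
    exon_index: int   # 1-based exon number
    exon_count: int   # total exons for this transcript
    length: int       # sequence length in bp
    field_a: int      # first integer after the length (meaning not assumed)
    field_b: int      # second integer after the length (meaning not assumed)
    chrom: str
    start: int
    end: int
    strand: str


def parse_header(line: str) -> ExonHeader:
    """Split one title line into its whitespace-separated fields."""
    # Tolerate an optional leading '>' as in regular FASTA headers.
    name, length, a, b, pos = line.strip().lstrip(">").split()
    # The accession itself contains '_', so split from the right.
    transcript, assembly, idx, count = name.rsplit("_", 3)
    m = re.fullmatch(r"(.+):(\d+)-(\d+)([+-])", pos)
    if m is None:
        raise ValueError(f"unexpected position field: {pos!r}")
    chrom, start, end, strand = m.groups()
    return ExonHeader(transcript, assembly, int(idx), int(count),
                      int(length), int(a), int(b),
                      chrom, int(start), int(end), strand)


header = parse_header("NM_000299_hg38_1_14 202 0 1 chr1:201283703-201283904+")
print(header)
```

For the example line, `header.end - header.start + 1` equals `header.length`, which is consistent with reading the coordinates as a 1-based inclusive range.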

But what is the meaning of the two fields after the length, 0 1? In other lines these two fields may be 1 0, 0 2, 2 0 and so on. What do they mean?

My other question is about the relationship between refGene.exonNuc.fa and refGene.exonAA.fa. I'm not sure whether the peptide sequences in refGene.exonAA.fa are derived by translating the nucleotide sequences in refGene.exonNuc.fa.
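One way to probe this empirically would be to translate a nucleotide record with the standard genetic code and compare the result by eye with the matching record in refGene.exonAA.fa. The sketch below is only an illustration: `translate` and its `offset` parameter are hypothetical names, the correct reading-frame offset for a given exon is an assumption the user has to supply, and codons containing gaps or ambiguous bases are simply reported as X.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(nuc: str, offset: int = 0) -> str:
    """Translate complete codons starting at `offset`; trailing bases are ignored."""
    seq = nuc.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")  # gaps or N give 'X'
        for i in range(offset, len(seq) - 2, 3)
    )


# First 21 bases of the example exon shown above.
print(translate("ATGAACCACTCGCCGCTCAAG"))  # MNHSPLK
```

If the translated string matches the exonAA record for the same exon (allowing for codons split across exon boundaries), that would suggest the peptide file is a translation of the nucleotide file; a mismatch would not by itself rule it out, since the frame offset may differ.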

Any suggestions will be appreciated.

alignment ucsc • 1.4k views
