Non-ATGC bases in RefSeq zebrafish mRNA sequences
8.4 years ago
xiangwulu ▴ 120

I found some non-ATCG bases in RefSeq zebrafish mRNA sequences: 'Y', 'K', 'R', 'M', 'W'.

I couldn't find out what they mean.

For example, NM_001012366.1

http://www.ncbi.nlm.nih.gov/nuccore/59933235?report=fasta

contains an 'M' at this line:
CCCCCACAGTCCCTGCATTACGGGAATGTGCAGGCAAGAGGAAGCGGTCTCAGGGAGAGGAGGMCGAAGG

This is causing errors in some alignment tools, such as BFAST.
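A small scan can show which records are affected before you run an aligner. Below is a minimal Python sketch. It assumes the sequences are in a plain FASTA file. `read_fasta` and `non_acgt_counts` are hypothetical helper names, not part of any library. Here it runs on the single line quoted above, under a made-up header.

```python
from collections import Counter

# Characters treated as plain bases; anything else is reported.
# N is allowed because many aligners already accept it.
ALLOWED = set("ACGTN")

def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def non_acgt_counts(sequence):
    """Count characters outside A/C/G/T/N, ignoring case."""
    return Counter(c for c in sequence.upper() if c not in ALLOWED)

if __name__ == "__main__":
    # Hypothetical header; the sequence line is the one quoted in the question.
    example = [
        ">NM_001012366.1 excerpt",
        "CCCCCACAGTCCCTGCATTACGGGAATGTGCAGGCAAGAGGAAGCGGTCTCAGGGAGAGGAGGMCGAAGG",
    ]
    for header, seq in read_fasta(example):
        counts = non_acgt_counts(seq)
        if counts:
            print(header, dict(counts))
```

On the quoted line this prints `NM_001012366.1 excerpt {'M': 1}`. Passing an open file handle instead of `example` would scan a whole download.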

Thanks

8.4 years ago
Lemire ▴ 940

These could be IUPAC codes for ambiguous nucleotides: http://droog.gs.washington.edu/parc/images/iupac.html
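For reference, the standard IUPAC ambiguity codes fit in a small lookup table. This Python sketch encodes that table and decodes the five letters from the question. `IUPAC` and `expand` are illustrative names only. For example, the M in the quoted line means "A or C" at that position.

```python
# IUPAC nucleotide ambiguity codes: each letter maps to the bases it may stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # puRine
    "Y": "CT",    # pYrimidine
    "S": "CG",    # Strong
    "W": "AT",    # Weak
    "K": "GT",    # Keto
    "M": "AC",    # aMino
    "B": "CGT",   # not A
    "D": "AGT",   # not C
    "H": "ACT",   # not G
    "V": "ACG",   # not T
    "N": "ACGT",  # any base
}

def expand(code):
    """Return the set of bases an IUPAC code can represent (case-insensitive)."""
    return set(IUPAC[code.upper()])

if __name__ == "__main__":
    for code in "YKRMW":
        print(code, "=", "/".join(sorted(expand(code))))
```

This prints `Y = C/T`, `K = G/T`, `R = A/G`, `M = A/C` and `W = A/T`, one per line.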

8.4 years ago
Juke34 8.5k
Yes, these are IUPAC codes. If they crash the tool you use, just mask them (replace them with N).
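Below is a minimal Python sketch of that masking step. It assumes a FASTA file on standard input. Header lines (starting with `>`) pass through unchanged. Every IUPAC ambiguity code in a sequence line becomes N, and lowercase codes become n so that any soft-masking survives. The script and file names in the usage comment are hypothetical.

```python
import re
import sys

# IUPAC ambiguity codes other than N, in either case.
AMBIGUOUS = re.compile(r"[RYSWKMBDHVryswkmbdhv]")

def mask_line(line):
    """Replace ambiguity codes with N/n in sequence lines; leave headers untouched."""
    if line.startswith(">"):
        return line
    return AMBIGUOUS.sub(lambda m: "n" if m.group().islower() else "N", line)

def mask_fasta(in_handle, out_handle):
    """Copy a FASTA stream line by line, masking ambiguity codes."""
    for line in in_handle:
        out_handle.write(mask_line(line))

if __name__ == "__main__":
    # Usage: python mask_iupac.py < refseq.fa > refseq.masked.fa
    mask_fasta(sys.stdin, sys.stdout)
```

Masking throws away the partial information a code carries (an M still rules out G and T), but that loss is usually acceptable when the goal is just to get the aligner to run.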