Does anybody know what the 4 non-ACGUT characters in the GENCODE 38 transcript file are, if they exist?
3.0 years ago
nathan bowen ▴ 20

I got this warning from kallisto when indexing gencode.v38.transcripts.fa:

warning: replaced 4 non-ACGUT characters in the input sequence with pseudorandom nucleotides

Thoughts?

kallisto gencode.v38.transcripts.fa • 1.2k views
3.0 years ago
dsull ★ 5.8k

That file contains bases labeled N, so kallisto replaces them with pseudorandom valid nucleotides (i.e. A, T, C, or G).

Given that N bases occur only 4 times in that ginormous file, it really won't affect anything.
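
If you want to see for yourself which characters triggered the warning and where they sit, a small scan of the FASTA file will do it. This is only a rough sketch and not how kallisto does its own check: the function names `find_non_acgut` and `summarize` are made up for illustration, lower-case a/c/g/u/t are assumed to count as valid, and the record ID is assumed to be everything before the first `|` in the header.

```python
import gzip
from collections import Counter

# Assumption: lower-case bases are treated as valid, same as upper-case.
VALID = set("ACGUTacgut")


def open_text(path):
    """Open a plain or gzip-compressed FASTA file in text mode."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path, "rt")


def find_non_acgut(path):
    """Return a list of (record_id, 0-based position, character) tuples,
    one for every sequence character that is not A, C, G, U or T."""
    hits = []
    record_id = None
    offset = 0  # number of bases already seen in the current record
    with open_text(path) as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if line.startswith(">"):
                # Assumption: the ID is the header text before the first "|".
                record_id = line[1:].split("|")[0].strip()
                offset = 0
                continue
            for i, ch in enumerate(line):
                if ch not in VALID:
                    hits.append((record_id, offset + i, ch))
            offset += len(line)
    return hits


def summarize(hits):
    """Count how often each unexpected character occurs."""
    return Counter(ch for _, _, ch in hits)
```

Running `find_non_acgut("gencode.v38.transcripts.fa")` and passing the result to `summarize` should show which characters are involved and which transcripts carry them.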

right, I was just curious as to how 4 Ns snuck in.....thanks.

A couple of points:

  1. Please do not add answers unless you're answering the top level question. Use Add Comment or Add Reply as appropriate. I've moved your post to the appropriate location this time.
  2. If the answer resolved your question, please mark it as accepted.
