Affymetrix HuGene-2_0-st NetAffx transcript annot - [WARNING: THIS FIELD TRUNCATED]
8.0 years ago
aheinzel ▴ 130

Hi everybody,

While parsing Affymetrix's NetAffx transcript annotation file (HuGene-2_0-st-v1.na35.hg19.transcript.csv) for HuGene-2-0-st chips, I discovered that some of the gene_assignment and mrna_assignment fields in the file are incomplete and contain the string [WARNING: THIS FIELD TRUNCATED]. It appears that all affected fields (I only checked the gene_assignment and mrna_assignment columns) are 32532 characters long, including the warning message. Has anyone else run into this issue, and does anyone have an official or unofficial explanation for these entries?
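For anyone who wants to reproduce the check, here is a minimal sketch of how the affected fields can be located. It assumes that the file starts with comment lines beginning with `#` and that the identifier column is named `transcript_cluster_id`; both are assumptions about the file layout, and `find_truncated_fields` is just an illustrative helper name.

```python
import csv

TRUNCATION_MARKER = "[WARNING: THIS FIELD TRUNCATED]"


def find_truncated_fields(lines, columns=("gene_assignment", "mrna_assignment")):
    """Yield (row_id, column, field_length) for every field containing the marker.

    `lines` is any iterable of text lines, e.g. an open file handle.
    """
    # Assumption: leading metadata lines start with '#' and are not CSV data.
    data_lines = (line for line in lines if not line.startswith("#"))
    reader = csv.DictReader(data_lines, delimiter=",", quotechar='"')
    for row in reader:
        for column in columns:
            value = row.get(column) or ""
            if TRUNCATION_MARKER in value:
                # Assumption: 'transcript_cluster_id' is the identifier column.
                yield row.get("transcript_cluster_id"), column, len(value)


# Usage (file name taken from the post):
# with open("HuGene-2_0-st-v1.na35.hg19.transcript.csv", newline="") as handle:
#     for row_id, column, length in find_truncated_fields(handle):
#         print(row_id, column, length)
```

If the observation above holds, every reported length should be the same value.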

Some background (can be safely skipped - TLDR): Like other NetAffx annotation files, the transcript annotation file for HuGene-2_0-st is CSV formatted, using commas as separators and double quotes as quotation characters. Each line holds 18 columns, among them the aforementioned gene_assignment and mrna_assignment columns. Both columns hold structured annotation (Affymetrix refers to it as multipart) and can hold multiple annotation entries. Entries are separated by ⍽///⍽, and the parts within an entry are separated by ⍽//⍽ (where ⍽ denotes a space).

The documentation describes the two columns as follows:

gene_assignment: "Gene information for each assigned mRNA for mRNAs that corresponds to known genes."

mrna_assignment: "Description of the public mRNAs that should be detected by the sets within this transcript cluster based on sequence alignment."

Consequently, each annotation entry in the gene_assignment column should have a corresponding entry in the mrna_assignment column. Because some mrna_assignment values have also been truncated, this does not hold true. Maybe I am being overly pedantic, but it is not the violation of this basic assumption that bugs me. What bugs me is the remote chance that a gene assignment could be missing, and that for certain transcripts listed in the gene_assignment column detailed information such as assignment score and coverage is not available.
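To make the concern concrete, here is a minimal sketch of splitting the multipart fields and listing gene_assignment entries that have no counterpart in mrna_assignment. It assumes that the first part of each entry is the mRNA accession, that the warning string can simply be stripped out, and that `---` marks an empty field; these are assumptions about the file, and the function names are illustrative only.

```python
TRUNCATION_MARKER = "[WARNING: THIS FIELD TRUNCATED]"


def parse_multipart(value):
    """Split a multipart field into entries and parts.

    Returns (entries, truncated), where entries is a list of lists of parts.
    The last entry of a truncated field may itself be incomplete.
    """
    truncated = TRUNCATION_MARKER in value
    cleaned = value.replace(TRUNCATION_MARKER, "").strip()
    # Assumption: '---' is the placeholder for an empty field.
    if cleaned in ("", "---"):
        return [], truncated
    entries = [
        [part.strip() for part in entry.split("//")]
        for entry in cleaned.split("///")
    ]
    return entries, truncated


def missing_mrna_details(gene_value, mrna_value):
    """Return accessions listed in gene_assignment but absent from mrna_assignment."""
    gene_entries, _ = parse_multipart(gene_value)
    mrna_entries, _ = parse_multipart(mrna_value)
    # Assumption: the first part of every entry is the mRNA accession.
    gene_accessions = {entry[0] for entry in gene_entries}
    mrna_accessions = {entry[0] for entry in mrna_entries}
    return sorted(gene_accessions - mrna_accessions)
```

A non-empty result for a row means that score and coverage details cannot be looked up for those transcripts. The reverse case, a gene assignment lost to truncation, cannot be detected from the file alone.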

NetAffx Annotation HuGene-2_0-st • 3.2k views