Question: "TRUE" instead of "T" allele in R
manay wrote, 13 months ago:

Hi,

I have a data set that contains the alleles A, T, C and G. When I read it with read.table(), the T alleles show up as TRUE. This happens when an entire column contains only T. How can I fix this?

Thanks.

Charles Plessy (Japan) wrote, 13 months ago:

read.table() tries to guess the class of each input column and is sometimes misled: a column that contains only T is read as logical.

> read.table(text=("A C G T"))
  V1 V2 V3   V4
1  A  C  G TRUE

> summary(read.table(text=("A C G T")))
 V1    V2    V3       V4         
 A:1   C:1   G:1   Mode:logical  
                   TRUE:1        
                   NA's:0

It is possible to specify in advance the class of the columns; see ?read.table for details.

> read.table(text=("A C G T"), colClasses = "character")
  V1 V2 V3 V4
1  A  C  G  T
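
As a sketch of how this might look on a table with a header, colClasses can also be given as a named vector, so that only the allele column is forced to character while the other columns are still guessed. The column names and data below are made up for illustration.

```r
# Hypothetical genotype table; the 'allele' column contains only T.
txt <- "sample pos allele\ns1 101 T\ns2 102 T\n"

# Force only the 'allele' column to character; the other columns are still guessed.
geno <- read.table(text = txt, header = TRUE,
                   colClasses = c(allele = "character"))

# Inspect the resulting column classes.
str(geno)
```

Naming the columns keeps the automatic guessing for numeric columns such as positions, which colClasses = "character" on its own would turn into strings.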

Regarding Chris's answer: in this particular case, stringsAsFactors = FALSE will not solve the problem, because the column is read as logical rather than as a string.

> read.table(text=("A C G T"), stringsAsFactors = FALSE)
  V1 V2 V3   V4
1  A  C  G TRUE

Note that there are other situations where T may be coerced to TRUE instead of being kept as "T". In particular, be aware that there is a gene whose symbol is _T_!
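
One such situation, offered as an illustration rather than as the case the author had in mind: in R, T and F are ordinary variables preset to TRUE and FALSE, and type.convert(), the helper that read.table() relies on when guessing classes, turns a character vector made up only of "T" into a logical vector. A minimal sketch:

```r
# T is a variable whose default value is TRUE, so an unquoted T is logical.
t_is_true <- identical(T, TRUE)

# A character vector containing only "T" is guessed to be logical.
guessed <- type.convert(c("T", "T", "T"), as.is = TRUE)
class(guessed)

# As soon as another allele is present, the values stay character.
mixed <- type.convert(c("T", "G"), as.is = TRUE)
class(mixed)
```

Quoting the symbol ("T") and declaring column classes up front avoids both forms of the coercion.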


I would argue this is as bad as Excel converting gene names to dates.
I assume there is a better alternative to read.table() without these quirks?

written 13 months ago by WouterDeCoster

Yes. Use readr and explicitly state the datatypes for the columns.
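
A minimal sketch of what that could look like, assuming the readr package is installed; the temporary file and its contents are made up for illustration. Without an explicit col_types, readr guesses column types too, so the point is to declare every column as character.

```r
# Write a tiny allele file to a temporary location (hypothetical data).
path <- tempfile(fileext = ".txt")
writeLines(c("A C T G", "A C T G"), path)

# Only run the readr part if the package is available.
if (requireNamespace("readr", quietly = TRUE)) {
  alleles <- readr::read_delim(
    path,
    delim = " ",
    col_names = FALSE,
    # Declare every column as character so nothing is guessed.
    col_types = readr::cols(.default = readr::col_character())
  )
  print(alleles$X3)
}
```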

written 13 months ago by russhh

Not that I know of; the alternatives guess the column type in the same way:

> system("printf 'A C T G\nA C T G\n' > test.txt")
> read.table("test.txt")
  V1 V2   V3 V4
1  A  C TRUE  G
2  A  C TRUE  G

> data.table::fread("test.txt", header = FALSE)
   V1 V2   V3 V4
1:  A  C TRUE  G
2:  A  C TRUE  G

> as.data.frame(readr::read_delim("test.txt", " ", col_names=FALSE))
Parsed with column specification:
cols(
  X1 = col_character(),
  X2 = col_character(),
  X3 = col_logical(),
  X4 = col_character()
)
  X1 X2   X3 X4
1  A  C TRUE  G
2  A  C TRUE  G

However, this only happens when every value in a column looks like a logical value.

> read.table(text="A C T G\nA C T G\n")
  V1 V2   V3 V4
1  A  C TRUE  G
2  A  C TRUE  G

> read.table(text="A C T G\nA C F G\n")
  V1 V2    V3 V4
1  A  C  TRUE  G
2  A  C FALSE  G

> read.table(text="A C T G\nA C G G\n")
  V1 V2 V3 V4
1  A  C  T  G
2  A  C  G  G

written 13 months ago by Charles Plessy

Thank you very much, all of you!

colClasses="character" solved that problem.

written 13 months ago by manay

You are welcome. Please click on the "Accept!" button so that my answer appears at the top of the list. This is important since the other answer does not solve the problem.

written 13 months ago by Charles Plessy

Chris Miller (Washington University in St. Louis, MO) wrote, 13 months ago:

I suspect that using "read.table(..., stringsAsFactors = FALSE)" will solve your problem.

(Edit: it will not! See the comprehensive answer above.)
