Question: R read.table() saying line does not have correct number of elements - when I check, it definitely does
SpacemanSpiffo wrote, 3 months ago:

I'm trying to read a CSV table into R, but I get the error "line 195 did not have 31 elements". However, when I check the line, both in Python and by copying it into R as a string, it (and the surrounding lines) definitely has 31 elements.

Can anyone hazard a guess as to why R is flagging an incorrect number of elements?

Below is the line that causes the crash. It is supposed to have 31 comma-delimited elements:

135411,single nucleotide variant,NM_001129727.2(PLEKHG4):c.1574A>G (p.Asp525Gly),25894,PLEKHG4,HGNC:24501,Likely benign,0,-,8044843,-,RCV000117986,MedGen:CN169374,not specified,germline,germline,GRCh37,NC_000016.9,16,67318242,67318242,A,G,16q22.1,no assertion criteria provided,1,,N,UniProtKB (protein):Q58EX7#VAR_050510,2,129965

The code I am using is:

t2 <- read.table("/Users/NAME/Desktop/variant_summary.csv", sep = ",")

Download link for the first 200 lines of the CSV file: https://drive.google.com/file/d/1_iUeGkyQwPvm3K2B5kfakhneuvpXpPYg/view?usp=sharing


Can you provide the table up to that line for download?

written 3 months ago by ATpoint

Yes, thanks for replying. I've amended my original post with a link to the download for the first 200 lines.

written 3 months ago by SpacemanSpiffo
h.mon (Brazil) wrote, 3 months ago:

Use read.csv():

t3 <- read.csv("first200.csv")
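For reference, read.csv() is a wrapper around read.table() that changes several defaults (header = TRUE, sep = ",", quote = "\"", fill = TRUE, comment.char = ""). A quick way to compare the two defaults most likely to matter for this file is formals(); the quote setting only becomes relevant if some field contains an apostrophe.

```r
# read.csv() forwards to read.table() but with different defaults;
# these two settings control how '#' and quote characters are treated
formals(read.table)$comment.char  # [1] "#"   text from '#' onward is ignored
formals(read.csv)$comment.char    # [1] ""    '#' is ordinary data
formals(read.table)$quote         # [1] "\"'" both " and ' open a quoted field
formals(read.csv)$quote           # [1] "\""  only " opens a quoted field
```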

Wow, that worked perfectly on the first try, thanks very much. Any idea why read.table() wasn't working? Normally specifying sep = "," works fine.

written 3 months ago by SpacemanSpiffo

It seems that the hash (#) in column 29 was causing read.table() to stop reading the line. Could that be the case?
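A minimal sketch for testing this hypothesis on the problem line from the question. It uses count.fields(), whose defaults (comment.char = "#", quote = "\"'") match read.table()'s; the temporary file is only a stand-in for the real CSV. Splitting the string by hand, as in the original check, never looks at '#', which would explain why that check reported 31 elements.

```r
# The problem line from the question, copied verbatim
x <- "135411,single nucleotide variant,NM_001129727.2(PLEKHG4):c.1574A>G (p.Asp525Gly),25894,PLEKHG4,HGNC:24501,Likely benign,0,-,8044843,-,RCV000117986,MedGen:CN169374,not specified,germline,germline,GRCh37,NC_000016.9,16,67318242,67318242,A,G,16q22.1,no assertion criteria provided,1,,N,UniProtKB (protein):Q58EX7#VAR_050510,2,129965"

# Plain string splitting treats '#' as an ordinary character: 31 fields
length(strsplit(x, ",", fixed = TRUE)[[1]])     # [1] 31

# Write the line to a temporary file and count fields the way read.table() does
tf <- tempfile(fileext = ".csv")
writeLines(x, tf)
count.fields(tf, sep = ",")                     # [1] 29  (rest of line after '#' dropped)
count.fields(tf, sep = ",", comment.char = "")  # [1] 31
```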

written 3 months ago by Martombo

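Yes, this is exactly the case: the default for read.table() is comment.char = "#", so everything from a # to the end of the line is discarded. read.csv() defaults to comment.char = "", which is why it reads the file correctly.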
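A sketch of the corresponding read.table() fix, on a small hypothetical excerpt laid out like the ClinVar file (four of the 31 columns; the first row's VariationID is made up). Note that the real header row also begins with #AlleleID, so under the default comment.char = "#" read.table() would treat the entire header line as a comment. Setting comment.char = "" (plus quote = "\"", as read.csv() does) keeps both the header and the # inside OtherIDs:

```r
# Hypothetical 4-column excerpt in the same layout as the ClinVar file:
# the header starts with '#' and the last row has a '#' inside OtherIDs
csv_lines <- c("#AlleleID,GeneSymbol,OtherIDs,VariationID",
               "15041,AP5Z1,-,2",
               "135411,PLEKHG4,UniProtKB (protein):Q58EX7#VAR_050510,129965")

# comment.char = "" makes '#' ordinary data; quote = "\"" matches read.csv()
t2 <- read.table(text = csv_lines, sep = ",", header = TRUE,
                 comment.char = "", quote = "\"")

dim(t2)                       # [1] 2 4
as.character(t2$OtherIDs[2])  # [1] "UniProtKB (protein):Q58EX7#VAR_050510"
names(t2)[1]                  # [1] "X.AlleleID" (check.names makes it syntactic)

# Same result as read.csv(), which uses these settings by default
identical(t2, read.csv(text = csv_lines))
```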

written 3 months ago by h.mon
ATpoint (Germany) wrote, 3 months ago:

Another option is fread() from the data.table package, which is especially helpful when the file is large (hundreds of MB or even GB):

library(data.table)
fread("your.file", sep = ",", data.table = FALSE)  # return a data.frame rather than a data.table
Chirag Parsania (University of Macau) wrote, 3 months ago:

The tidyverse way, using readr's read_delim():

library(tidyverse)
dd <- read_delim("~/Downloads/first200.csv", delim = ",")

> dd
# A tibble: 199 x 31
   `#AlleleID` Type  Name  GeneID GeneSymbol HGNC_ID ClinicalSignifi~ ClinSigSimple LastEvaluated `RS# (dbSNP)` `nsv/esv (dbVar~ RCVaccession PhenotypeIDS
         <int> <chr> <chr>  <int> <chr>      <chr>   <chr>                    <int> <chr>                 <int> <chr>            <chr>        <chr>       
 1       15041 indel NM_0~   9907 AP5Z1      HGNC:2~ Pathogenic                   1 29-Jun-10         397704705 -                RCV000000012 MedGen:C315~
 2       15041 indel NM_0~   9907 AP5Z1      HGNC:2~ Pathogenic                   1 29-Jun-10         397704705 -                RCV000000012 MedGen:C315~
 3       15042 dele~ NM_0~   9907 AP5Z1      HGNC:2~ Pathogenic                   1 29-Jun-10         397704709 -                RCV000000013 MedGen:C315~
 4       15042 dele~ NM_0~   9907 AP5Z1      HGNC:2~ Pathogenic                   1 29-Jun-10         397704709 -                RCV000000013 MedGen:C315~
 5       15043 sing~ NM_0~   9640 ZNF592     HGNC:2~ Uncertain signi~             0 29-Jun-15         150829393 -                RCV000000014 MedGen:CN03~
 6       15043 sing~ NM_0~   9640 ZNF592     HGNC:2~ Uncertain signi~             0 29-Jun-15         150829393 -                RCV000000014 MedGen:CN03~
 7       15044 sing~ NM_0~  55572 FOXRED1    HGNC:2~ Pathogenic                   1 7-Dec-17          267606829 -                RCV00000001~ MedGen:C183~
 8       15044 sing~ NM_0~  55572 FOXRED1    HGNC:2~ Pathogenic                   1 7-Dec-17          267606829 -                RCV00000001~ MedGen:C183~
 9       15045 sing~ NM_0~  55572 FOXRED1    HGNC:2~ Pathogenic                   1 1-Oct-10          267606830 -                RCV000000016 MedGen:C183~
10       15045 sing~ NM_0~  55572 FOXRED1    HGNC:2~ Pathogenic                   1 1-Oct-10          267606830 -                RCV000000016 MedGen:C183~
# ... with 189 more rows, and 18 more variables: PhenotypeList <chr>, Origin <chr>, OriginSimple <chr>, Assembly <chr>, ChromosomeAccession <chr>,
#   Chromosome <chr>, Start <int>, Stop <int>, ReferenceAllele <chr>, AlternateAllele <chr>, Cytogenetic <chr>, ReviewStatus <chr>, NumberSubmitters <int>,
#   Guidelines <chr>, TestedInGTR <chr>, OtherIDs <chr>, SubmitterCategories <int>, VariationID <int>