Question: R read.table() saying line does not have correct number of elements - when I check, it definitely does
29 days ago, SpacemanSpiffo wrote:

I'm trying to read a CSV table into R, but I get the error "line 195 did not have 31 elements". However, when I check, both in Python and by pasting the line into R as a string, that line (and the surrounding lines) definitely have 31 elements.

Is anyone able to hazard a guess as to why R is flagging an incorrect number of elements?

Below is the line that is causing it to crash. It is supposed to have 31 comma-delimited elements:

135411,single nucleotide variant,NM_001129727.2(PLEKHG4):c.1574A>G (p.Asp525Gly),25894,PLEKHG4,HGNC:24501,Likely benign,0,-,8044843,-,RCV000117986,MedGen:CN169374,not specified,germline,germline,GRCh37,NC_000016.9,16,67318242,67318242,A,G,16q22.1,no assertion criteria provided,1,,N,UniProtKB (protein):Q58EX7#VAR_050510,2,129965

The code I am using is:

t2 <- read.table("/Users/NAME/Desktop/variant_summary.csv", sep = ",")

Download link for first 200 lines of the csv file: https://drive.google.com/file/d/1_iUeGkyQwPvm3K2B5kfakhneuvpXpPYg/view?usp=sharing


Can you provide the table up to that line for download?

written 29 days ago by ATpoint

Yes, thanks for replying. I've amended my original post with a link to the download for the first 200 lines.

written 29 days ago by SpacemanSpiffo
29 days ago, h.mon wrote:

Use read.csv():

t3 <- read.csv( "first200.csv" )

Wow, that worked perfectly on the first try, thanks very much. Any idea why read.table() wasn't working? Normally specifying sep = "," works fine.

written 29 days ago by SpacemanSpiffo

It seems that the hash (#) in column 29 was causing read.table() to stop reading the line. Could that be the case?
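
One way to test this hypothesis is base R's count.fields(), which applies the same comment.char convention as read.table(). The sketch below is only illustrative: the four-line `lines` vector and the `count_with` helper are made up for this example and are not taken from the original file. It compares per-line field counts with and without comment handling.

```r
# Made-up miniature table; line 3 has a "#" inside its second field.
lines <- c(
  "id,label,value",
  "1,plain,10",
  "2,Q58EX7#VAR_050510,20",
  "3,other,30"
)

# Hypothetical helper: count fields per line for a given comment character.
count_with <- function(comment_char) {
  con <- textConnection(lines)
  on.exit(close(con))
  count.fields(con, sep = ",", quote = "\"", comment.char = comment_char)
}

n_default <- count_with("#")  # "#" starts a comment, as in read.table()'s default
n_literal <- count_with("")   # "#" is treated as ordinary text

# Lines whose field count changes are the ones truncated by the "#".
suspect_lines <- which(n_default != n_literal)
print(suspect_lines)
```

On a real file, the same comparison can be run by passing the file path to count.fields() instead of a text connection.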

written 29 days ago by Martombo

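Yes, this is exactly the case: the default for read.table() is comment.char = "#", so everything after the hash on that line is discarded. read.csv() defaults to comment.char = "", which is why it reads the file correctly.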
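
As a minimal sketch, assuming you would rather keep using read.table(), comment handling can be switched off explicitly. The toy `csv_text` below is invented for illustration; header = TRUE and quote = "\"" are added here only to mirror the read.csv() defaults and are not part of the original call.

```r
# Invented toy data: the second data row contains a "#" in its label field.
csv_text <- paste(
  "id,label,value",
  "1,plain,10",
  "2,Q58EX7#VAR_050510,20",
  "3,other,30",
  sep = "\n"
)

# Default comment.char = "#": the rest of that row is dropped, so the row
# comes up short and read.table() raises an error. Capture the message.
default_result <- tryCatch(
  read.table(text = csv_text, sep = ",", header = TRUE),
  error = function(e) conditionMessage(e)
)

# Same call with comment handling disabled: "#" is kept as literal text.
fixed <- read.table(
  text = csv_text, sep = ",", header = TRUE,
  comment.char = "", quote = "\""
)

print(default_result)
print(fixed)
```

For the file in the question, the equivalent would be to add comment.char = "" (and probably header = TRUE) to the original read.table() call.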

written 29 days ago by h.mon

29 days ago, ATpoint wrote:

Another option is fread() from the data.table package, which is especially helpful when the file is large (hundreds of MB or even GB):

library(data.table)
fread("your.file", sep = ",", data.table = FALSE)

29 days ago, Chirag Parsania wrote:

The tidyverse way, using read_delim() from readr:

library(tidyverse)
dd  <- read_delim("~/Downloads/first200.csv" , delim = ",")

> dd
# A tibble: 199 x 31
   `#AlleleID` Type  Name  GeneID GeneSymbol HGNC_ID ClinicalSignifi~ ClinSigSimple LastEvaluated `RS# (dbSNP)` `nsv/esv (dbVar~ RCVaccession PhenotypeIDS
         <int> <chr> <chr>  <int> <chr>      <chr>   <chr>                    <int> <chr>                 <int> <chr>            <chr>        <chr>       
 1       15041 indel NM_0~   9907 AP5Z1      HGNC:2~ Pathogenic                   1 29-Jun-10         397704705 -                RCV000000012 MedGen:C315~
 2       15041 indel NM_0~   9907 AP5Z1      HGNC:2~ Pathogenic                   1 29-Jun-10         397704705 -                RCV000000012 MedGen:C315~
 3       15042 dele~ NM_0~   9907 AP5Z1      HGNC:2~ Pathogenic                   1 29-Jun-10         397704709 -                RCV000000013 MedGen:C315~
 4       15042 dele~ NM_0~   9907 AP5Z1      HGNC:2~ Pathogenic                   1 29-Jun-10         397704709 -                RCV000000013 MedGen:C315~
 5       15043 sing~ NM_0~   9640 ZNF592     HGNC:2~ Uncertain signi~             0 29-Jun-15         150829393 -                RCV000000014 MedGen:CN03~
 6       15043 sing~ NM_0~   9640 ZNF592     HGNC:2~ Uncertain signi~             0 29-Jun-15         150829393 -                RCV000000014 MedGen:CN03~
 7       15044 sing~ NM_0~  55572 FOXRED1    HGNC:2~ Pathogenic                   1 7-Dec-17          267606829 -                RCV00000001~ MedGen:C183~
 8       15044 sing~ NM_0~  55572 FOXRED1    HGNC:2~ Pathogenic                   1 7-Dec-17          267606829 -                RCV00000001~ MedGen:C183~
 9       15045 sing~ NM_0~  55572 FOXRED1    HGNC:2~ Pathogenic                   1 1-Oct-10          267606830 -                RCV000000016 MedGen:C183~
10       15045 sing~ NM_0~  55572 FOXRED1    HGNC:2~ Pathogenic                   1 1-Oct-10          267606830 -                RCV000000016 MedGen:C183~
# ... with 189 more rows, and 18 more variables: PhenotypeList <chr>, Origin <chr>, OriginSimple <chr>, Assembly <chr>, ChromosomeAccession <chr>,
#   Chromosome <chr>, Start <int>, Stop <int>, ReferenceAllele <chr>, AlternateAllele <chr>, Cytogenetic <chr>, ReviewStatus <chr>, NumberSubmitters <int>,
#   Guidelines <chr>, TestedInGTR <chr>, OtherIDs <chr>, SubmitterCategories <int>, VariationID <int>