Question: Viewing MAF files over 2 GB as a data frame
noon wrote:

I used GATK's Funcotator to annotate a VCF file, and it produced a MAF file just over 2 GB in size. I've tried both pandas in Python and maftools in R, and neither has worked. Specifically, the file seems to be too large to open in R; it throws this error:

Error in data.table::fread(file = maf, sep = "\t", stringsAsFactors = FALSE,  : File '' does not exist or is non-readable.

Also, pandas isn't really made for MAF files. Usually, after running this annotation, opening the output in Excel was enough, but this file is far too big. Does anybody know an application or package (in R, Python, or anything else) that can open MAF files of this size? Any help is appreciated.

Tags: gatk, annotations, python, R, maf
modified 6 weeks ago • written 6 weeks ago by noon

That error is not caused by memory; it indicates that the file is not located where your maf variable says it is.

Why do you need to open it? Linux commands such as more, grep and awk can help you view the content.
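On the first point: the empty `''` inside the error message is the clue, since fread() was handed an empty string rather than a path. A minimal sketch of a guard you could run on the value before passing it to read.maf(); `check_maf_path()` is a hypothetical helper written for illustration, not part of maftools or data.table:

```r
# Hypothetical helper (not part of maftools): stop early with a clear message
# when the value passed as `maf` is empty, missing on disk, or unreadable.
check_maf_path <- function(maf) {
  if (!is.character(maf) || length(maf) != 1L || !nzchar(maf)) {
    stop("`maf` is an empty string, not a file path")
  }
  if (!file.exists(maf)) {
    stop("No file found at: ", maf)
  }
  # file.access() returns 0 on success and -1 on failure; mode 4 tests read access
  if (file.access(maf, mode = 4) != 0) {
    stop("File exists but is not readable: ", maf)
  }
  invisible(normalizePath(maf))
}

# Usage: call it on the same value you are about to give read.maf(), e.g.
# check_maf_path(funcotation)
```

With the empty string from this thread, the first check fires immediately instead of the less obvious fread() message.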

written 6 weeks ago by JC

I wrote a simple helper alias for files like these:

alias tsview='column -s$'\''\t'\'' -t | less -S'

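Spreadsheet-like view on the terminal! Feed the file in on standard input, e.g. tsview < funcotated.maf, because any file name typed after an alias is passed to less rather than to column.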

written 6 weeks ago by RamRS

Yes, this works, so I can view it! The problem is that I need to manipulate it as a data frame so I can apply thresholds to the data and select specific columns.
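For the thresholding and column-selection step, one option is to load the MAF as an ordinary R data frame. A minimal base-R sketch, assuming the file opens with `#`-prefixed header lines (MAF files conventionally start with `#version 2.4`) followed by a tab-separated column header row. `read_maf_table()` and `filter_maf()` are hypothetical names, and the `t_alt_count`/`t_ref_count` columns and threshold values are placeholders: replace them with whatever `names(df)` shows for your Funcotator output.

```r
# Hypothetical helper: read a MAF into a plain data frame.
read_maf_table <- function(path) {
  # Count the leading "#" lines (e.g. "#version 2.4") so they can be skipped
  con <- file(path, open = "r")
  on.exit(close(con))
  n_comment <- 0L
  repeat {
    line <- readLines(con, n = 1L)
    if (length(line) == 0L || !startsWith(line, "#")) break
    n_comment <- n_comment + 1L
  }
  # quote = "" keeps stray quote characters in annotation fields from
  # merging columns; check.names = FALSE keeps MAF column names unchanged
  read.delim(path, skip = n_comment, quote = "",
             stringsAsFactors = FALSE, check.names = FALSE)
}

# Hypothetical filter: keep rows passing read-count and allele-fraction
# thresholds, then return only the selected columns.
filter_maf <- function(df, min_alt = 5, min_vaf = 0.05,
                       keep = c("Hugo_Symbol", "Variant_Classification",
                                "t_alt_count", "t_ref_count")) {
  alt <- as.numeric(df$t_alt_count)
  ref <- as.numeric(df$t_ref_count)
  vaf <- alt / (alt + ref)   # NaN when both counts are 0, dropped below
  pass <- !is.na(vaf) & alt >= min_alt & vaf >= min_vaf
  df[pass, keep, drop = FALSE]
}

# Usage (path from this thread):
# df   <- read_maf_table("/Users/Downloads/funcotated.maf")
# hits <- filter_maf(df, min_alt = 10, min_vaf = 0.1)
```

Base read.delim() will be slow on a 2 GB file; data.table::fread(), which read.maf() already calls internally (see the error above), accepts the same `skip` count and is much faster. Once read.maf() works, maftools' own subsetMaf() is another route; check ?subsetMaf for its arguments.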

modified 6 weeks ago • written 6 weeks ago by noon

Your maf variable doesn't seem to contain the path to the MAF file. Can you show the output of dput(maf)?

written 6 weeks ago by RamRS

This is what I have:

    funcotation = system.file('extdata', '/Users/Downloads/funcotated.maf',
                              package = 'maftools')
    readfunc = read.maf(maf = funcotation)

When I run dput(maf), I get this:

    ""
modified 6 weeks ago • written 6 weeks ago by noon

RamRS (Baylor College of Medicine, Houston, TX) wrote:

That's not how system.file works: it only finds files that ship inside an installed package. Skip that first line altogether and pass the path directly:

    readfunc = read.maf(maf = '/Users/Downloads/funcotated.maf')

modified 6 weeks ago • written 6 weeks ago by RamRS

Thank you, that was the problem! Sorry, I'm new to maftools, so I was following the documentation tutorial.

Thank you for your time!

written 6 weeks ago by noon

That happens. I find it useful to take the following steps when adapting code from documentation:

  1. Run the code and ensure it works as given (it usually does)
  2. Understand each function call and parameter in the lines leading up to the line I wish to change. In your case, that would mean reading through ?system.file and looking at its parameters. That is where you'd learn that it locates files included within installed packages. You're using your own file, so system.file is not for you.
  3. OK, then, how do you supply your own file? The ?system.file documentation makes it clear that the function returns a string with the path to the file, or an empty string ("") when no matching file is found, which is exactly what dput(maf) showed. So if you already have the path string, you don't need system.file() at all. As it turns out, you do have the path: replace the call to system.file with it, and you're all set.
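The behaviour described in steps 2 and 3 can be checked directly in the console. A small sketch: the `stats` package is used only because it is always installed (the maftools tutorial does the same thing with its bundled `tcga_laml.maf.gz` in `extdata`), and `maf_path` is just an illustrative variable name.

```r
# system.file() builds a path *inside* an installed package and returns it
# only if that file actually exists there.
bundled <- system.file("DESCRIPTION", package = "stats")
print(bundled)   # a full path ending in "stats/DESCRIPTION"

# A file elsewhere on disk is never inside the package folder, so nothing
# matches and system.file() quietly returns "" instead of raising an error.
not_bundled <- system.file("extdata", "/Users/Downloads/funcotated.maf",
                           package = "maftools")
print(not_bundled)   # [1] ""

# For your own file, the path string is all read.maf() needs:
maf_path <- "/Users/Downloads/funcotated.maf"
# readfunc <- maftools::read.maf(maf = maf_path)
```

Because the failure is a silent "" rather than an error, it only surfaces later, inside fread(), which is why the original message looked unrelated.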

It also helps to run the code one line at a time and check each line's output. You'd have noticed that the first line returned an empty string and probably solved the problem yourself.

written 6 weeks ago by RamRS

BTW, please accept my answer using the green check mark on the left to mark the question solved.

modified 6 weeks ago • written 6 weeks ago by RamRS