Column lengths differ error (ArchR)
9 months ago

I keep receiving this error when running reformatFragmentFiles() in ArchR: "Detected 1 column names but the data has 5 columns. Added 4 extra default column names at the end."

How would you fix this? For reference, I am working with .tsv.gz files.

Here's the error I face:

#### EDIT on 15-Jun-2022

I have had the same error for the past 5 days and have not found a solution. Currently I am simply calling reformatFragmentFiles("fragments.tsv.gz"), and I only receive the following message:

Column 1 is length 1283934 which differs from length of column 1 (0).


Can someone suggest a way to fix this? Is there some manipulation I need to apply to the file directly?

atac-seq cellranger r archr • 1.9k views

am working with .tsv.gz files

So the question is: does this software work with .gz files?


The software works with .gz files; I've also tried .tsv.gz, .tsv.gz.tbi, etc. Unfortunately, it seems to be a formatting issue in the CellRanger-derived fragment files. And when I try to use reformatFragmentFiles(), I run into the same issue.


I already did. However, I'm looking for solutions (i.e. a script for mitigating the issue)...


zgrep the first few lines and check the separator. The separator must be one that the program supports.


That command is only used to search for patterns; that would not fix the issue.


It's zcat, sorry for the typo. (However, you can run zgrep . on a gzipped file to print everything. Don't do it, though.)


I wish zhead and ztail were builtins similar to zcat. Aliases can be created, of course, but I'm disappointed at the lack of builtins.
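
For anyone who wants them, a minimal sketch of such helpers as shell functions. The names zhead and ztail are just the ones wished for above, not standard commands, and the demo file is a throwaway made up for illustration:

```shell
# zhead/ztail are hypothetical helper names, not standard commands.
# Usage: zhead FILE [N]   /   ztail FILE [N]   (N defaults to 10, like head and tail)
zhead() { zcat "$1" | head -n "${2:-10}"; }
ztail() { zcat "$1" | tail -n "${2:-10}"; }

# Demo on a throwaway file containing the numbers 1..100, one per line.
tmpdir=$(mktemp -d)
seq 1 100 | gzip -c > "$tmpdir/numbers.txt.gz"
head_out=$(zhead "$tmpdir/numbers.txt.gz" 3)
tail_out=$(ztail "$tmpdir/numbers.txt.gz" 2)
echo "$head_out"
echo "$tail_out"
```

Putting the two function definitions in a shell startup file would make them available in every session.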


Please copy and paste error messages when possible rather than using screenshots.


This is also not an acceptable way to create new posts with the same question.

9 months ago

That function uses fread from the data.table library to load the fragments file into memory and expects 5 columns named V1 through V5. The column names will not be in the actual fragments file, but will be added as default column names when the data is loaded. Try manually loading your fragments file using fread and checking whether you see the correct data format.

*EDIT* solution is here - Column lengths differ error (ArchR)

EDIT2 (GenoMax) - Code to remove the first line is provided by @Ram click --> Column lengths differ error (ArchR)


In addition to this, as cpad0112 mentioned, try using zcat fragments.tsv.gz | head to look at the first ten lines of the content. If nothing looks suspicious, append | cat -A to the command above (zcat fragments.tsv.gz | head | cat -A) to see all invisible characters; zcat itself has no -A option.
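
A minimal sketch of that inspection, run on a small stand-in file (the file name and its contents are made up for illustration, so substitute your own fragments file). It prints the first lines with invisible characters shown, then tallies how many tab-separated fields each line has. A clean fragments file should report a single field count of 5.

```shell
# Build a tiny stand-in fragments file (hypothetical contents, for illustration only).
tmpdir=$(mktemp -d)
printf '# id = SAMPLE104\n# description =\nchr1\t100\t200\tAAAC-1\t2\nchr1\t300\t400\tAAAG-1\t1\n' \
  | gzip -c > "$tmpdir/demo_fragments.tsv.gz"

# Show the first lines with tabs (^I) and line ends ($) made visible (GNU cat).
zcat "$tmpdir/demo_fragments.tsv.gz" | head -n 5 | cat -A

# Tally the number of tab-separated fields per line: "<field count> <number of lines>".
field_counts=$(zcat "$tmpdir/demo_fragments.tsv.gz" \
  | awk -F'\t' '{ n[NF]++ } END { for (k in n) print k, n[k] }' | sort -n)
echo "$field_counts"
```

Any field count other than 5 in the tally points at lines (comments, headers, broken rows) that fread will not parse as fragment records.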


zcat is for viewing the file's contents, which I have already done, and the contents look alright. I am asking how to get past the error I keep getting.


So now when I call reformatFragmentFiles(fragmentFiles = fread("/home/Downloads/fragments.tsv.gz")), I get an error. I think I see where you're coming from, but I'm not sure how I would integrate fread into reformatFragmentFiles. I have also tried assigning the result to a variable (i.e. a <- fread("/home/Downloads/fragments.tsv.gz") and then reformatFragmentFiles(a)), but this function needs a file path, so I cannot pass it a data.table.

fread is called internally by the reformatFragmentFiles function, so you usually don't need to worry about it: the function takes care of running it. However, your error seems to be related to loading the file into memory, so I want you to load the data manually to check whether the error comes from the internal call to fread. If you post the results of the code below we can check whether it worked or not.

library("data.table")
head(fread("/home/Downloads/fragments.tsv.gz"))



Here's the output:


The first row in your fragment file is just # primary_contig=JH584295 which is causing the problem. If you remove that row it should work.
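
Before removing anything, it may be worth checking how many such lines there are, since a file can carry more than one. A rough sketch on a made-up stand-in file (name and contents are assumptions for illustration) that counts the consecutive lines starting with # at the top:

```shell
# Stand-in file with several leading comment lines (hypothetical contents).
tmpdir=$(mktemp -d)
demo="$tmpdir/fragments_demo.tsv.gz"
printf '# id = SAMPLE104\n# description =\n#\n# primary_contig=JH584295\nchr1\t100\t200\tAAAC-1\t2\n' \
  | gzip -c > "$demo"

# Count consecutive lines at the top of the file that start with "#";
# awk stops reading at the first line that does not.
header_lines=$(zcat "$demo" | awk '!/^#/ { exit } { n++ } END { print n + 0 }')
echo "leading comment lines: $header_lines"
```

If the count is greater than 1, dropping only the first row will leave comment lines behind.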


Unfortunately, that is not working for me. I ran tail -n +2 fragments_104.tsv.gz > fragment_104_processed.tsv.gz and then passed the result to the same function, reformatFragmentFiles(), only to receive the same error.


You cannot directly tail a gzipped file. Use

zcat fragments_104.tsv.gz | head -n 3


to check that the first couple of lines are weird in the expected manner, then use

zcat fragments_104.tsv.gz | tail -n +2 | gzip -c > fragments_104.tsv.first_line_removed.gz


reformatFragmentFiles("fragments_104.tsv.first_line_removed.gz")


(I used first_line_removed in the name instead of processed because it serves as a record of the processing.)

Edited to correct the function name (the ArchR function is reformatFragmentFiles, not reformatFragmentReads).
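
A small sketch of how the decompress, tail, recompress step can be sanity-checked, using a made-up stand-in file rather than the real fragments_104.tsv.gz (the file names and contents here are assumptions for illustration):

```shell
# Stand-in input with a single leading comment line (hypothetical contents).
tmpdir=$(mktemp -d)
src="$tmpdir/fragments_demo.tsv.gz"
dst="$tmpdir/fragments_demo.tsv.first_line_removed.gz"
printf '# primary_contig=JH584295\nchr1\t100\t200\tAAAC-1\t2\nchr1\t300\t400\tAAAG-1\t1\n' \
  | gzip -c > "$src"

# Decompress, drop the first line, recompress.
zcat "$src" | tail -n +2 | gzip -c > "$dst"

# Sanity checks: the output is a valid gzip stream and is exactly one line shorter.
gzip -t "$dst" && echo "gzip integrity OK"
lines_in=$(zcat "$src" | wc -l)
lines_out=$(zcat "$dst" | wc -l)
first_out=$(zcat "$dst" | head -n 1 | cut -f 1)
echo "lines before: $lines_in, after: $lines_out, first chromosome: $first_out"
```

Running tail directly on the .gz file skips the decompression step, so the result is not a usable gzip file, which is why the function kept failing.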


What is the output of:

zcat fragments_104.tsv.gz | head -n 3
zcat fragments_104.tsv.first_line_removed.gz | head -n 3

# id = SAMPLE104
# description =
#


There was an error in the code until yesterday: it had reformatFragmentReads instead of reformatFragmentFiles, so it would not have worked if you had copy-pasted it.

Can you confirm that you tried the corrected code?


Oops, my bad. Good catch, GenoMax! It should not give "the same error" though (barring an insane coincidence)


Must have been a mistake from my part, but the correct function is reformatFragmentFiles()


You've been asked time and again to copy-paste plain text content instead of using screenshots. Are you having a problem using the site?

9 months ago
Ram 38k

It looks like you have multiple comment lines up top. You're going to have to do something like:

zgrep -v "^#" fragments_104.tsv.gz | gzip -c > fragments_104.comment_lines_removed.tsv.gz


and then reformatFragmentFiles("fragments_104.comment_lines_removed.tsv.gz")
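
To confirm the cleanup worked before going back to R, something along these lines could be run. It is only a sketch on a made-up stand-in file (file names and contents are assumptions); it checks that no comment lines remain and that every remaining row has 5 tab-separated fields:

```shell
# Stand-in input with multiple leading comment lines (hypothetical contents).
tmpdir=$(mktemp -d)
src="$tmpdir/fragments_demo.tsv.gz"
dst="$tmpdir/fragments_demo.comment_lines_removed.tsv.gz"
printf '# id = SAMPLE104\n# description =\n#\nchr1\t100\t200\tAAAC-1\t2\nchr1\t300\t400\tAAAG-1\t1\n' \
  | gzip -c > "$src"

# Drop every line that starts with "#", then recompress.
zgrep -v "^#" "$src" | gzip -c > "$dst"

# Verify: no comment lines left, and no row with a field count other than 5.
remaining_comments=$(zcat "$dst" | grep -c '^#' || true)
bad_rows=$(zcat "$dst" | awk -F'\t' 'NF != 5' | wc -l)
kept_rows=$(zcat "$dst" | wc -l)
echo "comment lines left: $remaining_comments, malformed rows: $bad_rows, rows kept: $kept_rows"
```

Both counts should be 0 before the file is passed to reformatFragmentFiles().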


Tried this this morning and it solved everything! I used this command in terminal and then used the processed file for the reformatFragmentFiles() function in ArchR. Successfully ran the function in R.
