I am running salmon on two very large data sets, over 20,000 samples. Some samples are paired-end and some are single-end. Some files are in gz format, others are in tar.gz.
All the files in 'tar' (not 'tar.gz') format fail. Is tar the problem, i.e. do I need to extract the tar file first?
I am also having trouble with 2 samples that are in 'tar.gz' format: I get a 255 exit code. Any idea what this means?
Is there an easy way to check whether a gz or tar file contains single-end or paired-end samples? Decompressing to disk just to check seems like a bad idea, since the extra I/O time would make the run much slower.
I also found this error message in one of the log files:
Exception : [
The following errors were detected with the read files
======================================================
ERROR: file [/cromwell_root/dg.4DFC_e18b70c0-bdca-46d3-99bb-c6c8abc68da5/TCGA-2H-A9GF-01A-11R-A37I-31_rnaseq_fastq.tar] has extension .tar, which suggests it is neither a fasta nor a fastq file (or gzip compressed fasta/q).
Is this file compressed in some other way? If so, consider replacing:
/cromwell_root/dg.4DFC_e18b70c0-bdca-46d3-99bb-c6c8abc68da5/TCGA-2H-A9GF-01A-11R-A37I-31_rnaseq_fastq.tar
with
<(decompressor /cromwell_root/dg.4DFC_e18b70c0-bdca-46d3-99bb-c6c8abc68da5/TCGA-2H-A9GF-01A-11R-A37I-31_rnaseq_fastq.tar)
which will decompress the reads "on-the-fly"
]
salmon quant was invoked improperly.
For usage information, try salmon quant --help
Exiting.
+ salmonRet=1
+ echo 'AEDWIP in time salmonRet='
AEDWIP in time salmonRet=
+ '[' 1 -eq 0 ']'
+ echo 'Salmon ERROR code 1'
Salmon ERROR code 1
Kind regards
Andy
You have to extract the files within the tar archive first.
...so no, salmon will not read tar.gz either, because it is a compressed archive that bundles files together, not an individual compressed read file.
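To check what an archive contains without unpacking everything, you can read just the member names. Below is a minimal Python sketch using the standard-library `tarfile` module. The function names (`list_fastq_members`, `classify_layout`, `extract_fastq`) are made up for illustration, and the single/paired guess assumes mates are named with `_1`/`_2` or `_R1`/`_R2` before the extension, which may not hold for your data. Note that for a tar.gz the listing still has to decompress the stream, but nothing is written to disk.

```python
import re
import shutil
import tarfile
from pathlib import Path

# Assumed FASTQ file name endings; adjust to match your data.
FASTQ_RE = re.compile(r"\.(fastq|fq)(\.gz)?$", re.IGNORECASE)
# Assumed mate markers, e.g. sample_1.fastq or sample_R2.fq.gz.
MATE_RE = re.compile(r"[._](R?[12])\.(fastq|fq)(\.gz)?$", re.IGNORECASE)


def list_fastq_members(archive_path):
    """Return names of FASTQ files in a .tar or .tar.gz without extracting."""
    # Mode "r:*" lets tarfile detect plain tar vs. gzip-compressed tar.
    with tarfile.open(archive_path, "r:*") as tar:
        return sorted(m.name for m in tar.getmembers()
                      if m.isfile() and FASTQ_RE.search(m.name))


def classify_layout(names):
    """Guess 'paired', 'single', or 'unknown' from file names alone."""
    mates = set()
    for name in names:
        match = MATE_RE.search(name)
        if match:
            mates.add(match.group(1)[-1])  # keep just "1" or "2"
    if mates == {"1", "2"}:
        return "paired"
    if len(names) == 1:
        return "single"
    return "unknown"


def extract_fastq(archive_path, out_dir):
    """Extract only the FASTQ members and return their paths on disk."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    with tarfile.open(archive_path, "r:*") as tar:
        for member in tar.getmembers():
            if member.isfile() and FASTQ_RE.search(member.name):
                # Drop any directory prefix stored in the archive.
                target = out_dir / Path(member.name).name
                with tar.extractfile(member) as src, open(target, "wb") as dst:
                    shutil.copyfileobj(src, dst)
                paths.append(target)
    return sorted(paths)
```

Once the files are extracted, the paths can be passed to `salmon quant` with `-1`/`-2` for paired reads or `-r` for single reads. Anything classified as 'unknown' is worth inspecting by hand rather than guessing.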