2.3 years ago
bioinformatics
Hi,
I'm trying to map reads with kallisto for RNA-seq analysis (in Terminal on a Mac) and keep getting this error message:
Error: FASTA file not found Homo_sapiens.GRCh38.cdna.all.HBmain.fa.gz
kallisto 0.48.0
Builds a kallisto index
Usage: kallisto index [arguments] FASTA-files
Required argument:
-i, --index=STRING Filename for the kallisto index to be constructed
Optional argument:
-k, --kmer-size=INT k-mer (odd) length (default: 31, max value: 31)
--make-unique Replace repeated target names with unique names
admins-Air:~ mesalie$ kallisto quant \
> -i Homo_sapiens.GRCh38.cdna.all.index \
> -o test \
> -t 8 \
> --single -l 250 -s 30 \
> SRR8668755_1M_subsample.fastq.gz
Error: kallisto index file not found Homo_sapiens.GRCh38.cdna.all.index
Warning: you asked for 8, but only 4 cores on the machine
Usage: kallisto quant [arguments] FASTQ-files
Required arguments:
-i, --index=STRING Filename for the kallisto index to be used for
quantification
-o, --output-dir=STRING Directory to write output to
Optional arguments:
--bias Perform sequence based bias correction
-b, --bootstrap-samples=INT Number of bootstrap samples (default: 0)
--seed=INT Seed for the bootstrap sampling (default: 42)
--plaintext Output plaintext instead of HDF5
--fusion Search for fusions for Pizzly
--single Quantify single-end reads
--single-overhang Include reads where unobserved rest of fragment is
predicted to lie outside a transcript
--fr-stranded Strand specific reads, first read forward
--rf-stranded Strand specific reads, first read reverse
-l, --fragment-length=DOUBLE Estimated average fragment length
-s, --sd=DOUBLE Estimated standard deviation of fragment length
(default: -l, -s values are estimated from paired
end data, but are required when using --single)
-t, --threads=INT Number of threads to use (default: 1)
--pseudobam Save pseudoalignments to transcriptome to BAM file
--genomebam Project pseudoalignments to genome sorted BAM file
-g, --gtf GTF file for transcriptome information
(required for --genomebam)
-c, --chromosomes Tab separated file with chromosome names and lengths
(optional for --genomebam, but recommended)
--verbose Print out progress information every 1M proccessed reads
The commands I used are listed below:
bash ~/Miniconda3-latest-MacOSX-x86_64.sh -b -p $HOME/miniconda
source $HOME/miniconda/bin/activate
conda init zsh
conda info
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
conda config --set offline false
conda create --name rnaseq
conda activate rnaseq
conda install -c bioconda kallisto
kallisto
conda install -c bioconda fastqc
conda install -c bioconda multiqc
conda activate rnaseq
kallisto index -i Homo_sapiens.GRCh38.cdna.all.index Homo_sapiens.GRCh38.cdna.all.fa
kallisto quant \
-i Homo_sapiens.GRCh38.cdna.all.index \
-o test \
-t 8 \
--single -l 250 -s 30 \
SRR8668755_1M_subsample.fastq.gz
Does anyone know how I can fix this?
Thank you
Give the correct path of Homo_sapiens.GRCh38.cdna.all.HBmain.fa.gz to kallisto index.
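A minimal sketch of what "give the correct path" means in practice, written for bash. The helper names resolve_fasta and build_index are hypothetical, not part of kallisto. The script stops early with a clear message if the FASTA is not where you think it is, and otherwise hands kallisto index an absolute path. kallisto index reads gzipped FASTA directly, so the .fa.gz file can be passed as is, without renaming it.

```shell
#!/usr/bin/env bash
# Sketch only: resolve_fasta and build_index are hypothetical helpers, not kallisto commands.

# Print the absolute path of the FASTA file given as $1,
# or print an error to stderr and return 1 if the file does not exist.
resolve_fasta() {
  local f="$1"
  if [ ! -f "$f" ]; then
    echo "FASTA not found: $f (current directory: $PWD)" >&2
    return 1
  fi
  # In a subshell, cd into the file's directory and print that directory plus the file name.
  (cd "$(dirname "$f")" && printf '%s/%s\n' "$PWD" "$(basename "$f")")
}

# Build a kallisto index named $2 from FASTA $1, but only if the FASTA actually exists.
build_index() {
  local fasta
  fasta="$(resolve_fasta "$1")" || return 1
  kallisto index -i "$2" "$fasta"
}
```

Usage, assuming the FASTA was downloaded to ~/Downloads (adjust this to wherever the file really is; the index file name is also just an example):
build_index ~/Downloads/Homo_sapiens.GRCh38.cdna.all.HBmain.fa.gz Homo_sapiens.GRCh38.cdna.all.HBmain.index
If the file is missing, the message prints the directory you are currently in. That is usually enough to see that the file lives somewhere else.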
Thank you for your response. How exactly do I do this?
Should the command be
kallisto index -i Homo_sapiens.GRCh38.cdna.all.HBmain.fa.gz.index Homo_sapiens.GRCh38.cdna.all.HBmain.fa.gz.fa
It gives this error message: Error: FASTA file not found Homo_sapiens.GRCh38.cdna.all.HBmain.fa.gz.fa
Make sure Homo_sapiens.GRCh38.cdna.all.HBmain.fa.gz.fa is in your working directory (or give the absolute path to be safe), and that the file's extension really is .fa.gz.fa, as written in your command.
OK, thanks, I have now done this.
I still get an error message:
[quant] fragment length distribution is truncated gaussian with mean = 250, sd = 30
Error: incompatible indices. Found version 3472328451435676990, expected version 10
Rerun with index to regenerate
Do you know how I can fix this?
You will need to regenerate the index with the kallisto index command.
OK, thank you. If it takes a long time to run, what should I do? I have tried on two Macs.
It might take 10-20 minutes. Use a system monitor utility (on a Mac, Activity Monitor) to make sure you don't run out of memory. Good luck!
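The rebuild-and-rerun step could look like the bash function below, assuming the file names used in this thread. rebuild_and_quant is a hypothetical helper. The value -t 4 is used only because the warning earlier in the thread reports 4 cores on this machine. The function deletes the old index first so the incompatible file cannot be picked up again, and it times the rebuild with bash's time keyword. Memory use still has to be watched separately, for example in Activity Monitor.

```shell
#!/usr/bin/env bash
# Sketch only: rebuild_and_quant is a hypothetical helper.
# Arguments: $1 = cDNA FASTA (.fa.gz is fine), $2 = index file to (re)create,
#            $3 = single-end reads, $4 = output directory.
rebuild_and_quant() {
  local fasta="$1" index="$2" reads="$3" outdir="$4"

  # The old index triggered "incompatible indices", so delete it before rebuilding.
  rm -f "$index"

  # Rebuilding a human cDNA index can take a while; `time` reports how long it took.
  time kallisto index -i "$index" "$fasta" || return 1

  # Single-end quantification with the same -l/-s values used earlier in the thread,
  # and a thread count that matches the 4 cores reported in the warning.
  kallisto quant -i "$index" -o "$outdir" -t 4 --single -l 250 -s 30 "$reads"
}
```

Usage with the thread's files (the index name is an example):
rebuild_and_quant Homo_sapiens.GRCh38.cdna.all.HBmain.fa.gz Homo_sapiens.GRCh38.cdna.all.HBmain.index SRR8668755_1M_subsample.fastq.gz test
Double-check the argument order, because the second argument is deleted before the rebuild.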