Mapping reads using kallisto for RNA-seq analysis

21 months ago

bioinformatics ▴ 40

Hi,

I'm trying to map reads to a reference transcriptome (GRCh38 cDNA) using kallisto for RNA-seq analysis, working in Terminal on a Mac. The following command has been running for hours without finishing, and I'm not sure where I've gone wrong.

kallisto index -i Homo_sapiens.GRCh38.cdna.all.HBmain.fa.gz.index Homo_sapiens.GRCh38.cdna.all.HBmain.fa.gz.fa

[build] loading fasta file Homo_sapiens.GRCh38.cdna.all.HBmain.fa.gz.fa
[build] k-mer length: 31
[build] warning: clipped off poly-A tail (longer than 10)
        from 1554 target sequences
[build] warning: replaced 100005 non-ACGUT characters in the input sequence
        with pseudorandom nucleotides
[build] counting k-mers ... done.
[build] building target de Bruijn graph ...

The commands I used are listed below:

bash ~/Miniconda3-latest-MacOSX-x86_64.sh -b -p $HOME/miniconda
source $HOME/miniconda/bin/activate
conda init zsh
conda info
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
conda config --set offline false
conda create --name rnaseq 
conda activate rnaseq 
conda install -c bioconda kallisto
kallisto
conda install -c bioconda fastqc
conda install -c bioconda multiqc
conda activate rnaseq 

kallisto index -i Homo_sapiens.GRCh38.cdna.all.HBmain.fa.gz.index Homo_sapiens.GRCh38.cdna.all.HBmain.fa.gz.fa

Thank you

rnaseq minconda analysis • 1.4k views

That doesn't sound normal. It took me four minutes or less to build an index from this file:

kallisto index -i hma.idx Homo_sapiens.GRCh38.cds.all.fa.gz

[build] loading fasta file Homo_sapiens.GRCh38.cds.all.fa.gz
[build] k-mer length: 31
[build] warning: clipped off poly-A tail (longer than 10)
        from 4 target sequences
[build] warning: replaced 111664 non-ACGUT characters in the input sequence
        with pseudorandom nucleotides
[build] counting k-mers ... done.
[build] building target de Bruijn graph ...  done
[build] creating equivalence classes ...  done
[build] target de Bruijn graph has 261396 contigs and contains 35343983 k-mers

I ran this command on WSL2 with 32 GB of RAM and 8 cores.

P.S. I installed kallisto with conda.
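To compare the machine where the build stalls with the 32 GB / 8-core setup above, a minimal sketch like the following prints the core count and total RAM. It assumes either macOS (`sysctl hw.ncpu`, `hw.memsize`) or Linux/WSL2 (`/proc/meminfo`); `report_resources` is a hypothetical helper name, not part of kallisto or conda.

```bash
#!/usr/bin/env bash
# report_resources (hypothetical helper): print logical CPU count and total RAM.
# Assumes macOS (sysctl) or Linux/WSL2 (/proc/meminfo); other systems are not handled.
report_resources() {
  local cores mem_bytes
  case "$(uname -s)" in
    Darwin)
      cores=$(sysctl -n hw.ncpu)
      mem_bytes=$(sysctl -n hw.memsize)   # reported in bytes
      ;;
    Linux)
      cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || nproc)
      # MemTotal is reported in kB; convert to bytes
      mem_bytes=$(( $(awk '/^MemTotal:/ {print $2}' /proc/meminfo) * 1024 ))
      ;;
    *)
      echo "unsupported OS: $(uname -s)" >&2
      return 1
      ;;
  esac
  echo "cores: ${cores}"
  # Integer division, so the value is rounded down to whole GB
  echo "RAM (GB): $(( mem_bytes / 1024 / 1024 / 1024 ))"
}

report_resources
```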

21 months ago by andres.firrincieli 3.6k

OK, thanks for your help. It is still not finishing, and I have tried on two different Macs.

I have also tried the command you provided.

21 months ago by bioinformatics ▴ 40

The extension on your fasta file looks strange:

Homo_sapiens.GRCh38.cdna.all.HBmain.fa.gz.fa

Try removing the ".fa" on the end; the file should then be

Homo_sapiens.GRCh38.cdna.all.HBmain.fa.gz

Based on the extension, kallisto may be trying to process what it "thinks" is an uncompressed fasta, which could be causing errors. If that doesn't work, check that Homo_sapiens.GRCh38.cdna.all.HBmain.fa.gz isn't corrupted, check that it's truly a fasta file, check that it's truly gzipped, etc.
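The checks suggested above (is the file really gzipped, does it decompress cleanly, is it really FASTA) can be scripted roughly as follows. This is a sketch assuming the standard `gzip`, `head`, `od` and `grep` tools found on macOS and Linux; `check_fasta` is a hypothetical function name. It identifies gzip data by its two magic bytes (`1f 8b`) rather than by the extension, which is exactly the ambiguity a name like `.fa.gz.fa` creates.

```bash
#!/usr/bin/env bash
# check_fasta (hypothetical helper): sanity-check a FASTA file before indexing.
# The gzip test looks at the file's first two bytes, not its extension.
check_fasta() {
  local fa=$1 magic first n is_gz=no
  if [ ! -s "$fa" ]; then
    echo "missing or empty: $fa" >&2
    return 1
  fi
  # Every gzip stream starts with the bytes 1f 8b
  magic=$(head -c 2 "$fa" | od -An -tx1 | tr -d ' \n')
  if [ "$magic" = "1f8b" ]; then
    is_gz=yes
    # gzip -t decompresses without writing output; it fails on truncated/corrupt data
    if ! gzip -t "$fa" 2>/dev/null; then
      echo "gzip: corrupt or truncated: $fa" >&2
      return 1
    fi
  fi
  echo "gzip: $is_gz"
  if [ "$is_gz" = yes ]; then
    first=$(gzip -cd "$fa" 2>/dev/null | head -c 1)
    n=$(gzip -cd "$fa" | grep -c '^>')
  else
    first=$(head -c 1 "$fa")
    n=$(grep -c '^>' "$fa")
  fi
  # A FASTA file must begin with a '>' header line
  if [ "$first" != ">" ]; then
    echo "not FASTA (first character: '$first'): $fa" >&2
    return 1
  fi
  echo "records: $n"
}

# Example with the file name from this thread:
# check_fasta Homo_sapiens.GRCh38.cdna.all.HBmain.fa.gz.fa
```

If it reports `gzip: yes` for the `.fa.gz.fa` file, renaming the file so it ends in `.fa.gz` makes the extension match the content.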

21 months ago by kalavattam ▴ 190

The command has worked on my Mac before, so the file is probably not corrupt. However, I tried to build the index again and it still didn't work.

I removed the ".fa" from the end and it still doesn't finish.

21 months ago by bioinformatics ▴ 40

It loaded! Thanks all for your help.

21 months ago by bioinformatics ▴ 40

In case someone has the same problem in the future, what was the solution?

21 months ago by kalavattam ▴ 190

I'm not exactly sure. I used a different Mac and the index finished building in a couple of minutes. The disk on the previous laptop might have been full.
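Since a full disk is the suspected cause here, one quick check is the free space on the filesystem where the index is being written. A minimal sketch using `df` follows; `free_gb` is a hypothetical helper, and the 5 GB threshold is an arbitrary example, not a documented kallisto requirement.

```bash
#!/usr/bin/env bash
# free_gb (hypothetical helper): print the free space, in whole GB, on the
# filesystem that holds a directory (default: the current directory).
free_gb() {
  local dir=${1:-.} avail_kb
  # -P: POSIX output, one line per filesystem; -k: sizes in 1K blocks.
  # Column 4 of the data row (line 2) is the available space.
  avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
  echo $(( avail_kb / 1024 / 1024 ))
}

# 5 GB is an arbitrary example threshold, not a kallisto requirement.
min_gb=5
gb=$(free_gb .)
if [ "$gb" -lt "$min_gb" ]; then
  echo "warning: only ${gb} GB free in $(pwd)" >&2
else
  echo "free space in $(pwd): ${gb} GB"
fi
```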

I also used the following command to build the index:

kallisto index -i Homo_sapiens.GRCh38.cdna.all.HBmain.fa.gz.index Homo_sapiens.GRCh38.cdna.all.HBmain.fa.gz

(I removed the extra ".fa" from the file name.)

21 months ago by bioinformatics ▴ 40