7.3 years ago
1769mkc
How do I build a transcriptome index for kallisto? Is it built from the normal hg19 or hg38 genome, the way we build an index for the TopHat protocol?
Have you read the manual?
Yes. It says "target sequence", and that is what confuses me: is the target sequence my reference file or my read file? Pardon me for this trivial doubt, but I'm new to all of this, so please guide me. I plan to use this for my RNA-seq data from HL60, a human cell line.
The target sequence is supposed to be a fasta transcriptome reference. Note that you can also download common transcriptome indexes.
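For reference, here is a minimal sketch of how the index step could be driven from Python. It assumes the `kallisto` binary is on your PATH; the file names `transcripts.fa.gz` and `transcripts.idx` are placeholders, not files mentioned in this thread. The only kallisto option used is `-i` (the output index file), followed by the transcriptome FASTA as the target sequence.

```python
import shutil
import subprocess
from pathlib import Path


def kallisto_index_command(transcripts_fasta, index_path):
    """Return the argument list for `kallisto index -i <index> <fasta>`."""
    return ["kallisto", "index", "-i", str(index_path), str(transcripts_fasta)]


def build_index(transcripts_fasta, index_path):
    """Run `kallisto index`, failing early if the binary or the FASTA is missing."""
    if shutil.which("kallisto") is None:
        raise RuntimeError("kallisto was not found on PATH")
    if not Path(transcripts_fasta).is_file():
        raise FileNotFoundError(transcripts_fasta)
    # check=True raises CalledProcessError if kallisto exits with an error.
    subprocess.run(kallisto_index_command(transcripts_fasta, index_path), check=True)


if __name__ == "__main__":
    # Placeholder file names; replace them with your own transcriptome FASTA.
    print(" ".join(kallisto_index_command("transcripts.fa.gz", "transcripts.idx")))
```

The target is the transcriptome reference, never the read files; the reads only come in later, at the quantification step.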
Thank you. I have downloaded it and I'm building the index now.
So can I use my own transcriptome reference? I'm using hg19, so I have created the FASTA file and now want to use it to build the index, but it seems to be taking forever. I started two hours ago and it is stuck at the k-mer sequence step. Does it take this long, or am I doing something wrong in the index building? I'm giving my input as a .fa file, not a .gz file. Is that an issue?
You downloaded transcriptome fasta and are using that for building the index? Might take a while indeed, don't remember how long it took for me. (Punctuation would make your question easier to read.)
Yes, I downloaded a transcriptome FASTA and used it to build the index. I did that as a test, but I want to have my own transcriptome index built from hg19. Would that work?
If you used a FASTA file containing all transcripts of interest for building the index, sure, that would work.
I'm using http://hgdownload.cse.ucsc.edu/goldenpath/hg19/chromosomes/ to download all the files, and I concatenated them into a single FASTA file. Can I use that for kallisto index building? I did try, and it has been stuck for more than an hour. Am I doing the right thing? Please have a look at the link and give your suggestion.
You are downloading entire chromosomes, the genome. That's not the same as the transcriptome. An example would be:
ftp://ftp.ensembl.org/pub/grch37/release-86/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh37.cdna.all.fa.gz
(from Ensembl)
But for TopHat I used the whole chromosomes to align my RNA-seq data. So am I doing it wrong?
Indeed, TopHat uses the genome for alignment, while kallisto uses the transcriptome for pseudoalignment and counting.
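If you are unsure whether a FASTA file holds a genome or a transcriptome, a quick look at the record count and the longest sequence usually settles it. The sketch below is only a heuristic: the 1,000,000-base cutoff is my own assumption (whole human chromosomes run to tens of millions of bases, while individual transcripts are far shorter), and the function names are made up for illustration.

```python
import gzip
import io


def open_text(path):
    """Open a plain or gzip-compressed FASTA file in text mode."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path, "rt")


def fasta_summary(handle):
    """Return (number of records, length of the longest sequence)."""
    n_records, longest, current = 0, 0, 0
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record.
            longest = max(longest, current)
            current = 0
            n_records += 1
        else:
            current += len(line)
    return n_records, max(longest, current)


def looks_like_genome(longest, cutoff=1_000_000):
    """Heuristic (assumed cutoff): very long sequences suggest chromosomes."""
    return longest > cutoff


if __name__ == "__main__":
    demo = io.StringIO(">t1\nACGT\nAC\n>t2\nGG\n")
    records, longest = fasta_summary(demo)
    print(records, longest, looks_like_genome(longest))
```

A concatenation of the UCSC chromosome files would show a few dozen records with enormous lengths, whereas a cDNA file shows a very large number of comparatively short records.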
Thank you for the clarification.
I had the same problem, and in my case the cause was a corrupted fastq file. The kallisto author is aware of this problem (in fact, I sent him my files and he discovered the reason) and is trying to fix it. You can test whether a file is corrupted by running zcat on your compressed fastq files.
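The same check can be sketched in Python if zcat is not handy. This is only an illustration of the idea (decompress the whole file and see whether it errors out), not something taken from kallisto itself; the function name is hypothetical. Note that it only detects gzip-level damage such as truncation, not malformed fastq records inside an intact archive.

```python
import gzip
import zlib


def gzip_is_readable(path, chunk_size=1 << 20):
    """Return True if the whole gzip file decompresses without error."""
    try:
        with gzip.open(path, "rb") as handle:
            # Read to the end in chunks so large files do not fill memory.
            while handle.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error):
        # OSError covers bad headers and CRC failures, EOFError covers
        # truncated files, zlib.error covers a damaged compressed stream.
        return False
    return True
```

Running it over each compressed fastq file before quantification should flag a truncated download quickly.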