I have been working through the tutorials at https://www.kallistobus.tools/velocity_tutorial.html and https://www.kallistobus.tools/velocity_index_tutorial.html, and have run into a few issues, including newer versions of bustools using slightly different commands from those the tutorial specifies.
Even after untangling some of these issues and getting to a point where I think this should be working, I'm still having trouble getting bustools count to output the data I would expect.
For reference, I first ran:
kallisto bus -i ../velocity_index/cDNA_introns.idx -o bus_output_06/ -x 10xv2 -t 4 TYR4/TYR4_0_1_HC3HMDSXY/bamtofastq_S1_L004_R1_001.fastq.gz TYR4/TYR4_0_1_HC3HMDSXY/bamtofastq_S1_L004_R2_001.fastq.gz
This produces the output.bus, matrix.ec and transcripts.txt files in the bus_output_06 folder. Next I ran:
bustools correct -w ../10xv2_whitelist.txt -p output.bus | bustools sort -o output.correct.sort.bus -t 4 -
This yields the output.correct.sort.bus file. Using that, I then ran:
bustools capture -s -o cDNA_capture.bus -c cDNA_transcripts.to_capture.txt -e matrix.ec -t transcripts.txt output.correct.sort.bus
bustools capture -s -o introns_capture.bus -c introns_transcripts.to_capture.txt -e matrix.ec -t transcripts.txt output.correct.sort.bus
Followed by the count step:
bustools count -o u -g cDNA_introns_t2g.txt -e matrix.ec -t transcripts.txt --genecounts cDNA_capture.bus
bustools count -o s -g cDNA_introns_t2g.txt -e matrix.ec -t transcripts.txt --genecounts introns_capture.bus
But I get empty outputs:
-rw-r--r-- 1 ekofman ---- 607K Oct 7 07:51 s.barcodes.txt
-rw-r--r-- 1 ekofman ---- 0 Oct 7 07:51 s.genes.txt
-rw-r--r-- 1 ekofman ---- 114 Oct 7 07:51 s.mtx
drwxr-xr-x 2 ekofman ---- 2 Oct 6 11:33 tmp
-rw-r--r-- 1 ekofman ---- 52M Oct 5 17:01 transcripts.txt
-rw-r--r-- 1 ekofman ---- 618K Oct 19 12:25 u.barcodes.txt
-rw-r--r-- 1 ekofman ---- 0 Oct 19 12:25 u.genes.txt
-rw-r--r-- 1 ekofman ---- 114 Oct 19 12:25 u.mtx
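At 114 bytes, the .mtx files are presumably little more than a header. One way to confirm this would be to print the dimension line of each matrix. This is only a sketch: it assumes the files are standard Matrix Market coordinate files (header and comment lines starting with %, followed by one "rows columns entries" line), and mtx_dims is a helper name made up for illustration.

```shell
# mtx_dims FILE (hypothetical helper, not part of bustools)
# Assumption: FILE is a Matrix Market file whose header/comment lines start with '%'.
# Prints the first non-comment line, which holds "rows columns entries".
mtx_dims() {
    grep -v '^%' "$1" | head -n 1
}
```

Calling mtx_dims s.mtx and mtx_dims u.mtx would then show whether the matrices really contain zero genes and zero entries.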
So! What could be going wrong? My first thought was that the transcript names might be mismatched between the files I built following the velocity_index_tutorial page.
So let's check:
My cDNA_introns_t2g.txt file looks like this, with 2244276 lines:
ENST00000237247 ENSG00000118473 SGIP1
ENST00000237247 ENSG00000118473 SGIP1
ENST00000237247 ENSG00000118473 SGIP1
My matrix.ec looks like this, with 12216809 lines:
0 0
1 1
2 2
3 3
4 4
My transcripts.txt file looks like this:
ENST00000237247.1
ENST00000237247.2
ENST00000237247.3
ENST00000237247.4
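To test the mismatch idea directly rather than by eye, one option is to count how many names in transcripts.txt appear verbatim in the first column of the t2g file. The following is a minimal sketch, assuming the t2g file is whitespace-delimited with the transcript ID in column 1 and that transcripts.txt holds one name per line; count_t2g_matches is a hypothetical helper name, not a bustools command.

```shell
# count_t2g_matches T2G_FILE TRANSCRIPTS_FILE (hypothetical helper)
# Assumptions: column 1 of T2G_FILE is the transcript ID; TRANSCRIPTS_FILE has one name per line.
# Reports how many transcript names are found / not found (exact string match) in the t2g file.
count_t2g_matches() {
    awk 'NR == FNR { t2g[$1] = 1; next }            # first file: remember every t2g transcript ID
         { if ($1 in t2g) hit++; else miss++ }      # second file: exact lookup of each name
         END { printf "matched=%d unmatched=%d\n", hit, miss }' "$1" "$2"
}
```

Running count_t2g_matches cDNA_introns_t2g.txt transcripts.txt would show whether names such as ENST00000237247.1 have any exact counterpart in the t2g file. A large unmatched count would point at the naming difference between the two files.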
My cDNA_capture.bus (converted to text, 1663579 lines) looks like this:
AAACCTGAGTTAAGTG CCGAGCTCAA 2467401 1
AAACCTGAGTTAAGTG CGCTTTAGTA 11871347 1
AAACCTGAGTTAAGTG CGGCACGAGG 11781147 1
AAACCTGAGTTAAGTG CGGGTTTCAC 590261 1
I honestly have no clue what is wrong. I intended to use transcripts without version IDs, and I think that the increasing number after the decimal place is supposed to be there, added during the index-making stage as specified by the tutorial. I feel like I'm very close to breaking through on this and finally getting results, but the tutorial being out of date and hard to debug makes it tough! It would be great to hear from anybody else who maybe has dealt with similar issues and might know what to do about this. Thanks!