Question

Help for finding the right FASTA file for kallisto

1

7.0 years ago

swamyvinny ▴ 20

Hi, I am an undergrad who recently started working in a lab and I'm pretty new to this, so sorry if I sound like I have no idea what I'm talking about. I've been tasked with using kallisto to quantify transcript abundance from our human RNA-seq data. The reference FASTA files I've been using, which I found on Ensembl (ftp://ftp.ensembl.org/pub/release-88/fasta/homo_sapiens/cdna/), contain multiple transcript variants for each gene, so kallisto calculates the abundance of each variant separately. My PI wants the abundance for each gene as a whole, with all of its variants counted under that single gene, so I was wondering if anyone knows where I can get a human exome FASTA file with a single sequence for each gene. My PI says he was able to get per-gene abundance with the software he used previously (Partek Genomics Suite), so I feel like it should be possible. If there is another program I should use or a better method, I would love to hear it.

TL;DR: looking for a human exome FASTA file without transcript variants.

Thanks in advance for any help.

RNA-Seq kallisto FASTA • 3.6k views


score 2 · Answer 1 · 2017-05-07

7.0 years ago

h.mon 35k

Use tximport to summarize the transcript-level estimates to gene level.
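To make the idea concrete: tximport itself is an R/Bioconductor package, so the Python below is only a rough sketch of the core step (adding up the transcript-level estimates that belong to the same gene), not tximport's actual code. It assumes kallisto's `abundance.tsv` column layout (`target_id`, `length`, `eff_length`, `est_counts`, `tpm`); the function name, transcript IDs, gene IDs, and values are all made up for illustration.

```python
import csv
import io
from collections import defaultdict


def summarize_to_genes(abundance_file, tx2gene):
    """Sum est_counts and tpm over all transcripts assigned to each gene."""
    genes = defaultdict(lambda: {"est_counts": 0.0, "tpm": 0.0})
    for row in csv.DictReader(abundance_file, delimiter="\t"):
        gene = tx2gene.get(row["target_id"])
        if gene is None:
            continue  # transcript not in the map; this sketch simply skips it
        genes[gene]["est_counts"] += float(row["est_counts"])
        genes[gene]["tpm"] += float(row["tpm"])
    return dict(genes)


# Toy input in the abundance.tsv layout (hypothetical IDs and numbers).
example_abundance = io.StringIO(
    "target_id\tlength\teff_length\test_counts\ttpm\n"
    "tx1\t1000\t800\t10\t5.0\n"
    "tx2\t1500\t1300\t30\t7.5\n"
    "tx3\t900\t700\t4\t2.0\n"
)
# Hypothetical transcript-to-gene map: tx1 and tx2 are variants of geneA.
example_tx2gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}

gene_totals = summarize_to_genes(example_abundance, example_tx2gene)
for gene, totals in sorted(gene_totals.items()):
    print(gene, totals["est_counts"], totals["tpm"])
```

For a real run you would pass an open file handle for `abundance.tsv` instead of the toy string. For actual analysis tximport is still the better choice, since it also tracks per-gene average transcript lengths for downstream tools.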


score 2 · Answer 2 · 2017-05-08

Hi swamyvinny,

I am not aware if and how this is possible with kallisto, but it is relatively simple with salmon (a very similar tool). If you provide salmon with a .gtf or tab-separated file that maps each transcript to a gene, it will report counts not only for each transcript but also summarized for each gene. I would not recommend using a FASTA file with a single sequence for each gene, since keeping only one transcript per gene discards a lot of information. If you only want gene-level abundance, you can also align your samples and use a tool like featureCounts to get gene counts, but that demands more computational resources, and evidence suggests it will also be less accurate.
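If you do not have a mapping file at hand, one possible way to build it is from the headers of the Ensembl cDNA FASTA itself. The sketch below assumes each header starts with the transcript ID and contains a `gene:<ID>` field, which is how the Ensembl cDNA files are laid out as far as I know; please check your own file before relying on it. The function name and the example IDs are invented for illustration.

```python
import io
import re

# Matches the "gene:<ID>" field; "gene_biotype:" and "gene_symbol:" do not match.
GENE_FIELD = re.compile(r"\bgene:(\S+)")


def tx2gene_from_fasta(fasta_lines):
    """Map transcript ID -> gene ID using only the FASTA header lines."""
    mapping = {}
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, nothing to parse
        fields = line[1:].split()
        if not fields:
            continue  # bare ">" with no ID
        match = GENE_FIELD.search(line)
        if match:
            # The ID is the text up to the first whitespace, as quantifiers report it.
            mapping[fields[0]] = match.group(1)
    return mapping


# Made-up records imitating the Ensembl cDNA header layout.
example_fasta = io.StringIO(
    ">ENSTX0001.1 cdna chromosome:GRCh38:1:100:200:1 gene:ENSGX0001.1 "
    "gene_biotype:protein_coding gene_symbol:FAKE1\n"
    "ACGTACGT\n"
    ">ENSTX0002.1 cdna chromosome:GRCh38:1:100:300:1 gene:ENSGX0001.1 "
    "gene_biotype:protein_coding gene_symbol:FAKE1\n"
    "ACGTTTGA\n"
)

tx2gene = tx2gene_from_fasta(example_fasta)
# One "transcript<TAB>gene" pair per line.
mapping_tsv = "".join(f"{tx}\t{gene}\n" for tx, gene in tx2gene.items())
print(mapping_tsv, end="")
```

Note that the IDs keep their version suffix (the `.1`), so the transcript names in the mapping match the names in the quantification output when the same FASTA was used to build the index.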

Hope this helps!

Stefan

score 1 · Answer 3 · 2017-05-09

7.0 years ago

Sreeraj Thamban ▴ 290

I agree with h.mon; I use tximport after kallisto to get summarized counts for each gene. You can also use the tximport output for downstream differential expression (DEG) analysis: https://bioconductor.org/packages/release/bioc/html/tximport.html Thanks
