Question: Help for finding the right FASTA file for kallisto
2.4 years ago, swamyvinny wrote:

Hi, I'm an undergrad who recently started working in a lab and I'm pretty new to this, so sorry if I sound like I have no idea what I'm talking about. I've been tasked with using kallisto to quantify transcript abundance from our human RNA-seq data. The reference FASTA files I've been using, which I found on Ensembl (ftp://ftp.ensembl.org/pub/release-88/fasta/homo_sapiens/cdna/), all contain multiple transcript variants for each gene, so kallisto reports the abundance of each variant separately. My PI, however, wants the abundance of each gene as a whole, with all of its variants counted together. So I was wondering whether anyone knows where I can get a human transcriptome (cDNA) FASTA file with a single sequence per gene. My PI says he was able to get per-gene abundance with the software he used before (Partek Genomics Suite), so I feel it should be possible. If there is another program or a better method I should use, I would love to hear it.

TL;DR: looking for a human FASTA file without transcript variants.

Thanks in advance for any help

Tags: rna-seq, kallisto, fasta
2.4 years ago, h.mon (Brazil) wrote:

Use tximport to summarize the transcript-level estimates to gene level.
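
tximport itself is an R/Bioconductor package. To illustrate what "summarizing to gene level" means, here is a minimal Python sketch (not tximport and not a reimplementation of it) that sums kallisto's est_counts and tpm columns over all transcripts of each gene, assuming you already have a transcript-to-gene mapping. tximport additionally computes an abundance-weighted average transcript length per gene, which this sketch omits. The function name summarize_to_genes and the toy data are hypothetical.

```python
import csv
import io
from collections import defaultdict


def summarize_to_genes(abundance_file, tx2gene):
    """Sum kallisto transcript-level est_counts and TPM into per-gene totals.

    abundance_file: file-like object in kallisto's abundance.tsv layout
        (columns: target_id, length, eff_length, est_counts, tpm).
    tx2gene: dict mapping transcript ID (target_id) -> gene ID.
    Returns (gene_counts, gene_tpm), two dicts keyed by gene ID.
    """
    gene_counts = defaultdict(float)
    gene_tpm = defaultdict(float)
    for row in csv.DictReader(abundance_file, delimiter="\t"):
        gene = tx2gene.get(row["target_id"])
        if gene is None:
            # Assumption: transcripts absent from the mapping are simply skipped.
            continue
        gene_counts[gene] += float(row["est_counts"])
        gene_tpm[gene] += float(row["tpm"])
    return dict(gene_counts), dict(gene_tpm)


# Hypothetical toy input: two transcripts of geneA and one of geneB.
toy = io.StringIO(
    "target_id\tlength\teff_length\test_counts\ttpm\n"
    "tx1\t1000\t800\t10.0\t5.0\n"
    "tx2\t1500\t1300\t30.0\t7.5\n"
    "tx3\t900\t700\t4.0\t2.0\n"
)
counts, tpm = summarize_to_genes(toy, {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"})
print(counts)  # {'geneA': 40.0, 'geneB': 4.0}
print(tpm)     # {'geneA': 12.5, 'geneB': 2.0}
```

For real data you would open each sample's abundance.tsv (for example with open(path, newline="")) and pass the handle in. For differential expression, tximport's gene-level output, which also carries the length information, is the safer route than plain sums.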

2.4 years ago, stefanos.bamopoulos wrote:

Hi swamyvinny,

I am not aware of whether or how this is possible with kallisto, but it is relatively simple with salmon (a very similar tool). If you give salmon a GTF file or a tab-separated file that maps each transcript to its gene (salmon's -g/--geneMap option), it will report not only the counts for each transcript but also automatically summarize them per gene. I would not recommend using a FASTA file with a single sequence per gene: keeping only one transcript per gene discards the sequences of all the other isoforms, so reads from those isoforms are lost or misassigned, which is a big loss of information. If you only want gene abundance, you can also align your reads to the genome and count them per gene with a tool like featureCounts, but this demands more computational resources, and evidence suggests it is also less accurate.
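
Both salmon's gene map and tximport need a table linking each transcript to its gene. If you use the Ensembl cDNA FASTA from the question, its headers already carry the gene ID after a "gene:" tag, so such a table can be built from the headers alone. Below is a minimal Python sketch under that assumption; the function names and example IDs are hypothetical, and it is worth checking a few real headers before relying on it.

```python
import io


def tx2gene_from_fasta_headers(lines):
    """Return (transcript_id, gene_id) pairs from Ensembl cDNA FASTA header lines.

    Assumes headers look like
    '>TRANSCRIPT_ID cdna chromosome:... gene:GENE_ID gene_biotype:...',
    i.e. the transcript ID is the first token and the gene ID follows 'gene:'.
    Sequence lines (not starting with '>') are ignored.
    """
    pairs = []
    for line in lines:
        if not line.startswith(">"):
            continue
        fields = line[1:].split()
        if not fields:
            continue
        transcript_id = fields[0]
        # 'gene_biotype:' / 'gene_symbol:' do not match, only the exact 'gene:' tag.
        gene_id = next(
            (f[len("gene:"):] for f in fields[1:] if f.startswith("gene:")), None
        )
        if gene_id is not None:
            pairs.append((transcript_id, gene_id))
    return pairs


def write_gene_map(pairs, out):
    """Write one 'transcript<TAB>gene' line per pair (the simple tab-delimited
    layout that salmon's --geneMap accepts)."""
    for transcript_id, gene_id in pairs:
        out.write(f"{transcript_id}\t{gene_id}\n")


# Hypothetical headers in the Ensembl cDNA style (IDs are made up).
example = [
    ">TXA.1 cdna chromosome:GRCh38:1:100:200:1 gene:GENEA.1 gene_biotype:protein_coding",
    "ACGTACGT",
    ">TXB.1 cdna chromosome:GRCh38:1:100:300:1 gene:GENEA.1 gene_biotype:protein_coding",
    "ACGTTT",
]
buf = io.StringIO()
write_gene_map(tx2gene_from_fasta_headers(example), buf)
print(buf.getvalue(), end="")
```

For the real compressed file you could pass gzip.open(path, "rt") as `lines`. The same two-column layout (transcript first, gene second) is also what tximport expects for its tx2gene argument.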

Hope this helps!

Stefan

2.4 years ago, Sreeraj Thamban (Indian Institute of Science Education and Research) wrote:

I agree with h.mon: I use tximport after kallisto to get summarized counts for each gene. You can also use the tximport output for downstream differential expression (DEG) analysis: https://bioconductor.org/packages/release/bioc/html/tximport.html

Thanks
