Question: salmon was only able to assign 0 fragments to transcripts
naseerkhan861 wrote (3 months ago):

I am following this tutorial to test salmon on my RNA-Seq dataset. The dataset is from SRA; for testing, I am using only the first run, SRR5938419 (about 4.85 GB). After downloading it, I converted it with the SRA Toolkit using the following command, which gave me two gzipped FASTA files of about 1.45 GB each:

E:\SRAs>fastq-dump --split-files --fasta  60 --gzip E:\SRAs\SRR5938419
Read 36388169 spots for /E/SRAs/SRR5938419
Written 36388169 spots for /E/SRAs/SRR5938419
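A quick sanity check at this point is to count the records in each gzipped FASTA file; for a fully paired run, both files should report the same number as the "Written ... spots" line. The helper name count_fasta_records is hypothetical, and the sketch assumes one '>' header line per record, which is what fastq-dump's FASTA output uses.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count FASTA records in a gzip-compressed file.
# Assumes every record starts with exactly one header line beginning with '>'.
count_fasta_records() {
  # Decompress to stdout and count header lines; prints 0 if there are none.
  gzip -cd "$1" | grep -c '^>'
}

# Usage with the files from this post (run inside the container):
# count_fasta_records SRAs/SRR5938419_1.fasta.gz
# count_fasta_records SRAs/SRR5938419_2.fasta.gz
```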

Because I am on Windows 10, I then pulled the salmon Docker image with the following command:

docker pull combinelab/salmon

After starting a container from the downloaded image, I checked whether salmon was working by running the following commands:

root@d6fc32919494:/home# ls
SRAs  salmon-0.14.1  salmon-v0.14.1.tar.gz
root@d6fc32919494:/home# salmon
salmon v0.14.1

Usage:  salmon -h|--help or
        salmon -v|--version or
        salmon -c|--cite or
        salmon [--no-version-check] <COMMAND> [-h | options]

Commands:
     index Create a salmon index
     quant Quantify a sample
     alevin single cell analysis
     swim  Perform super-secret operation
     quantmerge Merge multiple quantifications into a single file
root@d6fc32919494:/home#

Then I shared the folder from my Windows host so that I could access the FASTA files inside the salmon container. After that, I downloaded the human reference transcriptome from this link by running the following commands in the container:

root@d6fc32919494:/home# wget ftp://ftp.ensembl.org/pub/release-89/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh38.cdna.all.fa.gz -o human.fa.gz
root@d6fc32919494:/home# ls
Homo_sapiens.GRCh38.cdna.all.fa.gz  SRAs  human.fa.gz  salmon-0.14.1  salmon-v0.14.1.tar.gz
root@d6fc32919494:/home#

Then I built the index with the following command:

root@d6fc32919494:/home# salmon index -t human.fa.gz -i human_index
Version Info: Could not resolve upgrade information in the alotted time.
Check for upgrades manually at https://combine-lab.github.io/salmon
index ["human_index"] did not previously exist  . . . creating it
[2019-10-24 09:47:41.438] [jLog] [info] building index
[2019-10-24 09:47:41.449] [jointLog] [info] [Step 1 of 4] : counting k-mers
Elapsed time: 0.0048599s

[2019-10-24 09:47:41.454] [jointLog] [info] Replaced 97776 non-ATCG nucleotides
[2019-10-24 09:47:41.454] [jointLog] [info] Clipped poly-A tails from 0 transcripts
[2019-10-24 09:47:41.454] [jointLog] [info] Building rank-select dictionary and saving to disk
[2019-10-24 09:47:41.454] [jointLog] [info] done
Elapsed time: 3.77e-05s
[2019-10-24 09:47:41.454] [jointLog] [info] Writing sequence data to file . . .
[2019-10-24 09:47:41.454] [jointLog] [info] done
Elapsed time: 6.93e-05s
[2019-10-24 09:47:41.454] [jointLog] [info] Building 32-bit suffix array (length of generalized text is 97845)
[2019-10-24 09:47:41.454] [jointLog] [info] Building suffix array . . .
success
saving to disk . . . done
Elapsed time: 0.0001873s
done
Elapsed time: 0.0101487s
processed 0 positions
[2019-10-24 09:47:41.490] [jointLog] [info] khash had 97814 keys
[2019-10-24 09:47:41.490] [jointLog] [info] saving hash to disk . . .
[2019-10-24 09:47:41.496] [jointLog] [info] done
Elapsed time: 0.0052422s
[2019-10-24 09:47:41.496] [jLog] [info] done building index
root@d6fc32919494:/home# ls
Homo_sapiens.GRCh38.cdna.all.fa.gz  SRAs  human.fa.gz  human_index  salmon-0.14.1  salmon-v0.14.1.tar.gz
root@d6fc32919494:/home#

After that, I ran salmon with the new human_index on my FASTA files using the following command:

root@d6fc32919494:/home# salmon quant -i human_index -l A -1 SRAs/SRR5938419_1.fasta.gz  -2 SRAs/SRR5938419_2.fasta.gz -p 8 --validateMappings -o quants/SRR5938419_qunats
Version Info: Could not resolve upgrade information in the alotted time.
Check for upgrades manually at https://combine-lab.github.io/salmon
### salmon (mapping-based) v0.14.1
### [ program ] => salmon
### [ command ] => quant
### [ index ] => { human_index }
### [ libType ] => { A }
### [ mates1 ] => { SRAs/SRR5938419_1.fasta.gz }
### [ mates2 ] => { SRAs/SRR5938419_2.fasta.gz }
### [ threads ] => { 8 }
### [ validateMappings ] => { }
### [ output ] => { quants/SRR5938419_qunats }
Logs will be written to quants/SRR5938419_qunats/logs
[2019-10-24 09:51:04.203] [jointLog] [info] Fragment incompatibility prior below threshold.  Incompatible fragments will be ignored.
[2019-10-24 09:51:04.203] [jointLog] [info] Usage of --validateMappings implies use of minScoreFraction. Since not explicitly specified, it is being set to 0.65
[2019-10-24 09:51:04.203] [jointLog] [info] Usage of --validateMappings, without --hardFilter implies use of range factorization. rangeFactorizationBins is being set to 4
[2019-10-24 09:51:04.203] [jointLog] [info] Usage of --validateMappings implies a default consensus slack of 0.2. Setting consensusSlack to 0.2.
[2019-10-24 09:51:04.203] [jointLog] [info] parsing read library format
[2019-10-24 09:51:04.203] [jointLog] [info] There is 1 library.
[2019-10-24 09:51:04.237] [jointLog] [info] Loading Quasi index
[2019-10-24 09:51:04.237] [jointLog] [info] Loading 32-bit quasi index
[2019-10-24 09:51:04.242] [jointLog] [info] done
[2019-10-24 09:51:04.242] [jointLog] [info] Index contained 1 targets
[2019-10-24 09:51:04.237] [stderrLog] [info] Loading Suffix Array
[2019-10-24 09:51:04.237] [stderrLog] [info] Loading Transcript Info
[2019-10-24 09:51:04.237] [stderrLog] [info] Loading Rank-Select Bit Array
[2019-10-24 09:51:04.237] [stderrLog] [info] There were 1 set bits in the bit array
[2019-10-24 09:51:04.237] [stderrLog] [info] Computing transcript lengths
[2019-10-24 09:51:04.237] [stderrLog] [info] Waiting to finish loading hash
[2019-10-24 09:51:04.242] [stderrLog] [info] Done loading index
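The line "Index contained 1 targets" in this output is already a warning sign: an index built from a full human cDNA file should contain many thousands of transcripts. A small hypothetical helper can pull that number out of a saved log. It assumes the message text matches this salmon v0.14.1 output, and that the same jointLog messages are written to the run's logs/salmon_quant.log; both the file location and the wording may differ in other versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the target count reported in a salmon quant log.
# Assumes the salmon v0.14.1 message format "Index contained N targets".
index_target_count() {
  # First grep isolates the message, second grep keeps only the number.
  grep -o 'Index contained [0-9]* targets' "$1" | grep -o '[0-9][0-9]*' | head -n 1
}

# Usage (log location assumed):
# index_target_count quants/SRR5938419_qunats/logs/salmon_quant.log
```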

After a few minutes, I got the following message:

processed 36000000 fragments
hits: 0, hits per frag:  0
[2019-10-24 09:55:35.561] [jointLog] [warning] salmon was only able to assign 0 fragments to transcripts in the index, but the minimum number of required assigned fragments (--minAssignedFrags) was 10. This could be indicative of a mismatch between the reference and sample, or a very bad sample.  You can change the --minAssignedFrags parameter to force salmon to quantify with fewer assigned fragments (must have at least 1).
root@d6fc32919494:/home#

So I did not get the TSV file with results that the tutorial (linked in the first line of this post) describes; instead, I got the following:

root@d6fc32919494:/home# ls
Homo_sapiens.GRCh38.cdna.all.fa.gz  SRAs  human.fa.gz  human_index  quants  salmon-0.14.1  salmon-v0.14.1.tar.gz
root@d6fc32919494:/home# cd quants
root@d6fc32919494:/home/quants# ls
SRR5938419_qunats
root@d6fc32919494:/home/quants# cd SRR5938419_qunats/
root@d6fc32919494:/home/quants/SRR5938419_qunats# ls
aux_info  cmd_info.json  libParams  logs  quant.sf
root@d6fc32919494:/home/quants/SRR5938419_qunats# cat quant.sf
Name    Length  EffectiveLength TPM     NumReads
        97844   97844.000       0.000000        0.000
root@d6fc32919494:/home/quants/SRR5938419_qunats#
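The quant.sf file is tab-separated with the columns shown above (Name, Length, EffectiveLength, TPM, NumReads), so a one-line awk summary makes an empty result obvious at a glance. The function name is hypothetical; it simply counts the data rows and sums the NumReads column.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarise a salmon quant.sf (tab-separated, one header row).
# Prints the number of targets (data rows) and the sum of NumReads (column 5).
summarise_quant_sf() {
  awk -F'\t' 'NR > 1 { n++; reads += $5 }
    END { printf "targets=%d num_reads=%.3f\n", n, reads }' "$1"
}

# Usage:
# summarise_quant_sf quants/SRR5938419_qunats/quant.sf
```

For the quant.sf above, this would report a single unnamed target and zero reads, which points at the index rather than the reads.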

After all this effort, can somebody please tell me what went wrong and why I am not getting any TPMs or counts? I would be very grateful if somebody could guide me out of this problem.

Regards

Tags: rna-seq, docker, salmon, windows10

It may not solve your question, but it will make all bioinformatics a lot easier if you can work on a Linux machine. WSL is pretty good, but as far as I know it doesn't perfectly substitute a real Linux environment.

(reply by WouterDeCoster, 3 months ago)

I will try WSL and run all of those commands there, but as you can see, I did not have any problems downloading, installing, or running the packages.

(reply by naseerkhan861, 3 months ago)
yztxwd (Southern Medical University) wrote (3 months ago):

In the wget command, -o specifies the file to which wget writes its log messages; the download itself was saved under its original name, Homo_sapiens.GRCh38.cdna.all.fa.gz. You should therefore index Homo_sapiens.GRCh38.cdna.all.fa.gz, not human.fa.gz: human.fa.gz contains only wget's log messages, not FASTA sequences.
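In GNU wget, lower-case -o FILE sends the log to FILE, while upper-case -O FILE writes the downloaded document to FILE. Below is a sketch of the corrected download, plus a hypothetical helper (is_fasta_gz, not part of any tool) that checks a file is real gzip-compressed FASTA before indexing it. Note that gzip -t reads the whole file, so it takes a moment on a full transcriptome.

```shell
#!/usr/bin/env bash
# Corrected download (URL from the question): -O names the output file.
#   wget ftp://ftp.ensembl.org/pub/release-89/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh38.cdna.all.fa.gz -O human.fa.gz

# Hypothetical helper: succeed only if FILE is valid gzip data whose
# decompressed content starts with the FASTA header character '>'.
is_fasta_gz() {
  local first
  gzip -t "$1" 2>/dev/null || return 1   # a plain-text wget log fails here
  first=$(gzip -cd "$1" | head -c 1)     # first decompressed byte
  [ "$first" = ">" ]
}

# Usage:
# is_fasta_gz human.fa.gz && echo "looks like FASTA" || echo "NOT a gzipped FASTA"
```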

(answer edited 3 months ago by ATpoint)

Building a suffix array for the human transcriptome in 0.0001873 s, and the whole index in well under a second, would make me suspicious even on a dedicated workstation.

(reply by michael.ante, 3 months ago)

Agreed. You are simply using the wrong reference file. Index the actual Homo_sapiens.GRCh38.cdna.all.fa.gz and everything should be fine.
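The fix is then to rebuild the index from the real file and rerun the same quant command. Below is a minimal sketch that also refuses to index a reference with suspiciously few sequences. The function name safe_salmon_index and the default threshold of 1000 records are arbitrary assumptions, not salmon options.

```shell
#!/usr/bin/env bash
# Hypothetical guard around salmon index: count FASTA records first and
# refuse to build an index from a reference with fewer than MIN records.
safe_salmon_index() {
  local ref=$1 idx=$2 min=${3:-1000} n
  # Non-gzip input (e.g. a wget log) makes gzip print nothing, so n becomes 0.
  n=$(gzip -cd "$ref" 2>/dev/null | grep -c '^>')
  if [ "${n:-0}" -lt "$min" ]; then
    echo "refusing to index $ref: only ${n:-0} FASTA record(s)" >&2
    return 1
  fi
  salmon index -t "$ref" -i "$idx"
}

# Usage with the file from this thread (remove the broken index first),
# then the same quant command as in the question:
# rm -r human_index
# safe_salmon_index Homo_sapiens.GRCh38.cdna.all.fa.gz human_index
# salmon quant -i human_index -l A -1 SRAs/SRR5938419_1.fasta.gz -2 SRAs/SRR5938419_2.fasta.gz -p 8 --validateMappings -o quants/SRR5938419_qunats
```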

(reply by ATpoint, 3 months ago)

Thanks, you are always very helpful.

(reply by naseerkhan861, 3 months ago)

Thank you very much indeed!

(reply by naseerkhan861, 3 months ago)
Powered by Biostar version 2.3.0