Mapping a list of sequences to gene symbols
3.5 years ago


I have a list of sequences, most of which map to the human genome. I want to map them to the genome and obtain the gene symbol whenever a sequence falls within a gene. I already know of an R solution, but it is very slow to run. There is also a solution based on BLAT, which I honestly don't know and don't have time to learn, let alone parse its results (I am not very familiar with Linux either). A third option that comes to mind is to feed my FASTA files to RNA-seq analysis tools and look at the results.

First question: will this approach work, given that all my sequences are 60 nt long?

Second question: the results from my R code (verified with the web version of BLAT) contain gene symbols that are not returned by HISAT2 + StringTie, HISAT2 + htseq-count, or Salmon (I run them on Galaxy). Is this because my dataset does not meet some requirements of HISAT2/Salmon, or because I don't know these tools well, even though running them on Galaxy is a piece of cake? I can give examples of a couple of sequences that are not mapped by the Galaxy tools but are mapped using R.
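The lookup the question describes, reporting a gene symbol when an aligned sequence falls inside a gene, is an interval-overlap test that does not depend on which aligner produced the coordinates. Below is a minimal Python sketch, assuming the hits are already available as (chromosome, start, end) and the genes as BED-style records; the gene names and coordinates are hypothetical placeholders, not real annotation.

```python
from typing import List, Tuple

# (chrom, start, end, symbol) with 0-based, half-open coordinates as in BED.
# The records used below are made-up placeholders, not real genes.
Gene = Tuple[str, int, int, str]


def genes_overlapping(genes: List[Gene], chrom: str, start: int, end: int,
                      contained: bool = False) -> List[str]:
    """Return the sorted gene symbols hit by the interval [start, end) on chrom.

    By default any overlap counts; with contained=True the whole interval
    must lie inside the gene.
    """
    hits = set()
    for g_chrom, g_start, g_end, symbol in genes:
        if g_chrom != chrom:
            continue
        if contained:
            ok = g_start <= start and end <= g_end
        else:
            ok = g_start < end and start < g_end
        if ok:
            hits.add(symbol)
    return sorted(hits)


if __name__ == "__main__":
    genes = [("chr1", 100, 500, "GENE_A"), ("chr1", 400, 900, "GENE_B"),
             ("chr2", 0, 1000, "GENE_C")]
    # A 60 nt hit on chr1 spanning the region where GENE_A and GENE_B overlap.
    print(genes_overlapping(genes, "chr1", 450, 510))                  # ['GENE_A', 'GENE_B']
    print(genes_overlapping(genes, "chr1", 450, 510, contained=True))  # ['GENE_B']
```

The linear scan is fine for a modest number of sequences; against a genome-wide annotation, an interval tree or a tool such as `bedtools intersect` would be the more usual choice.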

Hisat2 salmon HTSeqCount StringTie

Command-line BLAST/BLAT is your best option here. RNA-seq tools might work, but you need something more robust than fast quantification, since you have actual sequences rather than reads.
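Parsing command-line BLAT output is less work than it may seem. BLAT writes PSL, a 21-column tab-separated format (basic usage is `blat database query output.psl`). The Python sketch below keeps the best-scoring hit per query sequence. The score formula is an assumption modelled on UCSC's pslScore (matches + repMatches/2 - misMatches - qNumInsert - tNumInsert); adjust it, or add a minimum-identity filter, as needed.

```python
from typing import Dict, Iterable, List, Tuple

# (target name, target start, target end, score); coordinates are 0-based as in PSL.
Hit = Tuple[str, int, int, int]


def psl_score(f: List[str]) -> int:
    """Assumed score, modelled on UCSC's pslScore for nucleotide alignments."""
    matches, mis, rep = int(f[0]), int(f[1]), int(f[2])
    q_num_insert, t_num_insert = int(f[4]), int(f[6])
    return matches + rep // 2 - mis - q_num_insert - t_num_insert


def best_hits(psl_lines: Iterable[str]) -> Dict[str, Hit]:
    """Return the highest-scoring hit for each query name in BLAT PSL output."""
    best: Dict[str, Hit] = {}
    for line in psl_lines:
        fields = line.rstrip("\n").split("\t")
        # Skip header and blank lines: PSL data rows start with a numeric match count.
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        score = psl_score(fields)
        q_name, t_name = fields[9], fields[13]
        t_start, t_end = int(fields[15]), int(fields[16])
        # Keep the first hit seen on ties.
        if q_name not in best or score > best[q_name][3]:
            best[q_name] = (t_name, t_start, t_end, score)
    return best
```

For example, `best_hits(open("output.psl"))` returns a dict mapping each sequence name to its best genomic location, which can then be intersected with a gene annotation to get symbols.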

