Mapping cDNA sequence of a single gene to a genome not hosted at UCSC server (without Blat)
2
0
Entering edit mode
4 weeks ago
rahu • 0

I want to map a FASTA sequence that spans over introns against the genome in order to get its coordinates in the GTF format. I'm using a genome which is not hosted at UCSC server so I cannot use Blat for this purpose. I'm trying to turn the cDNA sequence of the gene I cloned into GTF format so that I could manually add it to the gene annotation file. Blat allows you to get psl output which, I believe somehow could be turned into GTF format. Is there a mapper with the similar output (psl) and input (FASTA) requirements?

I tried to use mappers like HISAT2 and STAR but I did not achieve my goal. The cDNA sequence (FASTA) derived from my gene of interest is found in tandem several times in the same locus in the genome. So, HISAT2 reports only the best single mapping which doesn't really help. Is there a mapper that could map a FASTA file with gaps to solve this problem?

mapping • 765 views
ADD COMMENT
2
Entering edit mode
4 weeks ago

My favourite for this task would be GMAP, which can produce GFF format with option -f 2.

It is also available in https://anaconda.org/bioconda/gmap.

There's a manual at http://research-pub.gene.com/gmap/src/README. In a nutshell, if you are mapping a sequence against a genome you need to index it first; if your reference is shorter you can align directly.

ADD COMMENT
0
Entering edit mode

Thank you for bringing it to my attention! I wasn't aware of GMAP. It seems to be similar to Blat with some proposed advantages, looks very useful.

I appreciate your brief summary.

ADD REPLY
1
Entering edit mode
4 weeks ago

server so I cannot use Blat for this purpose.

blat is avilable as a standane tool: http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/blat/

ADD COMMENT
0
Entering edit mode

Thanks for pointing it out. I had problems with visualizing the final GTF file on IGV. I ran the following codes from BEDOPS and UCSC utilities.

convert2bed --input=psl --output=bed  < example.psl > example.bed
bedToGenePred example.bed example.genepred
example.genepred | cut -f1-10 | genePredToGtf file stdin example.gtf

Then I merged the gene annotation file with my cDNA in GTF format as follows:

cat annotation.gtf example.gtf > annotation_new.gtf

However, when I checked on IGV, the gene doesn't appear on its specified locus. What could have gone wrong?

ADD REPLY

Login before adding your answer.

Traffic: 1657 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6