Finding genomic location of a list of sequences in human reference genome
9.4 years ago
User6891 ▴ 330

Hi everyone,

I'm trying to find the genomic locations of a list of sequences in the human reference genome. For example, if my sequence is 'ACGTACGTAGTCATGC', I would like output along these lines:

chr1  position of the first nucleotide   position of the last nucleotide

Are there any tools to do this? Blastn maybe? But since I have the reference genome locally installed, there might be some other options?

SNP genome • 7.0k views

Do you only want exact matches? You can install blast locally, btw.


Yes, only exact matches. We do have blast installed locally, but I'm not really experienced with it, so I was wondering if there are other options.

9.4 years ago

If you only need exact matches, then give one of the answers in this thread a try: Locating A Sequence In A Fasta File.
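For exact matches only, a plain string search over the FASTA file is often enough. Below is a minimal sketch of that idea, not the code from the linked thread. The function names (`read_fasta`, `find_exact`) are made up for illustration, and it assumes each chromosome fits in memory and that you want hits on both strands, reported as 1-based inclusive coordinates on the forward strand.

```python
from typing import Iterable, Iterator, Tuple

# Translation table for complementing uppercase DNA (N stays N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            # Use the first word of the header as the sequence name.
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def find_exact(lines: Iterable[str], query: str) -> Iterator[Tuple[str, int, int, str]]:
    """Yield (chrom, start, end, strand) for every exact match of query.

    Coordinates are 1-based, inclusive, and refer to the forward strand.
    """
    query = query.upper()
    patterns = {"+": query}
    rc = reverse_complement(query)
    if rc != query:  # avoid reporting palindromic queries twice
        patterns["-"] = rc
    for name, seq in read_fasta(lines):
        seq = seq.upper()  # reference genomes often use lowercase for repeats
        for strand, pattern in patterns.items():
            pos = seq.find(pattern)
            while pos != -1:
                yield name, pos + 1, pos + len(pattern), strand
                pos = seq.find(pattern, pos + 1)  # allow overlapping hits
```

To use it, open the reference FASTA and pass the file object, e.g. `find_exact(open("genome.fa"), "ACGTACGTAGTCATGC")` (the file name is a placeholder), then print each tuple tab-separated. For a long list of queries this rescans the genome once per query, so an indexed tool will scale better.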


If I wanted to use blast:

Do you think 'blastn' would work, with the human reference genome as the -db? My sequences are around 20 bp long.


Sure, I'd expect so. Searching is probably faster once the blast database is built, but you have to build it first. Of course, if you only need to do this a couple of times, one of the methods in the thread I linked is probably quicker overall than building the blast database and filtering the results.
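The filtering step could look roughly like the sketch below. It assumes the search was run with tabular output (`-outfmt 6`, default columns) and, since the queries are only about 20 bp, probably with `-task blastn-short`. The function name `exact_blast_hits` and the `query_lengths` dictionary are hypothetical helpers, not part of BLAST. A hit is kept only if it covers the whole query at 100% identity with no mismatches or gaps.

```python
from typing import Dict, Iterable, Iterator, Tuple


def exact_blast_hits(
    lines: Iterable[str], query_lengths: Dict[str, int]
) -> Iterator[Tuple[str, str, int, int, str]]:
    """Yield (query, chrom, start, end, strand) for exact full-length hits.

    Expects BLAST tabular lines (-outfmt 6, default columns):
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    query_lengths maps each query ID to its length in bases.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        qseqid, sseqid = fields[0], fields[1]
        pident = float(fields[2])
        length, mismatch, gapopen = int(fields[3]), int(fields[4]), int(fields[5])
        sstart, send = int(fields[8]), int(fields[9])
        is_exact = (
            pident == 100.0
            and mismatch == 0
            and gapopen == 0
            and length == query_lengths[qseqid]  # alignment spans the whole query
        )
        if is_exact:
            # BLAST reports minus-strand hits with sstart > send.
            strand = "+" if sstart <= send else "-"
            yield qseqid, sseqid, min(sstart, send), max(sstart, send), strand
```

The coordinates BLAST reports are 1-based, so they can go straight into the "chr, first nucleotide, last nucleotide" format asked for in the question.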

9.4 years ago
Siva ★ 1.9k

You can also use blat. Since your query sequences are short, you might also want to check Using Blat for short sequences with maximum sensitivity.

9.4 years ago
Vova Naumov ▴ 220

An easy way is to use a mapping tool such as bowtie. First you will need to index the reference file using bowtie-build.

Then you can find the location of your sequence using:

bowtie -c <reference.ebwt.prefix>  ACGTACGTAGTCATGC

This will give you the coordinates of the sequence.

For more information look at http://bowtie-bio.sourceforge.net/tutorial.shtml
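To turn the bowtie output into the "chr, start, end" format from the question, something like the sketch below might work. It assumes bowtie 1's default (non-SAM) tab-separated output, where column 2 is the strand, column 3 the reference name, column 4 the 0-based offset of the leftmost aligned base, and column 5 the read sequence. The function name `bowtie_hits_to_intervals` is made up for illustration. For exact matches only you would probably also want `-v 0`, and `-a` to report all locations.

```python
from typing import Iterable, Iterator, Tuple


def bowtie_hits_to_intervals(lines: Iterable[str]) -> Iterator[Tuple[str, int, int, str]]:
    """Convert bowtie 1 default output lines to (chrom, start, end, strand).

    Assumed columns: read name, strand, reference name, 0-based leftmost
    offset, read sequence, then further columns that are ignored here.
    Returned coordinates are 1-based and inclusive.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        _read_name, strand, ref_name, offset, read_seq = fields[:5]
        start = int(offset) + 1              # 0-based offset -> 1-based start
        end = int(offset) + len(read_seq)    # inclusive end
        yield ref_name, start, end, strand
```

Note the off-by-one: bowtie's offset is 0-based, so the position of the first nucleotide is the offset plus one.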
