Question: Problem finding a sequence in mouse DNA even though BLAST finds it
juanjo75es wrote (6 days ago):

I downloaded a multiple alignment of 35 mammals and selected a random fragment. For mouse, it corresponded to this section:


When I search for that sequence (just its first line) with Ensembl BLAST, it is found (although not at that position), but when I download the data for that region (whether from Ensembl or NCBI), the sequence does not match. I then downloaded the full chromosome 2 of the mouse reference genome and searched it for that sequence, and it doesn't appear anywhere. Not even close. What am I missing?

I need this because I want to extract an alignment of that section from other species. I tried to find a local alignment for the full sequence, but the result was terrible. Then I tried with mouse and the same thing happened. That is when I realized that the sequence used in the 35-mammal alignment apparently does not exist in the mouse genome, even though BLAST finds it... I am lost. Any help would be appreciated.

Tags: alignment, sequence (question modified 6 days ago by h.mon)

This sequence does exist in the mouse genome:

Mus musculus strain C57BL/6J chromosome 2, GRCm38.p4 C57BL/6J
Sequence ID: NC_000068.7    Length: 182113224   Number of Matches: 1
Range 1: 9127626 to 9127689
Alignment statistics for match #1
Score          Expect   Identities    Gaps       Strand
119 bits(64)   1e-25    64/64(100%)   0/64(0%)   Plus/Minus


Query  61       GGAT  64
Sbjct  9127629  GGAT  9127626
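In this BLAST output, Strand Plus/Minus and the Sbjct coordinates counting down (9127629 to 9127626) both indicate that the hit lies on the minus strand of chromosome 2. Below is a minimal Python sketch of normalizing such a hit to plus-strand coordinates plus a strand flag. `normalize_hit` is a hypothetical helper, and the subject start of 9127689 for query position 1 is inferred from the reported Range rather than shown in the truncated alignment.

```python
def normalize_hit(sstart: int, send: int) -> tuple[int, int, int]:
    """Return (start, end, strand) as 1-based, inclusive plus-strand coordinates.

    BLAST reports a Plus/Minus hit with sstart > send, and the Sbjct line it
    prints is the reverse complement of the plus-strand sequence.
    """
    if sstart <= send:
        return sstart, send, 1
    return send, sstart, -1


# Assumption: query position 1 aligns to 9127689 and query 64 to 9127626.
start, end, strand = normalize_hit(9127689, 9127626)
print(f"2:{start}-{end} strand {strand}")  # → 2:9127626-9127689 strand -1
```

A region requested with these plus-strand coordinates therefore returns the reverse complement of the query, not the query itself.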

You wrote: "then made a search of that sequence and it doesn't appear anywhere. Not even close."

How did you search for the sequence?

You asked: "What am I missing?"

See: How To Ask Good Questions On Technical And Scientific Forums

(Reply written 6 days ago by h.mon.)

Hi h.mon, as I said, I already found that sequence using BLAST, on the same screen you posted. The problem is when I try to download that sequence and the surrounding region: if I use these positions as parameters in the Ensembl browser, the sequence I get is not this one. What I get is this:

>2 dna:chromosome chromosome:GRCm38:2:9127626:9127689:1
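The last field of that Ensembl header is the strand (1 = forward), so the region as downloaded is the plus strand, whereas the BLAST hit is Plus/Minus. Here is a small Python sketch that extracts the location fields from such a header. `parse_ensembl_header` is a hypothetical helper, and it assumes the `coord_system:assembly:region:start:end:strand` layout seen in this header.

```python
def parse_ensembl_header(header: str) -> dict:
    """Parse a header like '>2 dna:chromosome chromosome:GRCm38:2:9127626:9127689:1'."""
    tokens = header.lstrip(">").split()
    # The location token is the one with exactly six colon-separated fields.
    location = next(t for t in tokens if t.count(":") == 5)
    coord_system, assembly, region, start, end, strand = location.split(":")
    return {
        "coord_system": coord_system,
        "assembly": assembly,
        "region": region,
        "start": int(start),
        "end": int(end),
        "strand": int(strand),  # 1 = forward (plus) strand, -1 = reverse
    }


info = parse_ensembl_header(">2 dna:chromosome chromosome:GRCm38:2:9127626:9127689:1")
print(info["strand"])  # → 1
```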

As I also said, I downloaded the full chromosome 2 sequence from an FTP site (in fact, two versions from two FTP sites), searched it with a text editor, and the sequence does not appear. Sorry if my terminology is not accurate; I am a newbie in bioinformatics, but I have a lot of experience asking technical questions in other fields. I don't see any problem with the question, but if there is one, please share your impressions.

(Reply written 6 days ago by juanjo75es; modified by h.mon.)

Just be more detailed. First and foremost, you should have shown the sequence you found.

You should also have said from the beginning how you downloaded the region (using Ensembl BioMart? etc.), how you searched for the sequence, whether you used local BLAST or the NCBI (or Ensembl) BLAST server, and so on. Generally speaking, it is also a good idea to paste the exact commands you used.

When you do this, people have more information and can provide more detailed, higher-quality suggestions. For example, although it is very tempting (I do it myself), searching for a pattern in a FASTA file with a text editor is not advisable, because line wrapping can produce false negatives. I would have advised you to run a local BLAST search against the downloaded chr2, or to use BBDuk from the BBTools package:

in=chr2.fasta literal=ATCCTACTTATCCAAGTACTAACAATAACTAAATTTAAATTTTTAATGTATTTATCCAAAATAA
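To illustrate the line-wrapping problem: in a wrapped FASTA file a match can straddle a newline, so a plain text search misses it. Here is a minimal Python sketch, assuming a single-record FASTA and a toy sequence (hypothetical data, wrapped at 10 bases for illustration), that joins the sequence lines before searching.

```python
def read_fasta_sequence(text: str) -> str:
    """Join the sequence lines of a single-record FASTA text into one string."""
    return "".join(line.strip() for line in text.splitlines() if not line.startswith(">"))


# Hypothetical toy record, wrapped at 10 bases per line.
fasta_text = ">toy\nGGGGGATCCT\nACTTATCCAA\nCCCCC\n"
pattern = "ATCCTACTTATCC"

print(pattern in fasta_text)                       # → False (the newline splits the match)
print(pattern in read_fasta_sequence(fasta_text))  # → True
```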

Both programs search both strands, so they also find matches on the opposite strand, which a text editor does not search automatically.
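Searching both strands amounts to looking for the pattern and for its reverse complement in the same plus-strand sequence. Here is a rough Python sketch of that idea for exact matches. `find_both_strands` is a hypothetical helper doing a naive exact search, not how BLAST or BBDuk work internally, and a palindromic pattern would be reported once per strand.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the order."""
    return seq.translate(COMPLEMENT)[::-1]


def find_both_strands(genome: str, pattern: str) -> list[tuple[int, str]]:
    """Return (1-based plus-strand start, strand) for every exact hit of pattern."""
    hits = []
    for strand, query in (("+", pattern), ("-", reverse_complement(pattern))):
        start = genome.find(query)
        while start != -1:
            hits.append((start + 1, strand))
            start = genome.find(query, start + 1)  # allow overlapping hits
    return sorted(hits)


# Hypothetical toy "chromosome" and a query that only occurs on the minus strand.
genome = "TTTTATCCTACTTATTTT"
print(find_both_strands(genome, "TAAGTAGGAT"))  # → [(5, '-')]
```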

(Reply written 6 days ago by h.mon.)
Answer by h.mon (6 days ago):

The sequence you are interested in is there; it is just on the opposite strand, so you have to reverse-complement it:
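The reverse complement can also be computed directly. Here is a minimal Python sketch, assuming the 64-base BBDuk literal in h.mon's earlier reply is the forward-strand Ensembl sequence for 2:9127626-9127689 (it starts with ATCCTACTTA, like the region in the results header). Its reverse complement should end in GGAT, the last four query bases (Query 61 to 64) in the BLAST alignment.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement each base (A<->T, C<->G), then reverse the order."""
    return seq.translate(COMPLEMENT)[::-1]


# Assumed forward-strand sequence of 2:9127626-9127689, copied from the BBDuk literal.
forward = "ATCCTACTTATCCAAGTACTAACAATAACTAAATTTAAATTTTTAATGTATTTATCCAAAATAA"
minus = reverse_complement(forward)
print(minus)
print(minus.endswith("GGAT"))  # → True
```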

The Sequence Manipulation Suite: Reverse Complement
Results for 64 residue sequence "2 dna:chromosome chromosome:GRCm38:2:9127626:9127689:1" starting "ATCCTACTTA".



Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1771 users visited in the last hour