[homework] I was given multiple contigs to organize and don't know what to do
9.2 years ago
carerjose • 0

The title says it all: I was given multiple contigs with high E-values to organize, and I simply don't know where to start.

contigs sequence dna blast • 2.3k views

What Geek_y said. Also, please define "organize". And high e-values are not worth looking at; low e-values are.

Homework? Do you have a reference genome of a closely related species?

Yes, it's homework. I was just given multiple contigs, and I have to search for proteins that contain those contigs and organize them in order.

You mean proteins that contain your translated contigs? Use BLAST (try to figure out which BLAST variant and database fit your case).

Again, "organize in order" is really vague. Presumably you want to sort by the values of a specific attribute, such as match score (highest to lowest) or e-value (lowest to highest). Each attribute has its own quirks (for example, e-values are database-specific).
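As a minimal sketch of sorting by one attribute, assuming the hits were saved as BLAST+ tabular output (`-outfmt 6` with its default 12 columns), the Python below orders hits by e-value (lowest first) and breaks ties by bit score (highest first). The example rows and the contig/protein names are made up for illustration, not real results.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

def read_hits(text):
    """Parse tab-separated BLAST hits into dicts with numeric score columns."""
    hits = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        hit = dict(zip(FIELDS, row))
        hit["evalue"] = float(hit["evalue"])      # "1e-07" -> 1e-07
        hit["bitscore"] = float(hit["bitscore"])
        hits.append(hit)
    return hits

def sort_hits(hits):
    """Lowest e-value first; ties broken by highest bit score."""
    return sorted(hits, key=lambda h: (h["evalue"], -h["bitscore"]))

# Hypothetical hits for three contigs; not real data.
EXAMPLE = (
    "contig_1\tprotA\t45.2\t120\t60\t2\t1\t360\t10\t129\t2e-05\t48.1\n"
    "contig_2\tprotA\t62.0\t150\t55\t1\t1\t450\t5\t154\t1e-07\t55.8\n"
    "contig_3\tprotB\t30.1\t90\t61\t3\t4\t273\t20\t109\t0.5\t28.9\n"
)

if __name__ == "__main__":
    for hit in sort_hits(read_hits(EXAMPLE)):
        print(hit["qseqid"], hit["sseqid"], hit["evalue"], hit["bitscore"])
```

Sorting on a tuple key lets you swap in another attribute (say, `pident`) without changing the rest of the code.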

You'll need to define your problem statement better before trying to solve it.

Why don't you just post 2 or so of the contigs?

I wonder why OP calls them contigs and not just sequences in the first place. Are these assembled sequences that have also undergone a BLAST search (e-values)? Who worked upstream from OP, and why is a beginner being given a task that doesn't have a well-defined start and end point? Too many questions, too few answers!

9.2 years ago
carerjose • 0

UPDATE: I just asked a graduate student in the lab.

He told me:

  1. The objective is to align the contigs against a protein sequence and compare them. The more similar a contig is, the more likely it is to be real.
  2. If a group of contigs is similar, I have to group them and compare them.
  3. Then compare them with a reference sequence; again, the more similar, the more likely they are real.
  4. He suggested UGENE.
  5. I remember the PI telling me that since the e-values are high, they are not real contigs.
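To make "the more similar, the more likely it is real" concrete, here is a rough Python sketch that scores translated contigs, already aligned to one reference protein, by percent identity and ranks them. The sequences are hypothetical, and skipping gap columns is my own choice; other tools, BLAST included, may count gaps differently, so the numbers will not match theirs exactly.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of identical residues over aligned columns, skipping gap columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # a gap on either side: this column is not scored
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0

def rank_by_identity(aligned_contigs, aligned_reference):
    """Return (contig_id, identity) pairs, most similar to the reference first."""
    scores = {cid: percent_identity(seq, aligned_reference)
              for cid, seq in aligned_contigs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical translated contigs, each already aligned to the same reference.
REFERENCE = "MKTWLIVA"
CONTIGS = {"contig_1": "MKT-LLVA", "contig_2": "MRSWLIKA"}

if __name__ == "__main__":
    for contig_id, identity in rank_by_identity(CONTIGS, REFERENCE):
        print(f"{contig_id}\t{identity:.1f}%")
```

Here `contig_1` comes out on top (6 of 7 scored columns identical) ahead of `contig_2` (5 of 8). In practice the alignment itself would come from a tool such as UGENE or BLAST; this only shows the comparison step.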

How did you get the e-values in the first place? Also, CD-HIT is a good program for clustering sequences by similarity.
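After clustering with something like `cd-hit-est -i contigs.fa -o clustered.fa -c 0.95` (for nucleotide contigs; the 0.95 identity cutoff is only an example), CD-HIT writes a `.clstr` file listing the members of each cluster. Below is a small Python sketch for reading it into groups, assuming the usual layout of `>Cluster N` headers followed by member lines, with a trailing `*` marking the representative sequence. The sample text is invented, so check the parser against your own file.

```python
import re

# Member lines look like: "1\t340nt, >contig_4... at +/97.06%"
MEMBER_ID = re.compile(r">(.+?)\.\.\.")

def parse_clstr(text):
    """Map cluster number -> list of (sequence_id, is_representative)."""
    clusters = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            current = int(line.split()[1])
            clusters[current] = []
            continue
        match = MEMBER_ID.search(line)
        if match and current is not None:
            # The representative sequence is marked with a trailing "*".
            clusters[current].append((match.group(1), line.endswith("*")))
    return clusters

# Invented example in the .clstr layout; not real output.
EXAMPLE_CLSTR = (
    ">Cluster 0\n"
    "0\t350nt, >contig_1... *\n"
    "1\t340nt, >contig_4... at +/97.06%\n"
    ">Cluster 1\n"
    "0\t300nt, >contig_2... *\n"
)

if __name__ == "__main__":
    for number, members in parse_clstr(EXAMPLE_CLSTR).items():
        print(number, [seq_id for seq_id, _ in members])
```

Each cluster is then one "group of similar contigs" in the sense of step 2 above, and its representative is a natural one to BLAST first.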

I don't know. I think those sequences came from a PCR; he then ran them through BLASTX, got those e-values, and gave them to me to see what is wrong.

Given that a BLASTX search has already been done and the e-values are high, what does he want you to come up with? Your own BLAST run won't produce any significant results if his did not.

To determine which contig is real.

It's a long shot, but the one with the lowest e-value has the best chance of being close to real. Think of it as a king among paupers, though: an e-value > 1e-3 is rarely real, especially if you're comparing against ncbi-nr.
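A sketch of that advice in Python: keep only the lowest-e-value hit per contig, rank the contigs by it, and flag which ones pass the 1e-3 rule of thumb. The hit list is hypothetical, and the cutoff is a parameter rather than a fixed standard.

```python
# Hypothetical (contig, subject, e-value) hits from one BLASTX run.
HITS = [
    ("contig_1", "protA", 2e-05),
    ("contig_1", "protC", 0.03),
    ("contig_2", "protA", 1e-07),
    ("contig_3", "protB", 0.5),
]

def best_hit_per_contig(hits):
    """Keep the lowest-e-value hit for each contig."""
    best = {}
    for contig, subject, evalue in hits:
        if contig not in best or evalue < best[contig][1]:
            best[contig] = (subject, evalue)
    return best

def rank_contigs(hits, cutoff=1e-3):
    """Contigs ordered by best e-value, each flagged as passing the cutoff or not."""
    ranked = sorted(best_hit_per_contig(hits).items(), key=lambda item: item[1][1])
    return [(contig, subject, evalue, evalue <= cutoff)
            for contig, (subject, evalue) in ranked]

if __name__ == "__main__":
    for contig, subject, evalue, passes in rank_contigs(HITS):
        print(contig, subject, evalue, "pass" if passes else "above cutoff")
```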

Well, all the contigs in my homework had e-values > 1e-7. So what do you recommend I do with them?

Is there a way to assemble them so that they get low e-values, or something like that?

1e-7 < 1e-3. 1e-m = 1 * 10^-m, which means 1e-3 is 0.001 and 1e-7 is 0.0000001.

1e-7 is a respectable e-value, so you're better off picking the one with the lowest e-value.

Wait, so when you first said high e-values, were you looking at the X in 1e-X? If so, your e-values are low, not high.
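A quick Python check of the notation, using only the standard library: `float` reads e-notation strings directly, and `Decimal` can write them out in full.

```python
from decimal import Decimal

def expand(evalue_text):
    """Write an e-notation string such as '1e-7' out as a plain decimal."""
    return format(Decimal(evalue_text), "f")

def sort_evalues(evalue_texts):
    """Sort e-value strings numerically, smallest (most significant) first."""
    return sorted(evalue_texts, key=float)

print(expand("1e-3"))  # 0.001
print(expand("1e-7"))  # 0.0000001
print(float("1e-7") < float("1e-3"))  # True: a bigger number after "e-" means a smaller value
print(sort_evalues(["1e-3", "1e-7", "2e-5"]))  # ['1e-7', '2e-5', '1e-3']
```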
