Question: BRIG: How to use a multi fasta file as query
2.5 years ago by
benoit.kunath10 wrote:

Hello everybody.

I am trying to compare assembled genomes (contigs in a multi-FASTA file) against a reference genome (a complete genome in a single FASTA entry). I thought I could use BRIG, but I don't get any results. When I give the multi-FASTA file as the query, nothing is drawn on the image. I only get the reference ring and nothing else.

There is no information in the BRIG manual about using a multi-FASTA file as a query. Does anyone have an idea how I can make it work? Is it doable, or should I use other software?

Thanks a lot,

Regards, Ben

alignment software error • 2.1k views
modified 2.5 years ago • written 2.5 years ago by benoit.kunath10
2.5 years ago by
Joe16k
United Kingdom
Joe16k wrote:

This is a known issue with BRIG on later versions of Java, and the program hasn't been patched.

You can still use a multifasta, but you'll need to delete all the intermediate headers (every '>' line after the first), so that the file reads as a single record. It should work then.
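
For anyone who would rather script the header removal than edit the file by hand, here is a minimal Python sketch. It is not part of BRIG: the function name merge_fasta_records, the replacement header and the 60-character line width are all assumptions chosen for illustration. Note that joining contigs this way discards the contig boundaries, so a BLAST hit could in principle span a junction between two contigs.

```python
def merge_fasta_records(fasta_text: str, new_header: str = "merged_contigs",
                        width: int = 60) -> str:
    """Collapse a multi-FASTA string into one record.

    Every original header line is dropped, the sequence lines are joined in
    file order, and the result is written under a single new header
    (hypothetical name), wrapped at `width` characters per line.
    """
    sequence_parts = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and all header lines
        sequence_parts.append(line)
    sequence = "".join(sequence_parts)
    wrapped = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([f">{new_header}"] + wrapped) + "\n"


if __name__ == "__main__":
    example = ">contig_1\nACGTAC\nGT\n>contig_2\nTTGGCC\n"
    print(merge_fasta_records(example, new_header="strainA", width=5), end="")
```

To use it on a real file, read the file contents into a string, pass them to the function, and write the returned string to a new file, leaving the original assembly untouched.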

written 2.5 years ago by Joe16k
2.5 years ago by
benoit.kunath10 wrote:

Hello!

Thanks a lot! It works perfectly fine now.

I have another, related question.

I am trying to use a multi-FASTA file as the reference (it is a concatenation of resistance genes from different strains of a bacterium), and I would like to map my different strains against that multi-FASTA file to see which strains have which genes.

I've tried two things so far: 1) giving the multi-FASTA file as the query, and 2) giving the full genome as the query.

But neither of them works. Is there another trick here too?

Thanks a lot! Best, Ben

written 2.5 years ago by benoit.kunath10

Are you making sure that BRIG regenerates new BLAST files every time?

You basically want to use it to show presence/absence of these genes? This may work, though I've not experimented with the software extensively. You would need to use the concatenated multifasta of genes as your reference, as you have, and then add each genome as a new ring, working outwards. You'll need to re-run the BLAST process each time (and delete the contents of the scratch folder that is generated). You should get a sequence match for each genome that contains that region at your specified cut-off, since BRIG does not preserve co-ordinates in the outer rings.

If that doesn't help, I think we need more information and I would suggest making a new question.

written 2.5 years ago by Joe16k
2.5 years ago by
benoit.kunath10 wrote:

I hadn't read anything about re-running the BLAST process and deleting the contents of the scratch folder.

I've done it. It works almost perfectly (I still have to play around with the annotations though, as they're not all printed).

How does the software remember the matches if I delete the contents of the scratch folder every time?

Thank you very much for the help!!!

written 2.5 years ago by benoit.kunath10

Don't add answers to your original post to ask for further clarification/new questions - add them as comments under the original post.

You need to re-run BLAST each time you add new information (either a reference sequence or a new query sequence); otherwise the software has no way of knowing where to draw the similarities for the new data.

You can think of BRIG as two parts: the BLAST searches and the rendering. BRIG itself is just a rendering tool that turns the BLAST tabular output for each query and reference pair into an image depicting the regions that match at your thresholds. For BRIG to have access to this information, though, you must re-run BLAST every time your input data changes, as the BLAST tabular file will also change (and there needs to be one per pair of sequences being compared).

If you want to understand more, the scratch folder contains each of the BLAST tabular files, named in the format queryVreference.tab.
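
To look inside one of those tabular files, something like the following Python sketch may help. It is not BRIG's own code. It assumes the .tab files use the standard 12-column BLAST tabular layout (query id, subject id, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, e-value, bit score), which is not confirmed in this thread, and the names Hit, hits_above_identity and EXAMPLE_TAB are hypothetical.

```python
from typing import Iterator, NamedTuple


class Hit(NamedTuple):
    query_id: str
    subject_id: str
    percent_identity: float
    query_start: int
    query_end: int


def hits_above_identity(tab_text: str, min_identity: float) -> Iterator[Hit]:
    """Yield hits whose percent identity is at least `min_identity`.

    Assumes 12 tab-separated columns per line (standard BLAST tabular output).
    Blank lines, comment lines and short lines are skipped.
    """
    for line in tab_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 12:
            continue
        hit = Hit(fields[0], fields[1], float(fields[2]),
                  int(fields[6]), int(fields[7]))
        if hit.percent_identity >= min_identity:
            yield hit


# Two made-up rows, purely for illustration.
EXAMPLE_TAB = "\n".join([
    "\t".join(["gene1", "strainA", "98.50", "900", "13", "0",
               "1", "900", "1200", "2099", "0.0", "1500"]),
    "\t".join(["gene2", "strainA", "62.00", "300", "114", "0",
               "1", "300", "5000", "5299", "1e-20", "90"]),
])

if __name__ == "__main__":
    for hit in hits_above_identity(EXAMPLE_TAB, min_identity=70.0):
        print(hit.query_id, hit.query_start, hit.query_end, hit.percent_identity)
```

Only rows that pass the identity threshold survive the filter, which is roughly the information a ring is drawn from: a region with no qualifying hit leaves a gap.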

modified 2.5 years ago • written 2.5 years ago by Joe16k

OK. Thanks.

I have one last issue. I have my multi-FASTA file as the reference, and that works. As queries, I use the genomes of different strains. Some of them are multi-FASTA files, so I concatenated the contigs as explained before, and that works nicely. One of the genomes, however, is the reference strain, and it is not a multi-FASTA file (it's a normal single-entry FASTA).

If I use the strain with the concatenated contigs as the query, it works. If I use the reference single-entry FASTA file as the query, it works. But when I want to have both on the same picture, it doesn't work, even though they look exactly the same: one is a single-entry FASTA file and the other is a concatenated multi-FASTA file, so it looks like a single-entry FASTA.

Could it be the way the software uses BLAST that makes it behave like that? The very weird part is that if I put the concatenated multi-FASTA in the first ring, it draws that one but not the other, and if I put the single-entry FASTA file in the first ring, it draws that one but not the other. It doesn't make much sense to me.

Thanks a lot again for your help!

Regards

written 2.5 years ago by benoit.kunath10

Am I understanding this correctly, that you tried to use them both as reference sequences?

If so, that won't work: you can only use one reference sequence at a time. You could add one or both of them as query sequences in separate rings though.

This is an image I made a while back. You can see the inner red ring is completely solid. This is the reference sequence added as a query ring (you can think of the ACTUAL reference sequence as the black inner ring with the tick marks on it).

Consequently, they have 100% BLAST identity to one another (obviously), and the ring is solid. Any subsequent sequences are added as query rings as well.

[image not preserved]

modified 2.5 years ago • written 2.5 years ago by Joe16k

Sorry my mistake.

What I meant by "reference" is the reference strain (the one whose full genome has been sequenced and published, so the file is a 1-line FASTA). I also have some strains I sequenced myself, for which I have multi-FASTA files containing my contigs.

So the reference I give to the software is a multi-FASTA file containing multiple resistance genes from different bacteria, and I want to compare the presence and absence of these genes between the official sequenced genome and the strains I have sequenced myself. As queries, I give the official genome (1-line FASTA) and my sequenced strains (multi-FASTA files where I concatenated the contigs, so each one actually looks like a 1-line FASTA).

If I use only my strains, it works. If I use only the official genome, it works. But when I use them both as queries at the same time (in different rings), it doesn't draw all of them. I wonder if it's because some were originally contigs and the other was a real 1-line FASTA.

It seems weird, but that's the only difference I know between those files

written 2.5 years ago by benoit.kunath10

Hmm... I'm not sure why they wouldn't both work as separate rings.

Can you share your input data and we can try to replicate the problem?

Can you check the blasttab files for both sets of sequences versus your resistance gene reference to make sure they both contain data? If one of them contains no blast hits, there will be nothing to render.

Additionally (though I doubt this is the problem if you only have a couple of rings), you may need to adjust the rendering window size to ensure everything displays correctly.
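
The check on the tabular files can be scripted. The following is a minimal Python sketch, assuming only that the scratch folder holds files ending in .tab with one hit per line. The function names count_hits and report_scratch and the file names in the demo are hypothetical; the demo builds a temporary folder rather than touching a real BRIG scratch folder.

```python
import tempfile
from pathlib import Path
from typing import Dict


def count_hits(tab_path: Path) -> int:
    """Count non-blank, non-comment lines in one tabular file."""
    with tab_path.open() as handle:
        return sum(1 for line in handle
                   if line.strip() and not line.startswith("#"))


def report_scratch(scratch_dir: Path) -> Dict[str, int]:
    """Map each *.tab file name in the folder to its number of hit lines."""
    return {path.name: count_hits(path)
            for path in sorted(scratch_dir.glob("*.tab"))}


if __name__ == "__main__":
    # Demo on a throwaway folder with one populated and one empty file.
    with tempfile.TemporaryDirectory() as tmp:
        scratch = Path(tmp)
        (scratch / "strainAVgenes.tab").write_text("gene1\tstrainA\t98.5\n")
        (scratch / "strainBVgenes.tab").write_text("")
        for name, n_hits in report_scratch(scratch).items():
            status = "ok" if n_hits else "EMPTY: nothing to render"
            print(f"{name}\t{n_hits}\t{status}")
```

A file reported with zero hits would explain a missing ring for that query and reference pair.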

written 2.5 years ago by Joe16k

Hello.

Since the only difference between the files was that one was a 1-line FASTA file and the other was a multi-FASTA file from which I had deleted the intermediate headers, I tried something: I downloaded the genes from that 1-line genome, which gave me a multi-FASTA file. I then deleted the headers and used that as a query (so it has now been through the same process as the other files). Don't ask me why, but with that trick it works very nicely.

Thanks a lot for your help!!

written 2.5 years ago by benoit.kunath10

Was one of your references a single continuous sequence on a single line, and the other a continuous sequence but broken/wrapped over many lines?

It may be that BRIG isn’t happy with having an un-wrapped sequence.
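
If the un-wrapped sequence really is the cause (that is only a guess in this thread), re-wrapping the file is a simpler workaround than rebuilding it from genes. A minimal Python sketch follows; the function name wrap_fasta and the default width of 60 characters are assumptions, not anything BRIG documents.

```python
def wrap_fasta(fasta_text: str, width: int = 60) -> str:
    """Re-wrap every sequence in a FASTA string to `width` characters per line.

    Header lines are kept unchanged and in their original order.
    """
    output_lines = []
    sequence_parts = []

    def flush() -> None:
        # Emit the sequence collected so far in fixed-width chunks.
        sequence = "".join(sequence_parts)
        output_lines.extend(sequence[i:i + width]
                            for i in range(0, len(sequence), width))
        sequence_parts.clear()

    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            flush()
            output_lines.append(line)
        else:
            sequence_parts.append(line)
    flush()
    return "\n".join(output_lines) + "\n"


if __name__ == "__main__":
    print(wrap_fasta(">ref\nACGTACGTAC\n", width=4), end="")
```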

written 2.5 years ago by Joe16k

Yes it was like that. I guess that's the reason.

Thanks a lot again for your help!!

written 2.5 years ago by benoit.kunath10