Question: Sorting BLAST output files together?
zgayk (United States) wrote, 3.1 years ago:

Hi,

I have five BLASTn tabular files, each produced by querying the same large gene list (same query) against a different subject genome database. The goal was to identify possible orthologous sequences between the query gene list and the five subject genomes.

I was able to identify most, if not all, of the same genes in each genome.

Now I would like to concatenate the five files and sort them by gene identifier so that the sequences and names for the same gene across all five genomes sit in the same row, one column per genome. The goal is to then extract the sequences for all five species for every gene for an alignment. I am working with a very large number of genes.

Is there a way to do this using cat and sort, or would Python work better? I am a bit clueless as to how to do this in Python.

Thanks in advance, Zach

blast • 847 views

"Is there a way to do this using cat and sort"

Yes. What have you tried?

written 3.1 years ago by Pierre Lindenbaum

What I'm not sure about is this: if a gene is missing from one of the BLAST files but present in the other four, wouldn't the genes fail to line up across all five species?

I have not actually used cat and sort, but have been reading that they might work. Would you have any ideas for a possible script?

written 3.1 years ago by zgayk
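On the missing-gene worry: one way to keep rows aligned is to key every hit by its query gene ID rather than by line position, and to write a placeholder where a genome has no hit. The following is only a minimal sketch under stated assumptions. It assumes the default 12-column tabular layout (query ID in column 1, subject ID in column 2, bit score in column 12); with a custom column list the indices would differ. The function names `best_hits` and `pivot`, the species labels, and the demo rows are all hypothetical.

```python
import csv
import io


def best_hits(handle):
    """Return {query_id: (subject_id, bitscore)}, keeping the top bit score per query."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and the '#' comment lines of format 7
        # Assumption: default column order, bit score in the 12th column.
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


def pivot(hits_by_species, species, missing="NA"):
    """Build one row per gene with one column per species; gaps get a placeholder."""
    genes = sorted(set().union(*(h.keys() for h in hits_by_species.values())))
    table = []
    for gene in genes:
        row = [gene]
        for sp in species:
            hit = hits_by_species.get(sp, {}).get(gene)
            row.append(hit[0] if hit else missing)
        table.append(row)
    return table


def _line(query, subject, bitscore):
    # Demo helper: one 12-column row; the middle columns are dummy values.
    middle = ["99.0", "100", "1", "0", "1", "100", "1", "100", "1e-50"]
    return "\t".join([query, subject] + middle + [str(bitscore)])


# Made-up demo input: geneB has no hit in the second genome.
demo = {
    "PA": io.StringIO("# demo comment\n" + _line("geneA", "pa_1", 180) + "\n"
                      + _line("geneB", "pa_2", 150) + "\n"),
    "GS": io.StringIO(_line("geneA", "gs_9", 170) + "\n"),
}
hits = {sp: best_hits(handle) for sp, handle in demo.items()}
for row in pivot(hits, species=["PA", "GS"]):
    print("\t".join(row))
```

Because each row is built from the gene ID, a gene absent from one file simply gets `NA` in that column and the other columns stay in place. Keeping only the top-scoring hit per gene is a simplification; it is not a test for orthology.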

This is all I have done so far, and it sorted all the sequences from the same species together, so I need to figure out how to modify sort to sort by gene first and then by species. I am not sure how to deal with genes that are missing in one genome but present in the others.

cat outputExpandedPA.blast.txt outputExpandedGS.blast.txt outputExpandedGG.blast.txt outputExpandedFG.blast.txt outputExpandedCl.blast.txt > Combined.txt | sort

written 3.1 years ago by zgayk
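A note on the command as posted: `> Combined.txt` sends the output of cat into the file before the pipe, so sort receives no input and Combined.txt holds the plain concatenation. That would explain why the hits stay grouped by species. Placing the redirect after sort would fix that part. For sorting by gene first and species second, a rough Python sketch follows. It assumes tab-separated hit lines with the query gene ID in the first column and `#` comment lines to skip; `read_hits`, `combine_sorted`, and the demo rows are hypothetical, and the species labels are guessed from the file names.

```python
import io


def read_hits(handle, species):
    """Yield (gene, species, fields) for each hit line, skipping '#' comment lines."""
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        yield fields[0], species, fields


def combine_sorted(sources):
    """sources: iterable of (species_label, open_handle) pairs.

    Returns all hits as one list sorted by gene ID, then by species label.
    """
    rows = []
    for species, handle in sources:
        rows.extend(read_hits(handle, species))
    rows.sort(key=lambda r: (r[0], r[1]))
    return rows


# Made-up demo input; with real files each handle would come from open(),
# e.g. ("PA", open("outputExpandedPA.blast.txt")).
rows = combine_sorted([
    ("PA", io.StringIO("# demo comment\ngeneB\tpa_2\ngeneA\tpa_1\n")),
    ("GS", io.StringIO("geneA\tgs_9\n")),
])
for gene, species, fields in rows:
    print(gene, species, "\t".join(fields[1:]), sep="\t")
```

Tagging each row with its species before sorting matters because the concatenated file otherwise loses track of which genome a hit came from.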

Is the output in one of the tabular blast output formats? If not, doing a simple cat/sort will not work.

written 3.1 years ago by genomax

I wrote the BLAST results in output format 7 (tabular with comment lines), including the actual sequences of both query and subject.

The sort worked, but I'm just not sure how to modify it to line up the sequences for each gene so I can extract the sequences for each species and then align them.

written 3.1 years ago by zgayk
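For the extraction step, one possible approach is to group the aligned subject sequences by query gene and write them out as FASTA records ready for an aligner. This is only a sketch. The thread does not say which columns were requested, so the position of the subject-sequence column (`sseq_col`) is an assumption passed in by the caller. `collect_sequences`, `to_fasta`, and the three-column demo rows are hypothetical.

```python
import io
from collections import defaultdict


def collect_sequences(sources, sseq_col):
    """Return {gene: [(species, subject_id, sequence), ...]}.

    sources: iterable of (species_label, open_handle) pairs.
    sseq_col: zero-based index of the subject-sequence column (an assumption;
              it depends on the column list given to BLAST).
    """
    by_gene = defaultdict(list)
    for species, handle in sources:
        for line in handle:
            line = line.rstrip("\n")
            if not line or line.startswith("#"):
                continue  # skip blank lines and format 7 comment lines
            fields = line.split("\t")
            # Aligned sequences contain '-' gap characters; drop them so the
            # sequences can be re-aligned together later.
            seq = fields[sseq_col].replace("-", "")
            by_gene[fields[0]].append((species, fields[1], seq))
    return by_gene


def to_fasta(records):
    """Format (species, subject_id, sequence) tuples as FASTA text."""
    return "".join(f">{species}|{subject}\n{seq}\n" for species, subject, seq in records)


# Made-up demo rows: query ID, subject ID, aligned subject sequence.
by_gene = collect_sequences(
    [
        ("PA", io.StringIO("# demo comment\ngeneA\tpa_1\tAC-GT\n")),
        ("GS", io.StringIO("geneA\tgs_9\tACGA\n")),
    ],
    sseq_col=2,
)
print(to_fasta(by_gene["geneA"]), end="")
```

This keeps every hit line for a gene, so a gene with several hits in one genome yields several records. Only the aligned part of each subject sequence is available in the BLAST output, not the full gene.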

Have you tried any Bio-* parsers?
- http://biopython.org/DIST/docs/tutorial/Tutorial.html
- http://search.cpan.org/dist/BioPerl/Bio/SearchIO/blast.pm

written 3.1 years ago by Khader Shameer

I am familiar with Biopython, but have not used it for this task. Could you recommend a particular Biopython function for this?

Thanks, Zach

written 3.1 years ago by zgayk