I am trying to do two things, and I will try to make this as clear as possible.
1. I have aligned and downloaded about 500 sequences from BLAST. However, in my FASTA file I want each header to show only the accession number, not the GI number.
So convert this:
Is there a way to do this? I could write a script in Python, but I don't want to re-invent the wheel.
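For reference, the script I would otherwise write is roughly the sketch below. It assumes the headers follow the legacy NCBI layout `>gi|<number>|<db>|<accession>|<description>`, which may not match every record in my file. The function names `strip_gi` and `convert_fasta` are my own and do not come from any library.

```python
import re

# Assumed legacy NCBI header layout: >gi|<number>|<db>|<accession>|<description>
# Captures the accession field (fourth pipe-delimited field).
GI_HEADER = re.compile(r"^>gi\|\d+\|[^|]*\|([^|\s]+)\|?.*$")


def strip_gi(header: str) -> str:
    """Return '>accession' for a gi-style header; leave other headers unchanged."""
    match = GI_HEADER.match(header)
    return f">{match.group(1)}" if match else header


def convert_fasta(lines):
    """Yield FASTA lines with gi-style headers reduced to the accession only."""
    for line in lines:
        line = line.rstrip("\n")
        # Only header lines start with '>'; sequence lines pass through as-is.
        yield strip_gi(line) if line.startswith(">") else line
```

Passing an open file handle to `convert_fasta` and writing each yielded line to a new file would give the cleaned FASTA. Note that this drops everything after the accession, including the description, and any header that does not match the assumed layout is left untouched.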
2. From my alignment, I generated a sequence similarity matrix in a program called MacVector. This assigns a similarity score to each sequence on the basis of how similar the sequences are. I then plotted the scores in Excel as a histogram.
It looks like this:
Each bar in the histogram is supposed to be a single sequence, and the x-axis should show the accession number or identifier for that sequence. As you can probably tell, the x-axis is missing a lot of labels (there should be 500).
I have had this problem before when trying to display relatively large data sets cleanly in R. Usually I just edited the picture, but for 500 sequences that is just too much. I am sure someone has run into this before. Is there a way to clean this up?
Any advice or a pointer in the right direction would be greatly appreciated.