sort FASTA file into table
6.2 years ago

My FASTA file looks like this:

>seq0
FQTWEEFSRAAEKLYLADPMKVRVVLKYRHVDGNLCIKVTDDLVCLVYRTDQAQDVKKIEKF
>seq1
KYRTWEEFTRAAEKLYQADPMKVRVVLKYRHCDGNLCIKVTDDVVCLLYRTDQAQDVKKIEKFHSQLMRLME LKVTDNKECLKFKTDQAQEAKKMEKLNNIFFTLM
>seq2
EEYQTWEEFARAAEKLYLTDPMKVRVVLKYRHCDGNLCMKVTDDAVCLQYKTDQAQDVKKVEKLHGK
>seq3
MYQVWEEFSRAVEKLYLTDPMKVRVVLKYRHCDGNLCIKVTDNSVCLQYKTDQAQDVK
>seq4
EEFSRAVEKLYLTDPMKVRVVLKYRHCDGNLCIKVTDNSVVSYEMRLFGVQKDNFALEHSLL
>seq5
SWEEFAKAAEVLYLEDPMKCRMCTKYRHVDHKLVVKLTDNHTVLKYVTDMAQDVKKIEKLTTLLMR
>seq6
FTNWEEFAKAAERLHSANPEKCRFVTKYNHTKGELVLKLTDDVVCLQYSTNQLQDVKKLEKLSSTLLRSI
>seq7
SWEEFVERSVQLFRGDPNATRYVMKYRHCEGKLVLKVTDDRECLKFKTDQAQDAKKMEKLNNIFF
>seq8
SWDEFVDRSVQLFRADPESTRYVMKYRHCDGKLVLKVTDNKECLKFKTDQAQEAKKMEKLNNIFFTLM
>seq9
KNWEDFEIAAENMYMANPQNCRYTMKYVHSKGHILLKMSDNVKCVQYRAENMPDLKK
>seq10
FDSWDEFVSKSVELFRNHPDTTRYVVKYRHCEGKLVLKVTDNHECLKFKTDQAQDAKKMEK

How do I make it look like this table?

Seq name    Amino acid    Occurrence
seq0        F             12%
            A             26%
            K             60%

seq1        T             70%
            L             50%
            W             12%
etc.
FASTA bash • 1.7k views

What have you tried?


I added markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button, which is in the toolbar when you compose or edit a post.


Hello, thanks for that!

So far I have this script:

esearch_outfile=esearch.txt   # file for esearch results
unaligned=seqs.fa             # file for sequences downloaded with efetch
aligned=seqs_ALIGNED.fa       # file for sequences aligned by Clustal Omega

echo RUNNING WGET FOR ESEARCH
# write the esearch result to $esearch_outfile, which the grep below reads
wget "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=protein&term=((txid6231[Organism:exp]) AND COX1[Gene Name]) NOT partial&usehistory=y&retmode=json" -O "$esearch_outfile"

# pull the WebEnv value out of the JSON response
webenv_line=$(grep webenv "${esearch_outfile}")
webenv=$(echo "$webenv_line" | cut -f 4 -d '"')

echo RUNNING WGET FOR EFETCH
wget "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?WebEnv=${webenv}&query_key=1&db=protein&rettype=fasta" -O "$unaligned"

echo RUNNING CLUSTALO
clustalo -i "$unaligned" > "$aligned"

echo LAUNCHING CLUSTALX2
clustalx2 "$aligned" &

echo LAUNCHING PLOTCON
plotcon -sequences "$aligned" -winsize 100 -graph x11 &

echo DONE

I have no idea how to make it into a table, though. The -outfmt 7 option doesn't work...


As far as I understand, this is the script you used to get the data. How have you tried to answer the current question, "sort (sort?) FASTA file into table"?

#start sorting the file from here

grep -c "^>" file.fa #counts proteins in fasta file
echo ls |wc -l
echo 'NUMBER OF PROTEINS' 

sed 's/>.*/&protein/' file.fa > outfile.fa # adds word protein to end of all headers

awk '{print $2}' file.fa > output.fa #only second column of the header is outputted

sed -e '/^>/s/$/@/' -e 's/^>/#/' file.fasta | tr -d '\n' | tr "#" "\n" | tr "@" "\t" | sort -u -t $'\t' -f -k 2,2  | sed -e 's/^/>/' -e 's/\t/\n/' #removes all duplicated sequences

while read line;
    do if [ "${line:0:1}" == ">" ]; 
    then echo -e "\n"$line; else echo $line | tr -d '\n' ; fi; 
done < input.fasta > output.fasta

#Once linearized, to pick the sequence for the id 'x' you can use grep -A1 'x' output.fasta

-outfmt 7 gaps | pident |score #make table with column headings

column -t seq_ALIGNED.fa | less -S

grep -w "^++1" * | cut -f7 | awk '{sum += $1} END {print sum}'
grep -w "^++1" * | cut -f8 | awk '{sum += $1} END {print sum}'
grep -w "^++1" * | cut -f15 | awk '{sum += $1} END {print sum}'


awk 'BEGIN{RS=">"}NR>1{sub("\n","\t"); 
gsub("\n",""); 
print RS$0}' file #make table

sort -k2b,2 -k1,1 <sorted.txt #sorts descending order

This is the list of things I have tried so far.

6.2 years ago

Assuming two lines per sequence record, using awk:

awk '/^>/ {print; next;} {delete H; L=length($0);for(i=1;i<=L;i++) H[substr($0,i,1)]++; for(x in H) print x,int(H[x]/L*100.0);}' input.fasta
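
If the records are not strictly two lines each (for example, sequences wrapped over several lines), a variant along the following lines might be closer to the table asked for. This is only a sketch under some assumptions: the function name fasta_composition is made up for illustration, the sequence name is taken as the first word after ">", whitespace inside a sequence is ignored, and the output is tab-separated with one row per residue, sorted by decreasing percentage within each record.

```shell
# Sketch: per-record residue composition as a tab-separated table.
# fasta_composition is a hypothetical helper name, not an existing tool.
fasta_composition() {
  printf 'seq_name\tamino_acid\toccurrence\n'
  awk '
    # Emit one row per distinct residue of the record collected so far.
    # n, i, c and count are locals (count starts empty on every call).
    function report(idx, name, seq,    n, i, c, count) {
      gsub(/[ \t\r]/, "", seq)            # ignore stray whitespace in the sequence
      n = length(seq)
      if (n == 0) return                  # nothing collected yet
      for (i = 1; i <= n; i++) count[substr(seq, i, 1)]++
      for (c in count)
        printf "%d\t%s\t%s\t%.1f%%\n", idx, name, c, 100 * count[c] / n
    }
    /^>/ { report(idx, name, seq); idx++; name = substr($1, 2); seq = ""; next }
         { seq = seq $0 }                 # join wrapped sequence lines
    END  { report(idx, name, seq) }
  ' "$1" |
    # Keep records in file order (column 1), then highest percentage first,
    # then residue letter as a tie-breaker; finally drop the helper index.
    LC_ALL=C sort -t "$(printf '\t')" -k1,1n -k4,4nr -k3,3 |
    cut -f2-
}
```

It could be called as fasta_composition input.fasta, and piped through column -t if aligned columns are easier to read than tabs.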