2.7 years ago
Kevin Blighe
We have a FASTA file in which each record is just a header line followed by the whole sequence on a single line:
cat fasta.fasta
> 1
AGTACGATCTACGTACGCAACTGAGCTACTACAGTCATGCTGACACTGACTGACACTGACTGACTGTGACACTGACTGCATGCTGCTGGCCCCGCAGTATCGACTGCGTACGTCGCGCGATTACGCGTACTGCGTCTGCATGCATGCATGCATGCATGCATGCATGCATGCATGCATGTGCACGTACTGATGCACATGCACTGA
> 2
TGACAGCTACTGACGTACGTACGTACGTCAGTACGTACGTACGTCAGTACGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAGCACTGCATGACTGACGTACGTACGTACGTACGT
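We can use AWK to wrap each sequence into lines of equal length (the last line of each record may be shorter), leaving the header lines untouched. Here, len sets the line width, and -F "" makes AWK treat every character as a separate field. GNU awk supports this, although POSIX leaves an empty field separator unspecified: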
awk -v len=40 -F "" '/^>/ {print}; !/^>/ {for (i=1; i<=NF; i++) {printf "%s", $i; if (i % len == 0 || i == NF) printf "\n"}}' fasta.fasta
> 1
AGTACGATCTACGTACGCAACTGAGCTACTACAGTCATGC
TGACACTGACTGACACTGACTGACTGTGACACTGACTGCA
TGCTGCTGGCCCCGCAGTATCGACTGCGTACGTCGCGCGA
TTACGCGTACTGCGTCTGCATGCATGCATGCATGCATGCA
TGCATGCATGCATGCATGTGCACGTACTGATGCACATGCA
CTGA
> 2
TGACAGCTACTGACGTACGTACGTACGTCAGTACGTACGT
ACGTCAGTACGTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
TTTTTTTTTTTTTTTTTAGCACTGCATGACTGACGTACGT
ACGTACGTACGT
awk -v len=10 -F "" '/^>/ {print}; !/^>/ {for (i=1; i<=NF; i++) {printf "%s", $i; if (i % len == 0 || i == NF) printf "\n"}}' fasta.fasta
> 1
AGTACGATCT
ACGTACGCAA
CTGAGCTACT
ACAGTCATGC
TGACACTGAC
TGACACTGAC
TGACTGTGAC
ACTGACTGCA
TGCTGCTGGC
CCCGCAGTAT
CGACTGCGTA
CGTCGCGCGA
TTACGCGTAC
TGCGTCTGCA
TGCATGCATG
CATGCATGCA
TGCATGCATG
CATGCATGTG
CACGTACTGA
TGCACATGCA
CTGA
> 2
TGACAGCTAC
TGACGTACGT
ACGTACGTCA
GTACGTACGT
ACGTCAGTAC
GTTTTTTTTT
TTTTTTTTTT
TTTTTTTTTT
TTTTTTTTTT
TTTTTTTAGC
ACTGCATGAC
TGACGTACGT
ACGTACGTAC
GT
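To confirm that output like the above is wrapped as intended, a small check can help. This is a minimal sketch, assuming POSIX awk; check_wrapped is a hypothetical helper name. It succeeds only if every sequence line is at most len characters long and only the last line of each record is shorter than len.

```shell
# check_wrapped (hypothetical helper): exit status 0 if every sequence line is
# at most len characters and only the last line of each record is shorter.
check_wrapped() {
  awk -v len="$1" '
    /^>/ { short = 0; next }              # new record: reset the short-line flag
    {
      n = length($0)
      if (n > len || short) bad = 1       # too long, or a line after a short one
      if (n < len) short = 1              # a short line must be the last of its record
    }
    END { exit (bad ? 1 : 0) }'
}

printf '> 1\nACGT\nACGT\nAC\n> 2\nTTTT\n' | check_wrapped 4 && echo "wrapped OK"
printf '> 1\nACG\nACGT\n' | check_wrapped 4 || echo "not wrapped"
```

The first call prints wrapped OK; the second prints not wrapped, because a full-length line follows a short one within the same record.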
AWK doesn't have to be a one-liner, either. Here is the same program spread over several lines, this time with len=80:
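awk -v len=80 -F "" '
/^>/ {print}                              # header lines pass through unchanged
!/^>/ {
    for (i=1; i<=NF; i++) {
        printf "%s", $i                   # print one base at a time
        if (i % len == 0 || i == NF)      # line is full, or sequence has ended
            printf "\n"
    }
}' fasta.fasta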
> 1
AGTACGATCTACGTACGCAACTGAGCTACTACAGTCATGCTGACACTGACTGACACTGACTGACTGTGACACTGACTGCA
TGCTGCTGGCCCCGCAGTATCGACTGCGTACGTCGCGCGATTACGCGTACTGCGTCTGCATGCATGCATGCATGCATGCA
TGCATGCATGCATGCATGTGCACGTACTGATGCACATGCACTGA
> 2
TGACAGCTACTGACGTACGTACGTACGTCAGTACGTACGTACGTCAGTACGTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
TTTTTTTTTTTTTTTTTAGCACTGCATGACTGACGTACGTACGTACGTACGT
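The programs above rely on -F "" to split each sequence into single characters, which POSIX leaves unspecified. A minimal sketch of a portable alternative, assuming only POSIX awk, steps through each sequence with substr() instead. wrap_fasta is a hypothetical helper name, and like the original it assumes each sequence sits on a single line.

```shell
# wrap_fasta (hypothetical helper): wraps single-line sequences to len
# characters per line using only POSIX awk (substr and length).
wrap_fasta() {
  awk -v len="$1" '
    /^>/ { print; next }                  # header lines pass through unchanged
    {
      # emit the sequence in chunks of at most len characters
      for (i = 1; i <= length($0); i += len)
        print substr($0, i, len)
    }'
}

printf '> 1\nACGTACGTAC\n> 2\nTTTT\n' | wrap_fasta 4
```

On fasta.fasta, wrap_fasta 40 < fasta.fasta should give the same output as the len=40 one-liner, because both print full chunks of len characters followed by one shorter final chunk per record.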
Kevin
To linearize a multi-line FASTA file first, use @Pierre's code; you can then apply @Kevin's code.
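Pierre's code is not reproduced in this thread. As an alternative (not Pierre's method), a single awk pass can join each record's sequence lines and re-wrap them, so multi-line input needs no separate linearization step. This is a minimal sketch, assuming POSIX awk and input without Windows line endings; rewrap_fasta is a hypothetical helper name.

```shell
# rewrap_fasta (hypothetical helper): joins each record's sequence lines and
# re-wraps them to len characters per line in one pass.
rewrap_fasta() {
  awk -v len="$1" '
    # print the buffered sequence in chunks of at most len characters
    function flush(   i) {
      for (i = 1; i <= length(seq); i += len)
        print substr(seq, i, len)
      seq = ""
    }
    /^>/ { flush(); print; next }         # header: emit the previous record first
    { seq = seq $0 }                      # sequence line: append to the buffer
    END { flush() }                       # emit the final record
  '
}

printf '> 1\nACG\nTACGT\nAC\n> 2\nTT\nTT\n' | rewrap_fasta 4
```

Buffering the whole sequence keeps the logic simple, at the cost of holding one record in memory at a time; for very long sequences (whole chromosomes), a streaming version that carries only the leftover partial line would use less memory.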