Trimming sequences in a FASTA file
3.3 years ago
harry

I have multiple FASTA sequences, like the ones below:

>read1
CAGGTCTGGCTGGATGAAGGGCACGGCATAGGTCTGACCTGCCAGGGAGTGCTGCATCCTCACAGGAGTCATGGTGCTGCTGAAGATGTCTCCAGAGACCTTCTGCAGGTACTGCAGGGCATCCGCCATCTGCTGGACGGCCTCCTCTCGCCG
>read2
CATCTGCGGAGGCTGCCGTGACGTAGGGTATGGGCCTAAATAGGCCATTGTGAGTCATGAGCTTGGTCTGTAGAGGCTGACTGGAGAAAGTTCTGGGCCTGGAGAGGCTGCCGGGAGGTAGGAGTGGTGAGGTCGACTTGAGAAAGTTCAGGGCCTGGAGAGAAGGCTGGGAGGCAGGAGCTGGGTCTAAAGAGGCCATTGTAACGATGGAGCTGTGCCTGTGGAGGCTGTTGTGAGGCAGTAGCCT
>read3
TTGAGGTGGGAGGATCGCTTCAGCCTGGAAGGTTGAGGCTGCAGTCAGCTGCGATAGCACTACTACACTCCAGCCTTGGACAACAGAGGGAGACCTTTCGCTGTCACCCCTCTAGAATCCACGTATACGAAAATTCCAAATGTTAGTTGGGCATAGTGGCAAGCACCTGTAGTCTCAGCCACGTGGGAGG

These have different lengths, so I want to isolate the middle part of every read. Ideally, all of the trimmed sequences would be 150-160 bp long. Is there a way to do this? For example, read2 below contains 247 nucleotides:

CATCTGCGGAGGCTGCCGTGACGTAGGGTATGGGCCTAAATAGGCCATTGTGAGTCATGAGCTTGGTCTGTAGAGGCTGACTGGAGAAAGTTCTGGGCCTGGAGAGGCTGCCGGGAGGTAGGAGTGGTGAGGTCGACTTGAGAAAGTTCAGGGCCTGGAGAGAAGGCTGGGAGGCAGGAGCTGGGTCTAAAGAGGCCATTGTAACGATGGAGCTGTGCCTGTGGAGGCTGTTGTGAGGCAGTAGCCT

After trimming both ends, the remaining middle part is 153 bp:

TTGTGAGTCATGAGCTTGGTCTGTAGAGGCTGACTGGAGAAAGTTCTGGGCCTGGAGAGGCTGCCGGGAGGTAGGAGTGGTGAGGTCGACTTGAGAAAGTTCAGGGCCTGGAGAGAAGGCTGGGAGGCAGGAGCTGGGTCTAAAGAGGCCATT

I want to do this for every sequence in my FASTA file with a single command. Can you tell me how to extract the middle part of each sequence? Thanks in advance.


How is this different from your previous question, "Script for making exon file"?


In my previous question, I asked how to cut a sequence and rejoin it, that is, cut it into two equal halves and join the downstream half to the upstream half. In this question, I want to isolate the middle part of the sequence by removing the bases that overhang on both sides of it. Thanks.


"in this question I want to isolate the sequence from the middle part which overhangs from both sides of the middle part."

It's the very same kind of operation: in both cases you extract substrings at positions computed from the sequence length.


Can you please tell me how to do this? I want a 150 bp sequence from the middle: 75 bp upstream and 75 bp downstream of the midpoint. Thanks.
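The coordinate arithmetic behind "75 bp upstream and 75 bp downstream of the middle" can be sketched in shell, with awk used only for the integer maths. centre_coords is a hypothetical helper, not something from this thread; it assumes 1-based inclusive positions and a default 150 bp window, and it trims the same number of bases from each end, giving any odd leftover base to the right-hand end.

```shell
# Hypothetical helper: print the 1-based start and end of a window of
# WIDTH bp (default 150) centred in a sequence of length LEN.
# Usage: centre_coords LEN [WIDTH]
centre_coords() {
  awk -v len="$1" -v width="${2:-150}" 'BEGIN {
    left = int((len - width) / 2)  # bases trimmed from the left end
    if (left < 0) left = 0         # sequence shorter than the window
    stop = left + width
    if (stop > len) stop = len     # never run past the end of the sequence
    print left + 1, stop
  }'
}

centre_coords 247      # read2 from the question; prints: 49 198
centre_coords 247 153  # prints: 48 200
```

The second call reproduces the 153 bp region trimmed by hand in the question (bases 48 to 200 of read2, with 47 bases removed from each end).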


This is essentially alignment followed by trimming. You can do it with a GUI application such as UGENE, Geneious or BioEdit. See also this answer: "Most efficient way to trim overhanging bases after alignment".


God kills a kitten every time you use a GUI.


I disagree! Sometimes it's easier to just use a GUI and get the job done, and this is a case where that works well. I do this regularly to trim large multiple sequence alignments (MSAs) aligned to a reference.


The problem with GUI applications is that they are often hard to reproduce and to scale up in workflows. In the long term it's better to use the command line, unless a GUI is absolutely necessary.

3.3 years ago
$ awk -v OFS="\n" '/^>/ {getline seq}{print $0,substr(seq,length(seq)/2-75,150)}' test.fa 

>read1
CAGGTCTGGCTGGATGAAGGGCACGGCATAGGTCTGACCTGCCAGGGAGTGCTGCATCCTCACAGGAGTCATGGTGCTGCTGAAGATGTCTCCAGAGACCTTCTGCAGGTACTGCAGGGCATCCGCCATCTGCTGGACGGCCTCCTCTCG
>read2
TTGTGAGTCATGAGCTTGGTCTGTAGAGGCTGACTGGAGAAAGTTCTGGGCCTGGAGAGGCTGCCGGGAGGTAGGAGTGGTGAGGTCGACTTGAGAAAGTTCAGGGCCTGGAGAGAAGGCTGGGAGGCAGGAGCTGGGTCTAAAGAGGCC
>read3
TCAGCCTGGAAGGTTGAGGCTGCAGTCAGCTGCGATAGCACTACTACACTCCAGCCTTGGACAACAGAGGGAGACCTTTCGCTGTCACCCCTCTAGAATCCACGTATACGAAAATTCCAAATGTTAGTTGGGCATAGTGGCAAGCACCTG
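The one-liner above reads exactly one line after each header, so it assumes every sequence sits on a single line; its start position, length(seq)/2-75, is also fractional for odd-length reads. Below is a slightly longer sketch under different assumptions: wrapped (multi-line) FASTA may occur, reads shorter than the window are kept whole, and the start is made a whole number with int() so the result does not depend on how a given awk treats a fractional start. mid_trim is a hypothetical name, and the window width is its first argument, so 150 or 160 bp can be tried.

```shell
# Hypothetical helper (not from the answer above): keep the central WIDTH bp
# of every record in a FASTA file. Assumptions: sequences may be wrapped over
# several lines, and reads shorter than WIDTH are printed unchanged.
# Usage: mid_trim WIDTH [FILE...]   (reads standard input if no FILE is given)
mid_trim() {
  local width="${1:?usage: mid_trim WIDTH [FILE...]}"
  shift
  awk -v width="$width" '
    # Print the stored record, cut down to its centred window.
    function flush_record() {
      if (header == "") return
      left = int((length(seq) - width) / 2)  # bases dropped from the left end
      if (left < 0) left = 0                  # read shorter than the window
      print header
      print substr(seq, left + 1, width)
    }
    /^>/ { flush_record(); header = $0; seq = ""; next }
    { seq = seq $0 }                          # join wrapped sequence lines
    END { flush_record() }
  ' "$@"
}
```

For example, mid_trim 150 test.fa > test.mid.fa. Because int() truncates the left-hand trim, an odd surplus leaves the extra base on the right: for the 247 bp read2 and a 150 bp window this keeps bases 49 to 198, and mid_trim 153 returns exactly the 153 bp region trimmed by hand in the question.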