Question: Trimming of FASTA file
harry wrote:

I have multiple FASTA sequences like the ones below:

>read1
CAGGTCTGGCTGGATGAAGGGCACGGCATAGGTCTGACCTGCCAGGGAGTGCTGCATCCTCACAGGAGTCATGGTGCTGCTGAAGATGTCTCCAGAGACCTTCTGCAGGTACTGCAGGGCATCCGCCATCTGCTGGACGGCCTCCTCTCGCCG
>read2
CATCTGCGGAGGCTGCCGTGACGTAGGGTATGGGCCTAAATAGGCCATTGTGAGTCATGAGCTTGGTCTGTAGAGGCTGACTGGAGAAAGTTCTGGGCCTGGAGAGGCTGCCGGGAGGTAGGAGTGGTGAGGTCGACTTGAGAAAGTTCAGGGCCTGGAGAGAAGGCTGGGAGGCAGGAGCTGGGTCTAAAGAGGCCATTGTAACGATGGAGCTGTGCCTGTGGAGGCTGTTGTGAGGCAGTAGCCT
>read3
TTGAGGTGGGAGGATCGCTTCAGCCTGGAAGGTTGAGGCTGCAGTCAGCTGCGATAGCACTACTACACTCCAGCCTTGGACAACAGAGGGAGACCTTTCGCTGTCACCCCTCTAGAATCCACGTATACGAAAATTCCAAATGTTAGTTGGGCATAGTGGCAAGCACCTGTAGTCTCAGCCACGTGGGAGG

These reads have different lengths, so I want to extract the middle part of each one. Ideally, all the resulting sequences would be 150-160 bp long. Is there a way to do this? Thanks in advance. For example, read2 below contains 247 nucleotides:

CATCTGCGGAGGCTGCCGTGACGTAGGGTATGGGCCTAAATAGGCCATTGTGAGTCATGAGCTTGGTCTGTAGAGGCTGACTGGAGAAAGTTCTGGGCCTGGAGAGGCTGCCGGGAGGTAGGAGTGGTGAGGTCGACTTGAGAAAGTTCAGGGCCTGGAGAGAAGGCTGGGAGGCAGGAGCTGGGTCTAAAGAGGCCATTGTAACGATGGAGCTGTGCCTGTGGAGGCTGTTGTGAGGCAGTAGCCT

After trimming from both sides, the remaining middle part is 153 bp:

TTGTGAGTCATGAGCTTGGTCTGTAGAGGCTGACTGGAGAAAGTTCTGGGCCTGGAGAGGCTGCCGGGAGGTAGGAGTGGTGAGGTCGACTTGAGAAAGTTCAGGGCCTGGAGAGAAGGCTGGGAGGCAGGAGCTGGGTCTAAAGAGGCCATT

I want to do this for every sequence in my FASTA file with a single command. Could you please tell me how to extract the middle part of each sequence? Thanks in advance.

trimming fasta • 132 views
modified 6 weeks ago by Pierre Lindenbaum • written 6 weeks ago by harry

How is this different from your previous question, "Script for making exon file"?

written 6 weeks ago by Pierre Lindenbaum

In the previous question, I asked how to cut a sequence and rejoin it, that is, cut it into 2 equal parts and join the downstream region to the upstream region. In this question, I want to isolate the middle part of the sequence and discard the bases that overhang on both sides of it. Thanks

written 5 weeks ago by harry

"in this question I want to isolate the sequence from the middle part which overhangs from both sides of the middle part."

It's the very same kind of operation.

written 5 weeks ago by Pierre Lindenbaum

Can you please tell me how to do this? I want a 150 bp sequence from the middle: 75 bp upstream and 75 bp downstream of the midpoint. Thanks

written 5 weeks ago by harry
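
The "75 bp on each side of the midpoint" request comes down to simple index arithmetic. Here is a minimal Python sketch of that arithmetic. It is not taken from any answer in this thread. The helper name `middle_window` is hypothetical, and returning reads shorter than the window unchanged is an assumption about the desired behavior.

```python
def middle_window(seq, width=150):
    """Return the centered window of `width` bases (hypothetical helper).

    Assumption: a read no longer than `width` is returned unchanged.
    """
    if len(seq) <= width:
        return seq
    # Number of bases to drop on the left; any odd leftover base
    # is dropped on the right.
    start = (len(seq) - width) // 2
    return seq[start:start + width]
```

For a 247-nt read such as read2, `start` is `(247 - 150) // 2 = 48`, so 48 bases are trimmed on the left and 49 on the right, keeping 1-based positions 49 to 198.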

This is essentially alignment with trimming. You can do this with a GUI application like UGENE, Geneious, or BioEdit. You can also see this answer: Most efficient way to trim overhanging bases after alignment

written 6 weeks ago by Mark890

God kills a kitten every time you use a GUI.

written 5 weeks ago by Pierre Lindenbaum

I disagree! Sometimes it's easier to just use a GUI and get the job done. This is a good example of a case where a GUI works perfectly. I do this regularly to trim large MSAs aligned to a reference.

written 5 weeks ago by Mark890

The problem with GUI applications is that their results are often hard to reproduce and they do not scale well in workflows. In the long term, it's better to use the command line unless a GUI is absolutely necessary.

written 5 weeks ago by rpolicastro
cpad0112 wrote:
$ awk -v OFS="\n" '/^>/ {getline seq}{print $0,substr(seq,length(seq)/2-75,150)}' test.fa 

>read1
CAGGTCTGGCTGGATGAAGGGCACGGCATAGGTCTGACCTGCCAGGGAGTGCTGCATCCTCACAGGAGTCATGGTGCTGCTGAAGATGTCTCCAGAGACCTTCTGCAGGTACTGCAGGGCATCCGCCATCTGCTGGACGGCCTCCTCTCG
>read2
TTGTGAGTCATGAGCTTGGTCTGTAGAGGCTGACTGGAGAAAGTTCTGGGCCTGGAGAGGCTGCCGGGAGGTAGGAGTGGTGAGGTCGACTTGAGAAAGTTCAGGGCCTGGAGAGAAGGCTGGGAGGCAGGAGCTGGGTCTAAAGAGGCC
>read3
TCAGCCTGGAAGGTTGAGGCTGCAGTCAGCTGCGATAGCACTACTACACTCCAGCCTTGGACAACAGAGGGAGACCTTTCGCTGTCACCCCTCTAGAATCCACGTATACGAAAATTCCAAATGTTAGTTGGGCATAGTGGCAAGCACCTG
written 5 weeks ago by cpad0112
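
The one-liner above assumes that each sequence sits on a single line directly after its header. For files whose sequences are wrapped over several lines, or that contain reads shorter than 150 bp, a small script may be easier to reason about. Here is a minimal Python sketch under those assumptions. It is not the answerer's method. The function name `trim_fasta` is hypothetical, and short reads are passed through unchanged by choice.

```python
def trim_fasta(lines, width=150):
    """Yield (header, centered window) for each FASTA record.

    Hypothetical helper: joins wrapped sequence lines and keeps reads
    shorter than `width` unchanged (an assumption, not from the thread).
    """
    header, chunks = None, []

    def finish():
        seq = "".join(chunks)
        # Bases to drop on the left; never negative for short reads.
        start = max(0, (len(seq) - width) // 2)
        return header, seq[start:start + width]

    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield finish()  # emit the previous record
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        yield finish()  # emit the last record
```

One possible use is to loop over `trim_fasta(open("test.fa"))` and print each header and trimmed sequence on its own line. Because the start position is computed with integer division, the exact window for odd-length reads may differ by one base from what a given awk implementation returns.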