Question: Need help: cut FASTA sequences from a specific region
Varshney wrote:

Hello everyone,

I have a genome assembly file in FASTA format. I need to trim the sequences in that file based on specific positions.

How can I do this with Perl or a shell script?

I have almost 2000 sequences in my FASTA file, and the required positions are in a tab-delimited file containing id, start and end.

It would be great if anyone could help me with this.

Thanks in advance!

modified 2.9 years ago • written 2.9 years ago by Varshney

Hey Varshney, could you post a small example of your data and required output...

modified 2.9 years ago • written 2.9 years ago by James Ashmore

Thank you for the answers, but how can I remove sequences from the FASTA file based on their positions?

written 2.9 years ago by Varshney

Again, can you please provide some sample data and output? Do you want the sequence to be cut out completely and the two leftover ends joined together, or do you want it to be masked in some way?

written 2.9 years ago by James Ashmore

Varshney: please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.

modified 2.9 years ago • written 2.9 years ago by genomax

Hey James,

I have multiple sequences in one FASTA file and another tab-delimited file containing the IDs, start and end positions, like this:

Seq ID      Start  End
jcf713497   1      374
jcf713573   1      2268
jcf7123620  17     474
jcf7123620  5675   5707
jcf7123757  1      507

So, how can I remove the sequences at these positions from the FASTA file?

written 2.9 years ago by Varshney

Do you want the sequence to be cut out completely and the two leftover ends joined together, or do you want it to be masked in some way? Also, when you reply, don't make a new post; just click the ADD COMMENT box below my response.

written 2.9 years ago by James Ashmore

I want the sequence to be cut out completely and the two leftover ends joined together.

written 2.9 years ago by Varshney

This answer assumes you are working on a Unix machine with bedtools and sed installed:

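# Mask your regions with a zero character
# (regions.bed must be in BED format: tab-separated, 0-based start, end-exclusive)
bedtools maskfasta -mc 0 -fi input.fasta -bed regions.bed -fo masked.fasta

# Delete the mask character from sequence lines only; header lines (starting
# with ">") are skipped because IDs such as jcf7123620 contain zeros.
# Output goes to result.fasta via redirection, so sed -i (in-place) is not used.
sed '/^>/! s/0//g' masked.fasta > result.fasta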
modified 2.9 years ago • written 2.9 years ago by James Ashmore
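
One thing to watch with the maskfasta approach: bedtools expects BED coordinates (0-based start, end-exclusive), while a table like the one posted above looks 1-based. Here is a minimal sketch of the conversion. It assumes the positions are 1-based and inclusive and that the table has exactly one header line; positions.tsv and regions.bed are placeholder file names, not something from the original posts.

```shell
# Sample rows copied from the thread; positions.tsv is a hypothetical file name.
printf 'Seq ID\tStart\tEnd\njcf713497\t1\t374\njcf7123620\t17\t474\njcf7123620\t5675\t5707\n' > positions.tsv

# Assumption: Start/End are 1-based and inclusive, so only Start shifts by one
# to become a 0-based, end-exclusive BED interval. NR > 1 skips the header line.
awk 'BEGIN { FS = OFS = "\t" } NR > 1 && NF >= 3 { print $1, $2 - 1, $3 }' positions.tsv > regions.bed

cat regions.bed
```

If your positions are already 0-based, skip the subtraction and only drop the header line.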
trausch wrote:

You can use samtools to extract regions from a FASTA file:

samtools faidx hg19.fa chr1:20000-20100

written 2.9 years ago by trausch
Chadi Saad wrote:

You can use bedtools getfasta:

bedtools getfasta -fi input_file.fa -bed regions_file.bed

written 2.9 years ago by Chadi Saad
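
For the "cut out completely and join the leftover ends" result the asker describes, here is a rough sketch that needs only sort and awk, in case bedtools is not available. It assumes the positions are 1-based and inclusive, that intervals on the same sequence do not overlap, that the positions file has no header line and is not empty, and that the sequence ID is the first word after ">". The file names and the toy input are made up for illustration.

```shell
# Made-up toy input so the sketch runs on its own; replace with your real files.
printf '>seqA demo\nAAACCCGGGT\nTT\n>seqB\nACGTACGT\n' > input.fasta
printf 'seqA\t4\t6\nseqA\t10\t11\nseqB\t1\t3\n' > positions.tsv

# Sort so that, within each ID, the rightmost interval comes first. Cutting from
# right to left keeps the coordinates of the intervals still to be cut valid.
sort -k1,1 -k2,2nr positions.tsv > positions.sorted.tsv

awk '
    # Apply every stored cut to the current sequence, then print the record.
    function emit(   j) {
        if (id == "") return
        for (j = 1; j <= n[id]; j++)
            seq = substr(seq, 1, s[id, j] - 1) substr(seq, e[id, j] + 1)
        print header
        print seq
    }
    # First file: store the intervals to delete, per sequence ID, in file order.
    NR == FNR { n[$1]++; s[$1, n[$1]] = $2; e[$1, n[$1]] = $3; next }
    # Second file: a header line closes the previous record and starts a new one.
    /^>/      { emit(); header = $0; id = substr($1, 2); seq = ""; next }
    # Sequence lines are concatenated so positions refer to the whole sequence.
              { seq = seq $0 }
    END       { emit() }
' positions.sorted.tsv input.fasta > result.fasta

cat result.fasta
```

Each sequence comes out on a single unwrapped line, and a sequence whose whole length is listed for removal is left as a header followed by an empty line, so you may want to filter those afterwards.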