Extracting genomic regions based on coordinates in a text file
3.6 years ago

Hey everyone,

I have identified several regions of interest using GenBank files as input. The tool I used only outputs the start and stop positions of these regions and does not provide an option to extract the sequences themselves. Could you please let me know how to do this?

I have an output text file like this:

region          organism                                contig          start   stop    genes
NC_002516.2_17  GCF_000006765.1_ASM676v1_genomic.gbk    NC_002516.2_Pa  230543  237111  6
NC_002516.2_0   GCF_000006765.1_ASM676v1_genomic.gbk    NC_002516.2_Pa  675861  703058  32
NC_002516.2_4   GCF_000006765.1_ASM676v1_genomic.gbk    NC_002516.2_Pa  786074  797598  16
NC_002516.2_14  GCF_000006765.1_ASM676v1_genomic.gbk    NC_002516.2_Pa  895824  901046  7
...

I would like to extract these regions into a single-FASTA or multi-FASTA file.

Thanks in advance for taking the time!


You can take columns 3, 4, and 5 (contig, start, stop) and use BEDTools (bedtools getfasta) to extract the desired regions. Just make sure the contig names are consistent with the sequence names in the reference genome you're extracting from.
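As a rough sketch of that column-to-BED step: the Python snippet below turns rows like the ones in the question into BED lines (contig, start, end, name). The function name `table_to_bed` and its `one_based` switch are made up for illustration. It assumes the tool reports 1-based, inclusive coordinates, which the thread does not confirm; if the coordinates are already 0-based, pass `one_based=False`.

```python
def table_to_bed(lines, one_based=True):
    """Convert 'region organism contig start stop genes' rows to BED lines.

    Hypothetical helper: assumes whitespace-separated columns in the order
    shown in the question, and 1-based inclusive coordinates by default.
    """
    bed = []
    for line in lines:
        fields = line.split()
        # Skip blank lines, the header row, and rows without numeric coordinates.
        if len(fields) < 5 or not (fields[3].isdigit() and fields[4].isdigit()):
            continue
        region, contig = fields[0], fields[2]
        start, stop = int(fields[3]), int(fields[4])
        if one_based:
            # BED starts are 0-based; BED ends are exclusive, so stop is unchanged.
            start -= 1
        bed.append(f"{contig}\t{start}\t{stop}\t{region}")
    return bed


# Two rows copied from the question, plus the header line.
rows = [
    "region organism contig start stop genes",
    "NC_002516.2_17  GCF_000006765.1_ASM676v1_genomic.gbk    NC_002516.2_Pa  230543  237111  6",
    "NC_002516.2_0   GCF_000006765.1_ASM676v1_genomic.gbk    NC_002516.2_Pa  675861  703058  32",
]
bed_lines = table_to_bed(rows)
print("\n".join(bed_lines))
```

Saved to a file (say regions.bed, a placeholder name), the output could then be passed to something like `bedtools getfasta -fi genome.fasta -bed regions.bed -name -fo regions.fasta`, where genome.fasta is also a placeholder. Note that getfasta reads FASTA, so the .gbk file would need converting first (for example with Biopython's `SeqIO.convert`), and the contig column (`NC_002516.2_Pa` here) has to match the FASTA header exactly.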
