Question: How to extract a particular region from a nucleotide contig?
0
25 days ago by
saadleeshehreen70 wrote:

Hi,

I have downloaded a contig from NCBI. It is around 100 kbp long and has an integrated prophage region at positions 54501-90604 bp. How can I extract only positions 54501-90604 from this contig?

Cheers

sequence assembly • 100 views
2
25 days ago by
Belgium
WouterDeCoster42k wrote:

Use samtools faidx to index the file and retrieve those coordinates.
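
As a rough sketch of what that looks like in practice, the Python snippet below builds (and can optionally run) the samtools faidx call. The region string uses the samtools NAME:START-END form, which is 1-based and inclusive, so the coordinates from the question can be used unchanged. The file name contig.fasta, the sequence name contig_1 and both helper function names are placeholders, not taken from the question; the real sequence name is the first word of the FASTA header line.

```python
import shutil
import subprocess


def faidx_command(fasta_path, seq_name, start, end):
    """Build a `samtools faidx` call for a 1-based, inclusive region."""
    region = f"{seq_name}:{start}-{end}"
    return ["samtools", "faidx", fasta_path, region]


def extract_with_samtools(fasta_path, seq_name, start, end, out_path):
    """Run samtools and save the FASTA record it prints on stdout."""
    if shutil.which("samtools") is None:
        raise RuntimeError("samtools was not found on PATH")
    cmd = faidx_command(fasta_path, seq_name, start, end)
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    with open(out_path, "w") as handle:
        handle.write(result.stdout)


# Hypothetical file and sequence names; replace them with your own.
cmd = faidx_command("contig.fasta", "contig_1", 54501, 90604)
print(" ".join(cmd))
```

Only the last two lines run on import; calling extract_with_samtools needs samtools installed, and samtools creates the .fai index next to the FASTA file if it does not exist yet.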

2
25 days ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum124k wrote:
grep -v -E '^>|^$' input.fasta | tr -d '\n' | cut -c 54501-90604 | fold -w 60
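
For readers who would rather stay in Python, the one-liner above can be approximated with the standard library alone. This is a sketch under the same assumption the pipeline makes, namely that the file holds a single sequence. The function names and the header text are made up for illustration, and unlike the pipeline this version can also emit a FASTA header line.

```python
def read_single_fasta(path):
    """Return the sequence of a one-record FASTA file as a single string."""
    parts = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            # Skip header lines and blank lines, like grep -v -E '^>|^$'.
            if line and not line.startswith(">"):
                parts.append(line)
    return "".join(parts)


def extract_region(sequence, start, end):
    """Extract a 1-based, inclusive region, like cut -c start-end."""
    return sequence[start - 1:end]


def to_fasta(header, sequence, width=60):
    """Format a record with the sequence wrapped at `width` columns."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + header + "\n" + "\n".join(lines) + "\n"


# Toy check: positions 3-6 of a 10-base sequence.
print(extract_region("ACGTACGTAC", 3, 6))
```

For the contig in the question, the call would be along the lines of to_fasta("prophage", extract_region(read_single_fasta("input.fasta"), 54501, 90604)).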
1
25 days ago by
United Kingdom
Joe15k wrote:

The easiest (for me) would be:

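from Bio import SeqIO

# Assumes the file holds exactly one sequence; replace 'format' with the
# real format name, e.g. 'fasta' or 'genbank'.
rec = SeqIO.read('/path/to/file', 'format')
# 1-based inclusive 54501-90604 is the 0-based, end-exclusive slice [54500:90604].
prophage = rec[54500:90604]

# Write the sequence out, or carry on with whatever analysis comes next.

Double check the coordinates. Python indexing starts at 0 and a slice excludes its end position, so subtract 1 from the start only: 54501-90604 becomes [54500:90604].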
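
The coordinate conversion is easy to verify with plain arithmetic. In the sketch below (slice_bounds is a hypothetical helper name), a 1-based inclusive range START-END maps to the Python slice [START-1:END], and the slice length is compared against the number of bases the range should contain.

```python
def slice_bounds(start, end):
    """Convert a 1-based, inclusive range to Python slice bounds."""
    return start - 1, end


start, end = 54501, 90604
lo, hi = slice_bounds(start, end)

# Number of bases in the 1-based inclusive range.
expected_length = end - start + 1

# Length of the adjusted slice [lo:hi]; it should equal expected_length.
adjusted_length = hi - lo

# Length of the unadjusted slice [start:end]; it is one base short
# because it skips position 54501.
unadjusted_length = end - start

print(lo, hi, adjusted_length, expected_length, unadjusted_length)
```

Subtracting 1 from both ends would also give a region one base short, this time missing the final base at position 90604.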
