How to extract a particular region from a nucleotide contig?
4.4 years ago

Hi,

I have downloaded a contig from NCBI. It is around 100 kbp long and contains an integrated prophage region at positions 54501-90604 bp. How can I extract only positions 54501-90604 from this contig?

Cheers

sequence assembly • 1.2k views
4.4 years ago

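Use samtools faidx to index the FASTA file and retrieve those coordinates, for example: samtools faidx input.fasta CONTIG_NAME:54501-90604 > prophage.fasta (CONTIG_NAME is a placeholder for the sequence ID in your FASTA header; the region is 1-based and inclusive).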

4.4 years ago
grep -v -E '^>|^$' input.fasta | tr -d '\n' | cut -c 54501-90604 | fold -w 60
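
If you would rather not rely on shell tools, roughly the same steps as the one-liner above can be sketched in plain Python. This is only a sketch: the function name extract_region is made up here, and like the one-liner it assumes the file holds a single sequence.

```python
def extract_region(fasta_text, start, end, width=60):
    """Return bases start..end (1-based, inclusive), wrapped at `width` columns.

    Hypothetical helper; assumes `fasta_text` contains a single sequence.
    """
    # Drop header lines (">...") and blank lines, then join the sequence lines.
    seq = "".join(
        line.strip()
        for line in fasta_text.splitlines()
        if line.strip() and not line.startswith(">")
    )
    # 1-based inclusive start..end corresponds to the Python slice [start-1:end].
    region = seq[start - 1:end]
    # Re-wrap the extracted region, like `fold -w 60`.
    return "\n".join(region[i:i + width] for i in range(0, len(region), width))
```

For the contig in the question, something like print(extract_region(open('input.fasta').read(), 54501, 90604)) should print the region. As with the shell version, the output has no header line, so add one if downstream tools need it.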
4.4 years ago
Joe 21k

The easiest (for me) would be:

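from Bio import SeqIO

# Replace 'format' with the actual file format, e.g. 'fasta' or 'genbank'.
rec = SeqIO.read('/path/to/file', 'format')  # assuming only one sequence in the file

# Python slices are 0-based and exclude the end, so 1-based 54501-90604 becomes 54500:90604.
prophage = rec[54500:90604]

# write sequence out or do whatever analysis next.

Do double check the coordinates. Python indexing starts at 0 and a slice excludes its end position, so for a 1-based, inclusive range you subtract 1 from the start only and leave the end as it is. The extracted region should be 90604 - 54501 + 1 = 36104 bp long.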
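
To make the coordinate check concrete, here is a small sketch on a toy string rather than the real contig. The helper name slice_1based is made up for illustration; the same slice arithmetic applies when slicing a Biopython record.

```python
def slice_1based(seq, start, end):
    """Return positions start..end of seq, where both are 1-based and inclusive."""
    # Python is 0-based and excludes the slice end, so only the start shifts by 1.
    return seq[start - 1:end]

toy = "ABCDEFGHIJ"
print(slice_1based(toy, 3, 5))  # CDE (positions 3, 4 and 5)
print(toy[3:5])                 # DE  (unshifted slice starts one position late)

# Expected length of the prophage region from the question:
print(90604 - 54501 + 1)        # 36104
```

Checking len(prophage) against the expected 36104 is a quick way to catch an off-by-one error before going further.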
