Question

Nucleotide character position in sequence extract

6.9 years ago

leo1985.arnab ▴ 50

I have a text file with ~60,000 sequences (1 sequence per line). I am trying to extract all the sequences that begin with an A (the first nucleotide position in the sequence) and that also have a T at the 10th position. For example:

Let's say these are 3 sequences in the 60,000 sequence file:

AAGGGCAGCTAATCGCCAGTG
CGGGATCTATAAGGTTGGT
AAGGGCAGCGAATCGCCAGTGAGGCT

If the search were run on these 3 sequences, the desired output would be only the first one.

I have tried some approaches with grep, but they have not worked. Any help or suggestions would be greatly appreciated.

Thanks and regards!

sequence • 1.6k views


score 2 · Accepted Answer · 2017-05-10

6.9 years ago

Pierre Lindenbaum 161k

"I am trying to extract all the sequences that begin with an A (first nucleotide position in the sequence) and in the same sequence there should be a T at the 10th position."

 grep -E '^A.{8}T' -m1
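To make the pattern concrete, here is a minimal sketch that runs it against the three example sequences from the question. The temporary file is only a stand-in for the real 60,000-line file, and `-m1` is left out so that every matching line is reported rather than just the first. In the pattern, `^A` anchors an A at position 1, `.{8}` skips positions 2 to 9, and `T` must then sit at position 10.

```shell
# Stand-in input file (hypothetical) holding the three example sequences.
seqs=$(mktemp)
printf '%s\n' \
  'AAGGGCAGCTAATCGCCAGTG' \
  'CGGGATCTATAAGGTTGGT' \
  'AAGGGCAGCGAATCGCCAGTGAGGCT' > "$seqs"

# ^A   : line starts with A (position 1)
# .{8} : any 8 characters (positions 2-9)
# T    : T at position 10
matches=$(grep -E '^A.{8}T' "$seqs")
echo "$matches"

# -c counts the matching lines instead of printing them.
count=$(grep -cE '^A.{8}T' "$seqs")
echo "$count"

rm -f "$seqs"
```

Only the first sequence should be printed: the second starts with C, and the third has a G at position 10.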


Wonderful! Thanks! That worked out great. I just needed to remove the -m1 at the end, as I wanted to search through the whole file.

6.9 years ago by leo1985.arnab ▴ 50