Question: Extract subsequences from a fastA file with specific start sequence and length in R
9 months ago, shelley.w.peterson wrote:

I have a FASTA file of sequences, and I am trying to trim them all to a specific length, starting at a specific pattern:

>seq1
ACTGCTAGCCCAGTCTGACTGACTGACTGTGTCATG
>seq2
ATCTGATGTGTGCCCCAGTGACTGACTGATGGGCCC
>seq3
CTGATGCCCAGTCGAGCTAGCATTGCCCAAATTGGCCATGCTGATGCTG
>seq4
CTAGCTAGCTAGCTGCTAGCTAGCTAGCTAGCATGCCCAGTCATGCCC

Pattern: CCCAGT

I want to write a script that takes a subsequence from each DNA sequence, starting at the pattern and 20 bp long (this is just an example; the actual sequences and subsequences are much longer). The pattern must also start between 5 and 20 bp from the start of the sequence, so seq4 in this example would be rejected.

I've been reading the Biostrings and Seqinr documentation and I can't really figure out how to do this.

So far, I would probably run it in steps like this:
1. Find the location of the pattern in each sequence (getLocation does this, but the documentation says it only works with subsequences from an ACNUC server, and I don't know how to do it with a pattern I provide, or on every sequence in my FASTA file).
2. If the pattern falls within the expected start range, extract the subsequence that begins at the match position and runs for 20 bp (getFrag maybe? But would this do it on every sequence?).
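The two steps above can be sketched in base R without seqinr or an ACNUC server. This is only a minimal sketch under some assumptions: the sequences are held in a named character vector (with Biostrings, `as.character(readDNAStringSet(...))` should give that shape), "between 5 and 20 bp from the start" is read as the 1-based start position of the pattern, only the first occurrence of the pattern is considered, and the variable names (`min_start`, `max_start`, `out_len`) are made up for illustration.

```r
# Toy input from the question, as a named character vector.
seqs <- c(
  seq1 = "ACTGCTAGCCCAGTCTGACTGACTGACTGTGTCATG",
  seq2 = "ATCTGATGTGTGCCCCAGTGACTGACTGATGGGCCC",
  seq3 = "CTGATGCCCAGTCGAGCTAGCATTGCCCAAATTGGCCATGCTGATGCTG",
  seq4 = "CTAGCTAGCTAGCTGCTAGCTAGCTAGCTAGCATGCCCAGTCATGCCC"
)

pattern   <- "CCCAGT"
min_start <- 5    # assumed: earliest allowed 1-based start of the pattern
max_start <- 20   # assumed: latest allowed 1-based start of the pattern
out_len   <- 20   # length of the extracted subsequence, pattern included

# Step 1: 1-based position of the first occurrence in each sequence
# (-1 where the pattern is absent). regexpr() is vectorised over seqs.
starts <- as.integer(regexpr(pattern, seqs, fixed = TRUE))

# Step 2: keep sequences whose match is inside the allowed window and
# that still have out_len bases left from the match position onward.
keep <- starts >= min_start & starts <= max_start &
  starts + out_len - 1 <= nchar(seqs)

# substr() is vectorised over start/stop, so no explicit loop is needed.
trimmed <- substr(seqs[keep], starts[keep], starts[keep] + out_len - 1)
print(trimmed)
```

On the toy data this keeps seq1, seq2 and seq3 and drops seq4, whose pattern starts at position 36. If the first occurrence falls outside the window but a later one falls inside it, this sketch rejects the sequence; `gregexpr()` would be needed to inspect all occurrences.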

It's a bit more complicated than this, but right now I'm just trying to get some basics to start from in hopes that I can figure out the rest on my own. Thanks in advance!


Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.

Thank you!

Reply written 9 months ago by genomax
9 months ago, swbarnes2 (United States) wrote:

Sounds like you just need a regex.

Something like /^.{1,14}CCCAGT(.{20})/, with the quantifiers adjusted to the start offsets you allow and the length you want to capture.
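For what it's worth, here is how a regex along those lines might be applied in base R. It is a sketch, not the answerer's exact expression: the quantifiers are changed to match one reading of the question (pattern starting at 1-based position 5 to 20, so 4 to 19 leading bases), and the pattern is moved inside the capture group so that the captured 20 bp begin with CCCAGT. The lazy `{4,19}?` with `perl = TRUE` is a choice made here so that the earliest in-range occurrence wins.

```r
# Toy input from the question, as a named character vector.
seqs <- c(
  seq1 = "ACTGCTAGCCCAGTCTGACTGACTGACTGTGTCATG",
  seq2 = "ATCTGATGTGTGCCCCAGTGACTGACTGATGGGCCC",
  seq3 = "CTGATGCCCAGTCGAGCTAGCATTGCCCAAATTGGCCATGCTGATGCTG",
  seq4 = "CTAGCTAGCTAGCTGCTAGCTAGCTAGCTAGCATGCCCAGTCATGCCC"
)

# ^.{4,19}?     : 4 to 19 leading bases (lazy), so the pattern starts at
#                 position 5..20 under the assumed reading of the question
# (CCCAGT.{14}) : the pattern plus 14 more bases = 20 bp, captured as group 1
rx <- "^.{4,19}?(CCCAGT.{14})"

# regexec() reports the full match and each capture group; regmatches()
# turns that into c(full_match, group1), or character(0) when nothing matched.
m <- regmatches(seqs, regexec(rx, seqs, perl = TRUE))
matched <- lengths(m) == 2

# Pull out group 1 for the sequences that matched and keep their names.
trimmed <- setNames(
  vapply(m[matched], function(x) x[2], character(1)),
  names(seqs)[matched]
)
print(trimmed)
```

Sequences that do not match (seq4 here) simply drop out, so the window check and the extraction happen in one pass. For real data the `4`, `19` and `14` would need to be recomputed from the actual offsets and subsequence length.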
