Question: How to match location of string
4 months ago by archana.bioinfo87 (170) wrote:

Hi,

I am trying to check which characters occur at given positions in a string, like this:

library(seqinr)

at <- ("ATATATAT")
s1 <-ifelse(at[8]=="T"||"A" && at[7]=="A"||"T" &&
            at[6]=="T"||"A",5,
            ifelse(at[2]=="T"||"A" && at[4]=="A"||"T" &&
                     at[1]=="T"||"A",'1','0'
            ))
s1

It works fine for only one sequence. I tried it in a for loop, but I get an error like this:

invalid 'x' type in 'x && y'

Any help is much appreciated. Thanks.

seqinr R • 245 views
modified 4 months ago by zx8754 (8.2k) • written 4 months ago by archana.bioinfo87 (170)

This is a Question, not a Page, please be careful when selecting the post type.

What is s2c()? From which package?

If the code above works but the loop doesn't, you should show the loop as well, and provide an example dataset to replicate the failure.

written 4 months ago by h.mon (27k)

This looks like code directly translated from Excel functions. Surely there must be better, more efficient ways to achieve OP's goals.

modified 4 months ago • written 4 months ago by RamRS (24k)

I think this is the s2c function OP is using. Also, how is a[8] == "T"||"A" even proper R syntax? The "T" || "A" part will throw an error. Pretty sure OP's code doesn't work as-is at the moment.

modified 4 months ago • written 4 months ago by RamRS (24k)

Can you describe what you're trying to achieve and what a actually looks like (i.e., the result of s2c(at))?

EDIT: and what the final for-loop is supposed to achieve.

I promise, if you describe your question properly (i.e. what exactly should be the end result?) there's going to be a more robust way of doing that in R.

modified 4 months ago • written 4 months ago by Friederike (5.1k)

I was using that "a" for the next coding step; it is not part of this analysis.

written 4 months ago by archana.bioinfo87 (170)

Thanks, everyone, for the replies.

Let me correct my question to make it easier to understand.

I have a list of sequences like this in the 2nd column of a CSV file.

        Seq
>1_seq     ACGTATTGATGCCACAGACGTATTGATGCCACAGACGTATTGATGCCACAG
>2_seq     ACGTATTGATGCCACAGACGTATTGATGCCACAGACGTATTGATGCCACCC
>3_seql    ACGTATTGATGCCACAGACGTATTGATGCCACAGACGTATTGATGCCACTT
>4_seql    ACGTATTGATGCCACAGACGTATTGATGCCACAGACGTATTGATGCCACAG

I want to compare positions across the sequences with respect to each other. For example, if A or T is present in "11th or 17" location of each sequence then return 1 else 0.

Thanks in advance

modified 4 months ago by genomax (71k) • written 4 months ago by archana.bioinfo87 (170)

That seems to be partially from a multiple sequence alignment. Either way, you might benefit from creating a 2D matrix with each column a base position and each row a sequence; that would be a lot easier to filter using indexes.
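
A minimal sketch of that matrix idea, assuming all sequences have the same length (as in an alignment). The example sequences and the names seqs, m and flag are made up for illustration:

```r
# Hypothetical example data: equal-length sequences, as in an alignment
seqs <- c(seq1 = "ACGTA", seq2 = "ACGCA", seq3 = "GCGAA")

# Split each sequence into single characters and stack them:
# one row per sequence, one column per base position
m <- do.call(rbind, strsplit(seqs, ""))

# 1 if a sequence has A or T at position 2 or position 4, else 0
flag <- as.integer(m[, 2] %in% c("A", "T") | m[, 4] %in% c("A", "T"))
flag
```

With unequal lengths, rbind() would recycle the shorter rows, so this layout only makes sense for aligned input.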

written 4 months ago by RamRS (24k)

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized. SUBMIT ANSWER is for new answers to original question.

Ideally edit your original question and add this information there.

modified 4 months ago • written 4 months ago by genomax (71k)

For example, if A or T is present in "11th or 17" location of each sequence then return 1 else 0.

That doesn't make sense to me.

written 4 months ago by WouterDeCoster (41k)

Do you only care about the presence or absence in certain positions? How many positions are you interested in?

written 4 months ago by Joe (14k)
4 months ago by zx8754 (8.2k), London, wrote:

To simplify your example, take this condition: if every sequence has A or T at the 2nd or the 4th position, then TRUE.

# example data
x <- c("AAGTA", 
       "AAGTA", 
       "AAGTA", 
       "ACGAA")

# in this example all TRUE
all(substr(x, 2, 2) %in% c("A", "T") | substr(x, 4, 4) %in% c("A", "T"))
# [1] TRUE

If this is not the solution you are looking for, then please provide example input and expected output, clearly.
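
If a per-sequence 1/0 result is wanted instead of a single TRUE/FALSE, one possible sketch is below. The helper has_base_at() and its arguments are hypothetical names, not from any package, and the example data are made up:

```r
# Hypothetical helper: returns 1 for each sequence that has one of `bases`
# at any of `positions`, and 0 otherwise
has_base_at <- function(seqs, positions, bases = c("A", "T")) {
  # One logical vector per position, each as long as `seqs`
  per_position <- lapply(positions, function(p) substr(seqs, p, p) %in% bases)
  # Combine the positions with an element-wise OR
  as.integer(Reduce(`|`, per_position))
}

# Made-up example data
x <- c("AAGTA", "ACGCA", "GCGAA")
res <- has_base_at(x, c(2, 4))
res
```

For the positions mentioned in the question, the call would be has_base_at(x, c(11, 17)); a sequence shorter than the requested position simply counts as no match, because substr() returns an empty string there.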

written 4 months ago by zx8754 (8.2k)

Excellent!

Thanks a lot. It's working.

modified 4 months ago • written 4 months ago by archana.bioinfo87 (170)
4 months ago by Friederike (5.1k), United States, wrote:

To address your error message:

Assuming this is R code, I don't think the command works even for a single instance outside a for loop:

> "A" == "T"||"A" && "A" == "A"||"T"
Error in "A" && "A" == "A" : invalid 'x' type in 'x && y'

The syntax would have to be:

> "A" %in% c("T","A") && "A" %in% c("T","A")
[1] TRUE

That being said, as the numerous comments above indicate, there's most definitely a more straightforward way of doing whatever it is you're trying to do.
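
For several sequences at once, one possible sketch swaps the scalar && for the element-wise & and calls substr() on the whole vector. The helper is_at() and the example sequences are assumptions for illustration; the 5/1/0 return values follow the original question:

```r
# Made-up example sequences
seqs <- c("ATATATAT", "ATGAGCGC", "GCGCGCGC")

# Hypothetical helper: TRUE where the base at `pos` is A or T
is_at <- function(s, pos) substr(s, pos, pos) %in% c("A", "T")

# `&` works element-wise, so every sequence is tested in one call
s1 <- ifelse(is_at(seqs, 8) & is_at(seqs, 7) & is_at(seqs, 6), 5,
             ifelse(is_at(seqs, 2) & is_at(seqs, 4) & is_at(seqs, 1), 1, 0))
s1
```

No for loop is needed here, since ifelse(), substr() and %in% all accept vectors.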

modified 4 months ago • written 4 months ago by Friederike (5.1k)

The following regex would test the same things:

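# "^" anchors each pattern to the start of the string, so the positions are fixed
ifelse(grepl("^.{5}[AT]{3}", at),            # positions 6-8 are all A or T
       5,
       ifelse(grepl("^[AT]{2}.[AT]", at),    # positions 1, 2 and 4 are all A or T
              1,
              NA))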

Note how you're also missing the indication of what should happen if the second ifelse call returns FALSE (I've used NA here).

written 4 months ago by Friederike (5.1k)

Thanks for the quick reply. But still, I am getting the same error for big files.

written 4 months ago by archana.bioinfo87 (170)