Question: (Closed) print lines with specific members only
ahmedakhokhar • 110 wrote:
I have a dataset with the following format:
a. 1 w p1,p2,w3
b. 1 w p1,p2,p3, w3
c. 1 w p1,w3
d. 1 w p1,w3
I want to print only the lines where exactly 'p1,w3' is present, and none of the lines where p1 and w3 appear in combination with other members such as p1,p2,w3. The expected output is:
c. 1 w p1,w3
d. 1 w p1,w3
Here is what I'm doing:
with open("file.txt",'rU') as lines:
    for line in lines:
        line = line.split('\t')
        line1 = line[-1].split(',')
        for gen in line1:
            if 'p1' and 'w3' in gen:
                print(line)
It prints all lines, including the unwanted ones (a and b). Any tips or ideas are welcome, thanks.
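For readers hitting the same problem: the condition `'p1' and 'w3' in gen` is parsed by Python as `'p1' and ('w3' in gen)`. Because the non-empty string `'p1'` is always truthy, the test reduces to `'w3' in gen`, and it runs once per member, so every line that contains a `w3` member gets printed. A minimal sketch of one possible fix follows. It assumes the fields are tab-separated (as the `split('\t')` in the question suggests) and that the member list is the last field; the function name `lines_with_exact_members` and the inline sample data are hypothetical, introduced only for illustration.

```python
def lines_with_exact_members(lines, wanted=("p1", "w3"), sep="\t"):
    """Yield lines whose last field contains exactly the wanted members.

    Assumptions (not confirmed by the original post): fields are separated
    by `sep`, and the comma-separated member list is the last field.
    """
    target = set(wanted)
    for line in lines:
        line = line.rstrip("\n")
        last_field = line.split(sep)[-1]
        # Strip whitespace so "p1, w3" and "p1,w3" are treated alike.
        members = {m.strip() for m in last_field.split(",")}
        # Set equality: all wanted members present and nothing else.
        if members == target:
            yield line


# Hypothetical sample mirroring the rows shown in the question.
sample = [
    "a.\t1\tw\tp1,p2,w3\n",
    "b.\t1\tw\tp1,p2,p3, w3\n",
    "c.\t1\tw\tp1,w3\n",
    "d.\t1\tw\tp1,w3\n",
]

for hit in lines_with_exact_members(sample):
    print(hit)
```

Comparing sets ignores member order and duplicates, so `w3,p1` would also match. If the exact string `p1,w3` is required, comparing `last_field.strip() == "p1,w3"` would be the stricter alternative. To run it on a real file, pass the open file object instead of `sample`.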
Hello ahmedakhokhar!
We believe that this post does not fit the main topic of this site.
This is a pure programming question, which is better asked e.g. on stackoverflow.
For this reason we have closed your question. This allows us to keep the site focused on the topics that the community can help with.
If you disagree please tell us why in a reply below, we'll be happy to talk about it.
Cheers!
I am working with "Bioinformatics" data; p1 and w3 are different cell types.
This can be done easily with grep-like functions. As such, it's just pattern-matching/programming and not really bioinformatics-intensive. Please search StackOverflow on how to use regular expressions in python/R. You're better off using awk for this task, if you don't really need to use python/R.
Thanks, I am new to 'awk'. Can you please give an example of how I can match "p1" and "w3" in a line and print it? Thank you so very much.
Please google "awk match multiple patterns to column" and let us know if you have any specific questions after you've tried your best.
Done, thank you