11-21-2014, 08:06 PM
#1
Human being with feelings
Join Date: Feb 2008
Location: Mesa, AZ
Posts: 2,057
Thinking about building regex-like string functions for EEL from the ground up
I created a cool string function, see details here: http://stackoverflow.com/questions/2...tring-function
and I am thinking of expanding it to have regex-like syntax. Why not exactly regex? Because I think I can do better than regex (in terms of making it more useful in a programming environment). The BIG difference I will be introducing, as detailed in the link above, is a bidirectional string matching system, where you can move forwards and backwards through the string to find matches, and the position is saved for the next series of matches. I thiiiink the "bidirectional" aspect of this removes the need for backreferencing...? So here is my idea for syntax:
Code:
START/END and other extended ASCII characters:
() = start/end of capture constructor ()
{} = start/end capture constructor metadata (?P<name>...)
^ = start/end of string ^$
>< = start/end of line (?m)^$
· = start/end class constructor []
¦ = start/end literal constructor \
• = OR (inline) |
¬ = TO (class/inline) -
¬ = start/end do not capture (group/inline) (?:)
` = start/end do not capture (outline) (?:)
ˆ¯ = start/end NOT (inline) ^
ªª = literal ª (inline) \
ª• = literal • (inline) \
ªa = literal a (outline) \
' ' = literal space (inline)
˜ = 0 or 1 (inline) ?
‹ = 0 or more (lazy) (inline) *?
« = 0 or more (grdy) (inline) *
› = 1 or more (lazy) (inline) +?
» = 1 or more (grdy) (inline) +
© = 1 of any character (inline) .
©° = 0 or 1 of any character (inline) .?
©‹ = 0 or more of any character (lazy) (inline) .*?
©› = 1 or more of any character (lazy) (inline) .+?
©« = 0 or more of any character (grdy) (inline) .*
©» = 1 or more of any character (grdy) (inline) .+
N TIMES SPECIFIERS (outline):
? = 0 or 1 ?
* = 0 or more (lazy) *?
** = 0 or more (grdy) *
+ = 1 or more (lazy) +?
++ = 1 or more (grdy) +
2* = 2 or more (lazy) {2,}?
2+ = 2 or more (grdy) {2,}
*2 = 0 to 2 matches (lazy) {0,2}?
1+2 = 1 to 2 matches (grdy) {1,2}
2 = exactly 2 matches {2}
. = 1 of any character .
~ = 0 or 1 of any character .?
: = 0 or more of any character (lazy) .*?
; = 1 or more of any character (lazy) .+?
:: = 0 or more of any character (grdy) .*
;; = 1 or more of any character (grdy) .+
FUZZY MATCH SPECIFIERS (outline):
@ = FULL word [a-zA-Z]+
# = FULL number ([+-]?(?:\d+\.\d+|\d+|\.\d+|\d+\.))
& = FULL midi note name [A-G]#?(?:-2|-1|[0-8])
$ = FULL sentence [a-zA-Z,;'" \t]+[?!.]+[\t ]+
% = FULL path [a-zA-Z]:\\[^<>:"/|?*]*\\[^<>:"/|?*\\]*\.[^<>:"/|?*\\\s]*
_ = space
- = not space [^ ]
t = tab \t
T = not tab [^\t]
h = horizontal space [\t ]
H = not horiz space [^\t ]
v = vertical space [\r\n]
V = not vertical space [^\r\n]
w = white space \s
W = not white space \S
n = numeral \d
N = not a numeral \D
a = alphabet [a-zA-Z]
A = not alphabet [^a-zA-Z]
s = symbol [`~!@#$%^&*()\-_=+\\|\]}\[{;:'"/?.>,<]
S = not symbol [^`~!@#$%^&*()\-_=+\\|\]}\[{;:'"/?.>,<]
| = OR |
!= = start/end NOT ^
/ = literal
\ = literal
CLASS CONSTRUCTOR (inline):
·a¬c = a TO c [a-c]
·ˆa¬c = NOT a TO c [^a-c]
·abc = a OR b OR c [abc]
·ˆabc = NOT a NOT b NOT c [^abc]
·ˆa¯bc = NOT a, b OR c [^ad-zA-Z`~!@#$%^&*()-_=+\\|\]\}\[\{;:'"/?.>,<'\}\]]
LOGIC SPECIFIERS (inline):
¦ab•cd = ab OR cd ab|cd
¦ˆab¯ = NOT ab
LOGIC SPECIFIERS (outline):
dd|ad = digit digit OR alphabet digit \d\d|[a-zA-Z]\d
!dd= = NOT digit digit
CAPTURE SPECIFIERS:
(¦¬abc) = do not capture (?:abc)
(¦¬ab¬cd¬e¬f) = match abcdef, capture only cdf
(grp}ab) = named group match ab (?P<grp>ab)
{1} = match by whatever is stored in group 1 \1
{grp} = match by whatever is stored in group "grp" \k<grp>
?FLAGS? = multiple flags (?FLAGS)
?i = case insensitive (?i)
?I = insensitive off (?-i) default
?c = continuous (/g) default
?C = continuous off
OUTPUT SYNTAX:
¦ = start/end group output
¦- = entire string $0
¦-,0 = entire string and all substrings $0$1$2...
¦0,- = all substrings and entire string $1$2$3...$0
¦0 = all substrings $1$2$3...
¦0}-¦ = each substring separated by '-' $1-$2-$3...
¦12 = group 12 $12
¦2,3 = group 2 and 3 $2$3
¦+2 = group 0 to 2 $1$2$3
¦2+ = group 2 and up $2$3$4...
¦1+3 = group 1 to 3 $1$2$3
¦grp = group "grp"
SPECIAL FUNCTIONS:
function.sentence_body = "characters"
function.sentence_end = "characters"
function.word = "characters"
function.alphanumeric = "characters"
function.number_signmode = 0 or 1 or 2 or 3
0: both + and -
1: only -
2: only +
3: no signs allowed
function.number_matchmode = 0 or 1
0: consume everything that matches
1: do not consume ending '.' (because it is a period)
function.number_decimalchar = '.' or ',' is acceptable
function.number_bignumbermode = 0 or 1
0: do not capture big numbers e.g. 1,023,420
1: capture big numbers
function.number_plus = 0 or 1 or 2
0: convert + to null
1: do not convert
2: + means not a number
function.path = 0 or 1 or etc.
0: filesystem_type1
1: filesystem_type2
2: etc.
note: spaces are only literal when inline or in group metadata
input: There are 354 three numbers 50 in this 222 string.
MATCH: (,}`:`d++)3
LOGIC: Separate matches with commas in group 0; match 0 or more of any character (lazy, not captured), followed by 1 or more numerals (greedy). Loop 3 times.
SYNTAX: ¦0
LOGIC: Output group 0.
output: 354,50,222
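For comparison, here is a rough translation of a few table rows and the worked example into Python's `re` module. The pairing of rows with regex is my reading of the table, not a reference implementation. In standard regex the worked example only yields `354,50,222` when the digit run is greedy. The lazy `\d+?` (which the table lists for `+`) stops after a single digit.

```python
import re

text = "There are 354 three numbers 50 in this 222 string."

# Worked example: lazy "any character" prefix (not captured), then a digit run.
greedy = re.findall(r".*?(\d+)", text)[:3]   # greedy digits: whole numbers
lazy = re.findall(r".*?(\d+?)", text)[:3]    # lazy digits: one digit per match
print(",".join(greedy))  # 354,50,222
print(",".join(lazy))    # 3,5,4

# N TIMES rows written as standard counted quantifiers.
assert re.match(r"a{2,}?", "aaaa").group() == "aa"    # 2*  : 2 or more, lazy
assert re.match(r"a{2,}", "aaaa").group() == "aaaa"   # 2+  : 2 or more, greedy
assert re.match(r"a{0,2}?", "aaaa").group() == ""     # *2  : 0 to 2, lazy
assert re.match(r"a{1,2}", "aaaa").group() == "aa"    # 1+2 : 1 to 2, greedy

# FUZZY number (#): \d+\.\d+ is tried first so "12.5" stays in one piece.
number = re.compile(r"[+-]?(?:\d+\.\d+|\d+|\.\d+|\d+\.)")
print(number.findall("x 12.5 y -3 z .75"))  # ['12.5', '-3', '.75']

# FUZZY symbol (s): an unescaped ")-_" inside [...] is a range covering A-Z and 0-9.
assert re.match(r"[()-_]", "A")  # the range bug: a letter counts as a "symbol"
symbol = re.compile(r"[`~!@#$%^&*()\-_=+\\|\]}\[{;:'\"/?.>,<]")
assert symbol.match("A") is None and symbol.match("-") is not None
```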
__________________
Soundemote - Home of the chaosfly and pretty oscilloscope.
MyReaperPlugin - Easy-to-use cross-platform C++ REAPER extension template
Last edited by Argitoth; 11-25-2014 at 12:09 PM.

11-22-2014, 01:00 AM
#2
Human being with feelings
Join Date: Jan 2007
Location: mcr:uk
Posts: 3,891
My only thought is that since regex is moderately standardised and well documented, it would be easier for people to use if you stick to it. If you implement standard regex syntax, then I can't see why you couldn't add a method of searching backwards too.
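The suggestion of keeping standard syntax and adding a backward search can be sketched on top of Python's `re`. `search_backward` is a hypothetical helper, not part of `re`. It uses the real `Pattern.finditer(string, pos, endpos)` signature and keeps the last match found before the cursor.

```python
import re
from typing import Optional


def search_backward(pattern: "re.Pattern", text: str, pos: int) -> Optional["re.Match"]:
    """Return the last match lying entirely before pos, or None.

    Hypothetical helper: it scans left to right up to pos and keeps the last
    non-overlapping match, so a match that overlaps an earlier one is missed.
    """
    last = None
    # endpos=pos makes the engine treat the text as if it ended at pos.
    for m in pattern.finditer(text, 0, pos):
        last = m
    return last


digits = re.compile(r"\d+")
text = "a1 b22 c333"
print(search_backward(digits, text, len(text)).group())  # 333
print(search_backward(digits, text, 7).group())          # 22
print(search_backward(digits, text, 1))                  # None
```

Each call rescans from the start of the string, and a number cut by the cursor yields only its left part. Engines such as .NET (`RegexOptions.RightToLeft`) take the other route and run the matcher right to left natively.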

11-22-2014, 04:49 AM
#3
Human being with feelings
Join Date: Feb 2008
Location: Mesa, AZ
Posts: 2,057
Noted! It'd be a miracle if I implemented even 1/10th of the functionality; maybe that's just a lack of confidence. Mostly I'm just writing down my ideas.
Edit: Will release a basic string function soon.
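The first post describes a bidirectional search with a saved position. That idea can be sketched in Python under some assumptions:
- `Cursor`, `forward` and `backward` are hypothetical names.
- Matching is plain substring search rather than any of the proposed pattern syntax.
- A failed search leaves the saved position unchanged.

```python
from typing import Optional


class Cursor:
    """Hypothetical bidirectional matcher: the position persists between calls."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.pos = 0  # saved position, shared by forward and backward searches

    def forward(self, needle: str) -> Optional[int]:
        """Find needle at or after pos; leave pos just past the match."""
        i = self.text.find(needle, self.pos)
        if i == -1:
            return None  # no match: pos is left unchanged
        self.pos = i + len(needle)
        return i

    def backward(self, needle: str) -> Optional[int]:
        """Find the last needle ending at or before pos; leave pos at its start."""
        i = self.text.rfind(needle, 0, self.pos)
        if i == -1:
            return None
        self.pos = i
        return i
```

For example, on `"key=value; key=other"` two `forward("key")` calls return 0 and then 11. A following `backward(";")` steps back to 9, and `forward("=")` then finds the second `=` at 14.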
Last edited by Argitoth; 11-23-2014 at 09:53 AM.

11-23-2014, 12:49 PM
#4
Human being with feelings
Join Date: Feb 2008
Location: Mesa, AZ
Posts: 2,057
There! I did it! The ultimate syntax for matching strings. A little more complex than regex, but you can do more with fewer characters.
The big difference in this syntax is that there is a switch to go from literal to non-literal mode. Everything outside the "literal" switch is a class, some kind of logic, or group stuff. Everything inside the "literal" switch is (of course) literal, and makes use of extended ASCII characters (codes above 127) for additional match functions, so you don't have to switch out of literal mode constantly.
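Here is a minimal sketch of the literal on/off switch. It assumes `¦` is the toggle, as in the literal constructor row of the table, and it ignores every other construct, including the `ª` escapes. `split_literal` is a hypothetical name. It only shows how a parser could separate literal runs from class/logic/group runs before interpreting them.

```python
from typing import List, Tuple

LITERAL_SWITCH = "\u00a6"  # the broken-bar character used as the literal on/off switch


def split_literal(pattern: str) -> List[Tuple[bool, str]]:
    """Split pattern into (is_literal, text) runs.

    Each switch character flips the mode, so even-numbered pieces are
    non-literal and odd-numbered pieces are literal. Empty pieces are dropped.
    """
    runs: List[Tuple[bool, str]] = []
    for index, piece in enumerate(pattern.split(LITERAL_SWITCH)):
        if piece:
            runs.append((index % 2 == 1, piece))
    return runs


# "n++" (outline: 1 or more numerals, greedy) followed by the literal text " dB".
print(split_literal("n++\u00a6 dB\u00a6"))  # [(False, 'n++'), (True, ' dB')]
```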
Why not regex, you say? (Speaking to you, IXix.) Because regex simply does not do what I need!
Another big feature I am introducing in this syntax is a better group and capture implementation. You can do a lot with groups, such as: use groups to separate strings by whatever characters you like, store any sort of matches in one group, skip storing certain matches, and use named and/or numbered groups.
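For comparison, here is roughly how some of these group features look in Python's `re` module. The pairings with the proposed syntax are my reading of the table, not an official mapping: `(¦¬ab¬cd¬e¬f)` for the first example, `(grp}ab)` with `{grp}` for the second, and `¦0}-¦` for the third. The date string is only sample input.

```python
import re

# "Match abcdef, capture only cd and f": non-capturing (?:...) around the rest.
m = re.fullmatch(r"(?:ab)(cd)(?:e)(f)", "abcdef")
captured = "".join(m.groups())
print(captured)  # cdf

# Named group plus a backreference by name (Python spells it (?P=name)).
pair = re.fullmatch(r"(?P<grp>ab)-(?P=grp)", "ab-ab")
print(pair.group("grp"))  # ab

# Output every substring separated by '-', like $1-$2-$3.
date = re.fullmatch(r"(\d{4})(\d{2})(\d{2})", "20141123")
joined = "-".join(date.groups())
print(joined)  # 2014-11-23
```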
Last edited by Argitoth; 11-25-2014 at 12:16 PM.

11-23-2014, 01:17 PM
#5
Human being with feelings
Join Date: Nov 2010
Posts: 2,436
Quote:
Originally Posted by Argitoth
Because regex simply does not do what I need!

Never found a problem I couldn't solve with regex. I guess there's a reason it's so popular.
I agree with IXix: reinventing the wheel here is not really user-friendly. Regex can get complicated, and there are even multiple flavors of it. Forcing the user to learn yet another version is not nice.

11-23-2014, 01:39 PM
#6
Human being with feelings
Join Date: Feb 2008
Location: Mesa, AZ
Posts: 2,057
Quote:
Originally Posted by Breeder
Never found a problem I couldn't solve with regex. I guess there's a reason it's so popular

True, true... although I think it's popular because it's the only one. Is there something else?

06-03-2020, 05:26 PM
#7
Human being with feelings
Join Date: Apr 2011
Posts: 3,458
I know I am a bit late... but... have you done it?

All times are GMT -7.