regular expression documentation

i've searched tracker, and ITN, but sadly, can't find the answer to my question. what is the regular expression syntax that IDEA (3.0.4) uses? it looked like it uses the ORO package, but i can't find doc for the syntax, only the package.

specifically, i'm trying to find all occurrences of foo inside quotes. i have tried many variations on

"[^"]*foo[^"]*"

but have made no progress.


Hi Danny,

Try searching for:

\"foo\"

To find 'foo' anywhere between quotes, try

\".*foo.*\"

-Dave

0

the problem with your first suggestion, \"foo\" is that it misses "foobar". the problem with your second suggestion \".*foo.*\" is that it catches

a = "b" + foo + "c"

when it shouldn't. still trying to figure out how to say "a quote, followed by any number of anything that isn't a quote, followed by foo, followed by any number of anything that isn't a quote, followed by a quote."

0

hmm. i guess my expression would catch the same thing that your second one does, too. maybe what i'm looking for is a new feature in the regexp search "find in string."
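
To make the problem concrete, here is a minimal sketch of the three candidate patterns run against the sample line. It uses Python's `re` module as a stand-in, which is an assumption: IDEA uses ORO, and only the basic Perl5-style constructs shown here are expected to behave the same way.

```python
import re

# The line from the post above: foo sits between two string literals.
line = 'a = "b" + foo + "c"'

exact = re.compile(r'"foo"')                 # first suggestion
greedy_any = re.compile(r'".*foo.*"')        # second suggestion
non_quote = re.compile(r'"[^"]*foo[^"]*"')   # "anything that isn't a quote"

# The first suggestion misses "foobar".
print(exact.search('"foobar"'))

# Both other patterns still match, because the closing quote of "b" and the
# opening quote of "c" look like a quoted string surrounding foo.
print(greedy_any.search(line).group())
print(non_quote.search(line).group())
```

A regular expression that looks only at the text between two quote characters cannot tell an opening quote from a closing one, which is why both patterns report a false hit here.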

0

OK, then, how about this unreadable mess:

^[^"\n]*(\"[^"]*?foo[^"]*?\"[^"\n]*)+

This finds
- one line
- which contains zero or more non-quote characters from the beginning of the
line...
- to one or more occurrences of a quoted string plus non-quoted string junk
following:
-- quoted string begins with one double-quote character
-- continues with zero or more (minimal matching) non-quote characters
-- contains 'foo'
-- continues with zero or more (minimal matching) non-quote characters
-- ends with one double-quote character
-- followed by any number of non-quote, non-newline characters

If you test it on

// foo
// " test for foo " this foo does not count " but this foo does"
// this foo does not count " but this foo does"
// " should" not foo find "this"

It will find the second and third lines, but not the first or fourth, the
fourth case being the interesting one where foo is not found inside a quoted
string.
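
The four test lines can be checked outside IDEA with a short script. This is only a sketch: it assumes the pattern is anchored at the start of the line and uses negated classes (`[^"]`), as the description above implies, and it uses Python's `re` rather than ORO. The names `PATTERN`, `TEST_LINES` and `matching_lines` are made up for the illustration.

```python
import re

# Anchored pattern: non-quote prefix, then one or more quoted strings that
# contain foo, each followed by non-quote, non-newline text.
PATTERN = re.compile(r'^[^"\n]*(\"[^"]*?foo[^"]*?\"[^"\n]*)+')

TEST_LINES = [
    '// foo',
    '// " test for foo " this foo does not count " but this foo does"',
    '// this foo does not count " but this foo does"',
    '// " should" not foo find "this"',
]

def matching_lines(lines):
    """Return the lines the pattern matches from their first character."""
    return [line for line in lines if PATTERN.match(line)]

for line in matching_lines(TEST_LINES):
    print(line)
```

The anchor at the start of the line is what rejects the fourth line. Without it, a search could begin at the closing quote of " should" and treat `" not foo find "` as a quoted string.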

-Dave

0

i test it, and it finds nothing. :(

0

Hmm. It works under 3.0.2. But under 3.0.4 I get a "Bad Pattern"
information dialog. It does not seem to like the caret ^ at the beginning
of the pattern. Nor does it like the caret in the negated character class
construct [^"]. Is this what you're seeing?


0

no, for me it just gives me the "nothing found" dialog.

0

I submitted a bug report, to at least try to figure out why the regex that
worked for me in 3.0.2 doesn't work in 3.0.4.

Meanwhile, since you seem to be able to use more regex expressions than I
am, try the following building blocks:

Does
\"foo\"
locate lines containing a quoted string containing only foo?
E.g.
// "foo" would be found, but
// "xxfooxx" would not.
// nor would foo be found.


Does
\"[^"]*?foo[^"]*?\"
locate lines with a quoted string containing foo and any number of leading
or trailing characters inside the quoted string?
First two comment lines above should be found with this pattern.

If so, does
(\"[^"]*?foo[^"]*?\"[^"\n]*)+
locate lines with a quoted string containing foo followed by zero or more
trailing characters?
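
For comparison, the expected answers to these three questions can be sketched in Python. This assumes the character classes in the building blocks are negated (`[^"]`) and that Python's `re` agrees with ORO on these constructs; `SAMPLES`, `BLOCKS` and `hits` are hypothetical names used only here.

```python
import re

# The three comment lines from the first building block.
SAMPLES = [
    '// "foo" would be found, but',
    '// "xxfooxx" would not.',
    '// nor would foo be found.',
]

BLOCKS = {
    'exact': r'\"foo\"',
    'inside': r'\"[^"]*?foo[^"]*?\"',
    'repeated': r'(\"[^"]*?foo[^"]*?\"[^"\n]*)+',
}

def hits(pattern):
    """Report, per sample line, whether the pattern matches anywhere in it."""
    regex = re.compile(pattern)
    return [bool(regex.search(sample)) for sample in SAMPLES]

for name, pattern in BLOCKS.items():
    print(name, hits(pattern))
```

If IDEA disagrees with these results for one of the blocks, that block is the first construct the Find dialog is not handling.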


For reference, the original pattern that works for me is:
^[^"\n]*(\"[^"]*?foo[^"]*?\"[^"\n]*)+


0

first off, i have to leave off the backslashes. so
"foo" finds "foo" no hard regexp work there. after that, i add in the [^"] part, and it fails.
"[^"]*foo[^"]*"
finds nothing as does
\"[^"]*?foo[^"]*?\"
as does
[^"]* and [^"]*? and then i give up...

0

btw, can you point me to any doc of the regexp flavor itself?

0

That is indeed odd. Your results are what I would expect to see when
searching for a string literal; are you sure you've selected the regular
expression radio button option on the IDEA Search...Find dialog?

Try searching for "[a]" which should locate the next letter 'a' anywhere
after your cursor.
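
The idea behind this check, sketched in Python as an analogy (the sample text is made up, and IDEA's dialog is of course not calling these functions): the same three characters behave very differently as a literal string and as a regular expression.

```python
import re

text = 'int total = count + 1;'
probe = '[a]'

# Literal search: looks for the three characters "[a]" and finds nothing.
literal_hit = text.find(probe)

# Regex search: a character class containing only 'a', so it stops at the
# first letter a in the text.
regex_hit = re.search(probe, text)

print(literal_hit)
print(regex_hit.start(), regex_hit.group())
```

So if "[a]" finds nothing in a file that clearly contains the letter a, the search is being treated as a plain string.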

I like to use "Perl in a Nutshell" published by O'Reilly (chapter 4, pp
63-70). But there are many places to find regex documentation.
Unfortunately there are several dialects, and some documentation is geared
toward programmatic use of regex compilers, as opposed to user
documentation. Searching for "Regular Expression Syntax" on Google yielded
many hits -- such as

http://www.python.org/doc/current/lib/re-syntax.html
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/jscript7/html/jsjsgrpregexpsyntax.asp



0

something interesting to report. i was doing my searches in the "find in path" dialog, because i wanted them anywhere. the "find" dialog behaves differently, telling me when it doesn't like my patterns, for instance. so now the trick is to find out what version of [^"] doesn't yield a "bad pattern" dialog. haven't figured it out yet.

as for the google pointers, the problem is that, as you point out, there are many different flavors. i am using what i believe to be PERL regexp syntax, and it's not working.

0

We are using oromatcher, please check out their docs.

--

Eugene Belyaev, CTO
JetBrains, Inc
http://www.intellij.com
"Develop with pleasure!"


0

The ORO documentation is at
http://jakarta.apache.org/oro/api/org/apache/oro/text/regex/package-summary.html.

Downloading OROMatcher from
http://www.savarese.org/oro/downloads/#OROMatcher yields that documentation
plus the following disclaimer:

It is beyond the scope of this guide to give a detailed explanation of
regular expressions to beginners. The OROMatcher(TM) package is geared toward
programmers who are already familiar with regular expressions, having used
them with other languages, and who now want to apply them in their Java
programs. However, we shall make a small attempt to cover the basics and
summarize the Perl5 syntax supported by the OROMatcher(TM) Perl5 classes. For
a detailed exploration of regular expressions for both beginners and
advanced users, we recommend the book Mastering Regular Expressions by
Jeffrey Friedl published by O'Reilly & Associates.


:),
-Dave


0

Danny,
So is it now true that you are seeing "Bad Pattern" dialogs whenever a caret
appears in your regular expression? If so, this is sort of good news --
that is what is happening to me. Bad news is, it appears to be a bug
introduced after 3.0.2. I tried to submit a bug report (I think it
succeeded -- long story.)

I hate to tell you to revert to 3.0.2, but if you did, I think my regex
would work for you. Then you could also test "find in path" to see if it
works there. I seem to recollect also having trouble with dissimilar
behavior between find in path vs. find in file.

-Dave


0

ok, i feel better that at least it's a known bug, and i'm not crazy. in that respect, anyway.

0

> We are using oromatcher, please check out their docs.

Sorry, but this seems to be a non-IDEAish approach. It would be better to
have some support when entering regular expressions, for instance a popup
button right beside the input field. This would make life much easier
for people who are not experienced in regular expressions or only use them
sporadically. Unfortunately my corresponding RFE is considered low
priority.

Tom




0

it also doesn't make sense to me. what you're in effect saying is "we support perl5 regular expressions, documented elsewhere, except for the part of it that we don't support, unless there's a bug in the code."

0

Thomas Singer wrote:
>> We are using oromatcher, please check out their docs.
>
> Sorry, but this seems to be a non-IDEAish approach. It would be better
> to have some support when entering regular expressions, for instance a
> popup button right beside the input field. This would make life much
> easier for people who are not experienced in regular expressions or only
> use them sporadically. Unfortunately my corresponding RFE is considered
> low priority.
>
> Tom


It will become just a little bit better when they switch over to Java's
regular expressions, and that is planned for Aurora--
http://www.intellij.net/tracker/idea/viewSCR?publicId=8075
I've been preaching lately that find & replace need a revisiting, and
your rfe is an important example.

Jon

0

Danny,

I received this reply from support@intellij.org after submitting the bug
report about carets not working in 3.0.4. Looks like we'll need to wait for
next version. I will give it a try in build 815.

-Dave



Current IDEA Find subsystem is unable to properly work with line start and
end patterns, it sometimes hangs IDEA. That is why this functionality was
disabled. In the next IDEA version it will be rewritten from scratch.

Serge Baranov
JetBrains, Inc
http://www.intellij.com
"Develop with pleasure!"


0

i don't need start and end, but if they took caret out for its "start" functionality, but forgot that it also negates character classes, that would explain it. btw, doesn't work in 815 either.
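
The two roles of the caret can be shown side by side. This is a sketch in Python's `re`, used here as an assumption-laden stand-in for the Perl5 syntax that ORO documents; the sample strings are invented.

```python
import re

text = 'foo = "foo bar"\nbar = foo'

# Role 1: outside a character class, ^ anchors the match to a line start.
# Only the foo that begins the first line qualifies.
line_starts = [m.start() for m in re.finditer(r'^foo', text, re.MULTILINE)]

# Role 2: as the first character inside [...], ^ negates the class.
# [^"]+ matches runs of characters that are not double quotes.
non_quote_runs = re.findall(r'[^"]+', 'say "hi" now')

print(line_starts)
print(non_quote_runs)
```

A parser that rejects every caret in order to switch off line anchors therefore also rejects `[^"]`, even though that construct has nothing to do with line starts.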

0

Yes, it appears they totally disabled parsing of the caret, which has the side
effect of ruining negated character sets.


0
