Discussion:
syntax error in plperl wrapper for Perl regex replace.
io.sys
2018-08-23 17:20:05 UTC
Permalink
Hi,

I am trying to create a simple wrapper to use Perl regexp replace capabilites, which
supports \L (aka "lowercase from here on") in a replacement-strings, not available in
postgre native regexp functions.

Based on this article:
https://www.anchor.com.au/blog/2013/09/extending-postgresql-high-level-languages-cats/
I copy-pasted the function code with a little adjustment, leading to:

----------
create or replace function perl_regexp_replace (text, text, text, text)
returns text as $funcbody$

$_=$_[0];
s/$_[1]/$_[2]/$_[3];
return $_;

$funcbody$ language plperl immutable;
----------

When running the above code, I get a syntax error in the line "s/..".

When I remove this line, the function compiles, but (of course) does not do anything
usefull.

So, what is wrong with line:
s/$_[1]/$_[2]/$_[3];

Thank in advance for reply.

R.Golis, Prague.
io.sys
2018-08-24 01:32:49 UTC
Permalink
William,

Thank you for your reply.

I am familiar with "s" command of sed and with regular expressions in general. However, I
am not familiar with Perl.

My attempt to use the "s" command in my function is exactly the way you describe, i.e:

s/regex_to_match/replacement-string-referencing-captured-groups/optional-options

I just tried to use function parameters $_[1], $_[2], $_[3] instead fixed strings.
However, it is rejected by postgre as syntax error.

Why? Must be some trivial, novice error.

R.Golis.

####################################
The correct syntax for Perl regex substitutions is
s/pattern_to_match/string_to_replace_matches_with/
After the third slash, you may add a single letter to indicate special
options, like g for global (matches however many times the pattern occurs
in the string) or i for case insensitivity. However, you may at most have
three of those slashes in your expression, with one substitution at a time.
There are ways to do multiple substitutions by using the default variables
for pattern matches, but it might be beyond the scope of your
time/usefulness. You may still want to have a look at
https://perldoc.perl.org/perlretut.html for details.
-WTJ
Tom Lane
2018-08-24 02:17:10 UTC
Permalink
Post by io.sys
I just tried to use function parameters $_[1], $_[2], $_[3] instead fixed strings.
However, it is rejected by postgre as syntax error.
That's not a Postgres problem, it's a Perl problem. I'm no Perl
expert, but I think if you study the Perl docs carefully you will
find there are restrictions on what sorts of variable references
can be interpolated into a regexp ... and a subscripted reference
probably doesn't qualify.

regards, tom lane
io.sys
2018-08-24 23:55:47 UTC
Permalink
Tom,

You are right, it is a Perl's fault, not Postgre's.

You are right also with assumption, that the "s" command of Perl does not accept variable
in every of it's part. I have narrowed down the problem (btw, I could not find anything
about it is Perl docs) to the options part, ie. after last slash. Perl only accepts literals
there, not a variable of any kind. So these options must be hardcoded, not a dynamic value
constructed at runtime.

This compiles:
s/$_[1]/$_[2]/ig;

But even then, the command does not work as expected, as it does not interpret the
backreferences in a replacement string passed to it. It uses it literally. For example,
one would expect that the following call will return 'Praha':

select perl_regexp_replace ('PRAHA', '^(.)(.*)$', '$1\L$2');

But it returns:
$1\L$2

Well, Perl is praised for it's capabilities at the field of text processing, but this
experience disappointed me a little bit. Never mind, I will solve the things different way.

Thank you all for your inputs.

Regards, Roman Golis.

####################################
Post by Tom Lane
Post by io.sys
I just tried to use function parameters $_[1], $_[2], $_[3] instead fixed strings.
However, it is rejected by postgre as syntax error.
That's not a Postgres problem, it's a Perl problem. I'm no Perl
expert, but I think if you study the Perl docs carefully you will
find there are restrictions on what sorts of variable references
can be interpolated into a regexp ... and a subscripted reference
probably doesn't qualify.
regards, tom lane
Cédric Godin
2018-08-30 09:20:49 UTC
Permalink
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body text="#FFFFFF" bgcolor="#330033">
<div class="moz-cite-prefix">Le 25/08/18 à 01:55, io.sys a écrit :</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">Not being a perl expert either I think
the problem lies in the regex line.</div>
<div class="moz-cite-prefix">You should eval it to have the expected
result (and you can use your 4th parameter again then).</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">eval "s/$_[1]/$_[2]/$_[3]";</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">Regards, Cédric<br>
</div>
<div class="moz-cite-prefix"><br>
</div>
<blockquote type="cite" cite="mid:***@aps.gogo">
<pre class="moz-quote-pre" wrap="">Tom,

You are right, it is a Perl's fault, not Postgre's.

You are right also with assumption, that the "s" command of Perl does not accept variable
in every of it's part. I have narrowed down the problem (btw, I could not find anything
about it is Perl docs) to the options part, ie. after last slash. Perl only accepts literals
there, not a variable of any kind. So these options must be hardcoded, not a dynamic value
constructed at runtime.

This compiles:
s/$_[1]/$_[2]/ig;

But even then, the command does not work as expected, as it does not interpret the
backreferences in a replacement string passed to it. It uses it literally. For example,
one would expect that the following call will return 'Praha':

select perl_regexp_replace ('PRAHA', '^(.)(.*)$', '$1\L$2');

But it returns:
$1\L$2

Well, Perl is praised for it's capabilities at the field of text processing, but this
experience disappointed me a little bit. Never mind, I will solve the things different way.

Thank you all for your inputs.

Regards, Roman Golis.


</pre>
</blockquote>
</body>
</html>
io.sys
2018-08-31 04:06:32 UTC
Permalink
Hi, Cédric.

Works! Thank you very much for the hint. Apparently, you are much more
experienced with Perl, than me.

Just for reference, my working function wrapper for Perl regexp replace
now looks like this:

----------
create or replace function perl_regexp_replace (text, text, text, text)
returns text as $funcbody$

$_=$_[0];
eval "s/$_[1]/$_[2]/$_[3]";
return $_;

$funcbody$ language plperl immutable strict;
----------

Regards, Roman.


####################################
Not being a perl expert either I think the problem lies in the regex
line. You should eval it to have the expected result (and you can use
your 4th parameter again then).
eval "s/$_[1]/$_[2]/$_[3]";
Regards, Cédric
Loading...