Thursday, September 10, 2009

cl-ppcre and replacing strings

So, while I am digging into the macro that is (time), here is a quick note on using cl-ppcre to replace text in a string.

First simple example.
Let's take a string, break it into separate words with cl-ppcre:split,
and return two values: the last word and everything except the last word.
This code is definitely not the most efficient or correct, but it will
do for the example.


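(let* ((str "John and Susan Q. Public-and-Private")
       ;; The last whitespace-separated word.
       (last-name (first (last (cl-ppcre:split "\\s+" str))))
       ;; Remove the first occurrence of that word (used here as a regex);
       ;; the space before it is left in place.
       (first-name (cl-ppcre:regex-replace last-name str "")))
  (values first-name last-name))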
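
A slightly tidier variant is possible with cl-ppcre:register-groups-bind, which binds the regex registers directly to variables. This is only a sketch: the function name split-last-word is made up for illustration, and it assumes cl-ppcre is already loaded.

```lisp
;; Assumes cl-ppcre is already loaded, e.g. via (ql:quickload :cl-ppcre).
;; SPLIT-LAST-WORD is a hypothetical helper name, not part of cl-ppcre.
(defun split-last-word (str)
  "Return two values: STR without its last word, and the last word.
Returns NIL if STR contains no whitespace-separated last word."
  (cl-ppcre:register-groups-bind (first-part last-word)
      ;; Greedy (.*) grabs everything up to the final run of whitespace.
      ("^(.*)\\s+(\\S+)$" str)
    (values first-part last-word)))

(split-last-word "John and Susan Q. Public-and-Private")
```

Because the whitespace sits between the two registers, the first value comes back without a trailing space, and the last word is never reinterpreted as a regex.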


More complicated example
Assume we have already defined a variable, sample-text, which holds a string of legal text containing citations to code sections. I want to wrap each citation in a link, although the only part we ultimately want linked is the section number itself. We'll use the regex from the last entry as our code-regex pattern.


(cl-ppcre:regex-replace-all code-regex sample-text
                            '("<a href=\"Laws-display?name=>" :match "\">" :match "</a>"))

"<a href=\"Laws-display?name=>Section 882(a) \">Section 882(a) </a>imposes US tax on a foreign
corporation engaged in a trade or business within the US on its income which is effectively
connected with the conduct of a trade or business inside the US.
<a href=\"Laws-display?name=>Section 864(c)(3). \">Section 864(c)(3). </a>All income from
sources within the United States shall be treated as effectively connected with the conduct of a trade
or business within the United States. Treas. Reg. 1.867-7(a) states that income from the
purchase and sale of personal property shall be treated as derived entirely from the country in
which the property is sold. Treas. Reg. 1.867-7(c) states that a sale of personal property is
consummated at the time when, and the place where, the rights, title and interest of the seller in
the property are transferred to the buyer. <a href=\"Laws-display?name=Section 865(b)
\">Section 865(b) </a>In the case of income derived from the sale of inventory property, such
income shall be sourced under the rules of <a href=\"Laws-display?name=>sections 861(a)(6),
\">sections 861(a)(6), </a><a href=\"Laws-display?name=>section 862(a)(6) \">section 862(a)(6)
</a>and <a href=\"Laws-display?name=>section 863.\">section 863.</a>

<a href=\"Laws-display?name=>Section 861(a)(6) \">Section 861(a)(6) </a>treats inventory
purchased outside the US and sold in the US as US source income. <a href=\"Laws-
display?name=>Section 862(a)(6) \">Section 862(a)(6) </a>treats inventory purchased inside the
US and sold outside the US as foreign source income. <a href=\"Laws-display?name=>Section 863(b)
\">Section 863(b) </a>would allow a split for inventory produced by the taxpayer inside the US
and sold outside the US. <a href=\"Laws-display?name=>Section 865(e)(2)(A) \">Section 865(e)
(2)(A) </a>states that if a nonresident maintains an fixed place of business inside the US., any
sale of inventory attributable to that fixed place of business is sourced in the US regardless
of where the sale occurs. <a href=\"Laws-display?name=>Section 865(e)(2)(B) \">Section 865(e)
(2)(B) </a>states that (A) does not apply if an office of the taxpayer in a foreign country
materially participates in the sale."
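
For anyone who wants to try the replacement-list form without the full text, here is a small self-contained sketch. The pattern and the sentence are stand-ins invented for this example (they are not the code-regex and sample-text used above), and cl-ppcre is assumed to be loaded.

```lisp
;; Assumes cl-ppcre is already loaded, e.g. via (ql:quickload :cl-ppcre).
;; *DEMO-REGEX* and *DEMO-TEXT* are hypothetical stand-ins for illustration.
(defparameter *demo-regex* "[Ss]ections?\\s+\\d+(?:\\(\\w+\\))*")
(defparameter *demo-text* "Section 882(a) imposes tax; see also section 863.")

;; In a replacement list, strings are inserted as-is and :MATCH stands for
;; the whole matched text.
(defparameter *demo-linked*
  (cl-ppcre:regex-replace-all
   *demo-regex* *demo-text*
   '("<a href=\"Laws-display?name=" :match "\">" :match "</a>")))
```

Here each whole citation, such as "Section 882(a)", ends up both inside the href and as the link text, which is the same shape as the output above.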

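The replacement list gets us part way there: we now have a link around each match. (The ">" right after name= in the first string is stray, so it ends up inside the href value.) However, we really want to link only the section number, not the word "section". We will want the subsections later, but first we need to see whether we can call a function to build the replacement. How about a different approach?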


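(cl-ppcre:regex-replace-all
 code-regex
 sample-text
 #'(lambda (match &rest registers)
     ;; MATCH is unused; declaring it ignored silences the compiler's
     ;; complaint about the unused variable.
     (declare (ignore match))
     ;; Register 1 holds the leading word, register 2 the section number.
     (concatenate 'string (first registers)
                  "<a href=\"Laws-display?name=" (second registers) "\">"
                  (second registers) "</a>"))
 :simple-calls t)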

"Section <a href=\"Laws-display?name=882\">882</a>imposes US tax on a foreign corporations
engaged in a trade or business within the US on its income which is effectively connected with
the conduct of a trade or business inside the US. Section <a href=\"Laws-display?name=864
\">864</a>All income from sources within the United States shall be treated as effectively
connected with the conduct of a trade or business within the United States. Treas. Reg.
1.867-7(a) states that income from the purchase and sale of personal property shall be treated
as derived entirely from the country in which the property is sold. Treas. Reg. 1.867-7(c)
states that a sale of personal property is consummated at the time when, and the place where,
the rights, title and interest of the seller in the property are transferred to the buyer.
Section <a href=\"Laws-display?name=865\">865</a>In the case of income derived from the sale of
inventory property, such income shall be sourced under the rules of sections <a href=\"Laws-
display?name=861\">861</a>section <a href=\"Laws-display?name=862\">862</a>and section <a
href=\"Laws-display?name=863\">863</a>Section <a href=\"Laws-display?name=861\">861</a>treats
inventory purchased outside the US and sold in the US as US source income. Section <a
href=\"Laws-display?name=862\">862</a>treats inventory purchased inside the US and sold outside
the US as foreign source income. Section <a href=\"Laws-display?name=863\">863</a>would allow a
split for inventory produced by the taxpayer inside the US and sold outside the US. Section <a
href=\"Laws-display?name=865\">865</a>states that if a nonresident maintains an fixed place of
business inside the US., any sale of inventory attributable to that fixed place of business is
sourced in the US regardless of where the sale occurs. Section <a href=\"Laws-display?name=865
\">865</a>states that (A) does not apply if an office of the taxpayer in a foreign country
materially participates in the sale."


Now it looks better.
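
As a possible next step, here is a sketch of the same function-based replacement that also keeps the subsections after the link. The pattern, the function name link-section, and the sentence are all invented for this example and are not the code-regex or sample-text from this post; cl-ppcre is assumed to be loaded.

```lisp
;; Assumes cl-ppcre is already loaded, e.g. via (ql:quickload :cl-ppcre).
;; *DEMO-CODE-REGEX* and LINK-SECTION are hypothetical names for illustration.
;; Registers: 1 = "Section " (with trailing whitespace), 2 = the number,
;; 3 = any parenthesized subsections, possibly empty.
(defparameter *demo-code-regex*
  "([Ss]ections?\\s+)(\\d+)((?:\\(\\w+\\))*)")

(defun link-section (match prefix number subsections)
  "Link only the section number; keep the prefix and subsections as text."
  (declare (ignore match))
  (concatenate 'string prefix
               "<a href=\"Laws-display?name=" number "\">" number "</a>"
               subsections))

;; With :SIMPLE-CALLS T the function receives the match followed by one
;; argument per register.
(defparameter *demo-result*
  (cl-ppcre:regex-replace-all
   *demo-code-regex*
   "Section 882(a) imposes tax; see also section 863."
   #'link-section
   :simple-calls t))
```

With this stand-in pattern the "(a)" stays in the text right after the link instead of being dropped, because the third register is written back out unchanged.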
