Introduction
Recently I was working with OpenAPI Generator. The task at hand was to take an OpenAPI v3 spec file and run the generator on it to produce an OpenAPI client for Python.
The generator also produced test files. However, the actual test lines in those files were commented out (disabled), so to make the tests run, the comment characters had to be removed.
As there were ~350 files, I was certainly going to use a one-liner for this job.
This article describes a very simple and powerful way to perform these (or arbitrary other) text replacements in multiple files.
The Problem
OpenAPI Generator has produced around 350 test files, all with the same general content. Here is an example of one, in which only the structure is important:
# coding: utf-8

"""
    API for managing a service

    The version of the OpenAPI document: 1.0
    Generated by OpenAPI Generator (https://openapi-generator.tech)

    Do not edit the class manually.
""" # noqa: E501

import unittest
import datetime

from the_sdk.the_api.models.action import Action # noqa: E501

class TestAction(unittest.TestCase):
    """Action unit test stubs"""

    def setUp(self):
        pass

    def tearDown(self):
        pass

    def make_instance(self, include_optional) -> Action:
        """Test Action
            include_option is a boolean, when False only required
            params are included, when True both required and
            optional params are included """
        # uncomment below to create an instance of `Action`
        """
        model = Action() # noqa: E501
        if include_optional:
            return Action(
                value = 'GET'
            )
        else:
            return Action(
                value = 'GET',
            )
        """

    def testAction(self):
        """Test Action"""
        # inst_req_only = self.make_instance(include_optional=False)
        # inst_req_and_optional = self.make_instance(include_optional=True)

if __name__ == '__main__':
    unittest.main()
(If you want to follow the tutorial hands-on, feel free to save this file locally and even make multiple copies of it, to simulate having more than one file.)
As we can see, the important lines in testAction that call make_instance are commented out, so they would not execute.
So the primary task was to go through all the files and uncomment those lines, that is, remove the leading # from them.
The Solution
Preparation
When designing a solution for problems of this type, the steps are usually:
- Determine the general class of the problem we are solving. (In our case, it is “making automated replacements in multiple files at once”.)
- Decide which tool or approach to use for the actual text file edits. (In our case, we are going to use Perl.)
- Decide how to select the files which should be changed. (Simple in this case: just tests/*.py.)
- Figure out how to review the changes, undo them, and try the improved procedure again if needed. (Explained in more detail below.)
- Carry out the task.
Tools
- Perl is a scripting language unsurpassed in its ability to perform complex text filtering and editing tasks in only a couple of lines of code, often even in a single line called a oneliner. So we are going to use Perl for editing. (We could also use old-school Unix languages like AWK or sed, but Perl can do all they can do and much more.)
- To select the files which should be transformed, we don’t even have to think about it. Since we want to apply the transformation to all Python files in the directory tests/, we are simply going to specify tests/*.py.
- To verify our changes, we are going to place the directory tests/ under temporary Git revision control, so that we can run git diff later to inspect the changes. In the case of incorrect edits, we can just run git reset --hard to undo the changes and try again.
Action
We will enter the directory tests/ and quickly add it under local/temporary Git version control:
cd tests
git init
git add .
git commit -am "Initial"
Then we are going to run a Perl oneliner to carry out replacements in the text.
perl -pi -e's/^(\s*)# /$1/g' *.py
Running this line already does most of the job, which shows how powerful Unix text editing capabilities are. But, as the review below shows, the result is not quite right yet, so we are going to go through 2 more iterations.
First, a couple of explanations of the above line:
- Option -p causes Perl to run the script (either a script file or an in-place oneliner) on every record (which by default is every line) of the input file(s). At the end of the script (that is, after processing each line), it implicitly prints the resulting line. (For the same behavior but without printing, one would use option -n instead.)
- Option -i instructs Perl to do in-place edits of files. That means we will be reading from files, and thanks to the mentioned option -p we will automatically be writing the transformed lines back to the files.
- Option -e specifies the Perl script in-place, without requiring it to be in a separate file.
- The part 's/^(\s*)# /$1/g' is our actual script. It uses regular expressions and says:
  - Substitute all occurrences (s///g)
  - Of zero or more whitespace characters at the beginning of the line, followed by a literal # and a space (^(\s*)# ). Because the pattern is anchored with ^, it can match at most once per line, so the g flag is harmless but not strictly needed.
  - With only that whitespace (to preserve it), but without the # and the space ($1)
- And do so for all files matching the glob pattern *.py
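To see what the substitution does to a single line, here is a minimal sketch of the same regular expression written in Python (the language of the generated files) rather than Perl. The function name uncomment is made up for this illustration, and Python spells Perl's $1 as \1 in the replacement string.

```python
import re

# Python spelling of the Perl substitution s/^(\s*)# /$1/
# (Perl's $1 becomes \1 in Python's replacement string).
PATTERN = re.compile(r"^(\s*)# ")

def uncomment(line: str) -> str:
    """Remove a leading '# ' while keeping the indentation in front of it."""
    return PATTERN.sub(r"\1", line)

if __name__ == "__main__":
    commented = "        # inst_req_only = self.make_instance(include_optional=False)\n"
    print(repr(uncomment(commented)))
    # Lines where '# ' is not the first thing after the indentation
    # do not match the anchored pattern and are returned unchanged.
    print(repr(uncomment("model = Action() # noqa: E501\n")))
```

The captured group holds the indentation, which is why the uncommented line stays aligned with the rest of the method body.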
Review
The above line has executed instantly, and we need to review the results for correctness.
We can review the changes simply by running git diff. It will show us a series of changes in a format called unified diff, where lines beginning with - are lines that were removed and lines beginning with + are lines that were added. A changed line therefore shows up as a - line followed by its + replacement:
git diff
@@ -1,4 +1,4 @@
-# coding: utf-8
+coding: utf-8
 
 """
     API for managing a service
@@ -32,7 +32,7 @@
             include_option is a boolean, when False only required
             params are included, when True both required and
             optional params are included """
-        # uncomment below to create an instance of `Action`
+        uncomment below to create an instance of `Action`
         """
         model = Action() # noqa: E501
         if include_optional:
@@ -47,8 +47,8 @@
 
     def testAction(self):
         """Test Action"""
-        # inst_req_only = self.make_instance(include_optional=False)
-        # inst_req_and_optional = self.make_instance(include_optional=True)
+        inst_req_only = self.make_instance(include_optional=False)
+        inst_req_and_optional = self.make_instance(include_optional=True)
 
 if __name__ == '__main__':
     unittest.main()
Right off the bat we see that our script has unexpectedly removed the # in front of lines like # coding: utf-8. We did not intend to modify those lines.
We notice that this pattern is present in all files right at the beginning of the line (without any whitespace preceding it), while the edits we actually want to make are always prefixed with some whitespace.
So we are going to revert our changes and repeat the replacements, but this time expecting not “zero or more whitespace” but “one or more whitespace” in front of #.
Action
We reset the files to their original:
git reset --hard
And we specify “one or more” instead of “zero or more” in the regular expression by changing * to +:
perl -pi -e's/^(\s+)# /$1/g' *.py
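The difference between the two quantifiers can be checked on individual lines. The following is a small sketch in Python rather than Perl; the names ZERO_OR_MORE, ONE_OR_MORE, and strip_marker are invented for the illustration.

```python
import re

# Python spellings of the two Perl patterns; \1 plays the role of Perl's $1.
ZERO_OR_MORE = re.compile(r"^(\s*)# ")  # first attempt
ONE_OR_MORE = re.compile(r"^(\s+)# ")   # second attempt

def strip_marker(pattern: re.Pattern, line: str) -> str:
    """Apply one of the patterns to a single line."""
    return pattern.sub(r"\1", line)

if __name__ == "__main__":
    header = "# coding: utf-8\n"
    test_line = "        # inst_req_only = self.make_instance(include_optional=False)\n"
    for label, pattern in (("zero or more", ZERO_OR_MORE), ("one or more", ONE_OR_MORE)):
        print(label)
        print("  ", repr(strip_marker(pattern, header)))
        print("  ", repr(strip_marker(pattern, test_line)))
```

With \s+ the header line has no whitespace for the group to consume, so the pattern cannot match there, while the indented line is still uncommented.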
Review
Edits have again been completed instantly, and we need to review them.
git diff
@@ -32,7 +32,7 @@
             include_option is a boolean, when False only required
             params are included, when True both required and
             optional params are included """
-        # uncomment below to create an instance of `Action`
+        uncomment below to create an instance of `Action`
         """
         model = Action() # noqa: E501
         if include_optional:
@@ -47,8 +47,8 @@
 
     def testAction(self):
         """Test Action"""
-        # inst_req_only = self.make_instance(include_optional=False)
-        # inst_req_and_optional = self.make_instance(include_optional=True)
+        inst_req_only = self.make_instance(include_optional=False)
+        inst_req_and_optional = self.make_instance(include_optional=True)
 
 if __name__ == '__main__':
     unittest.main()
The original problem has been fixed, but the output still contains one more unintended edit: the line # uncomment below to create an instance of `Action` has lost its #.
That line is a real Python comment. It sits between the method's docstring and the triple-quoted string ("""...""") that wraps the example code, so without the # it is no longer valid Python and the file would fail to parse.
Apart from that, the edit is also doubling the size of our diff, so we want to avoid making it.
We need another iteration.
Action
We reset the files to their original:
git reset --hard
And we modify our oneliner to only make the replacements if the comment (#) does not begin with the word “uncomment”.
We do this by using a regular expression feature called a “negative lookahead assertion”, written as (?!...). It specifies text that must not follow the matched part, without itself becoming part of the match:
perl -pi -e's/^(\s+)# (?!uncom)/$1/g' *.py
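Python's re module supports the same (?!...) syntax, so the effect of the lookahead can be tried out on single lines. This is only a sketch in Python, not the Perl command itself, and the function name uncomment is hypothetical.

```python
import re

# Same idea as the Perl pattern: indentation, '# ', and then anything
# except text starting with "uncom". The lookahead consumes no characters.
PATTERN = re.compile(r"^(\s+)# (?!uncom)")

def uncomment(line: str) -> str:
    """Uncomment an indented line unless the comment starts with 'uncom'."""
    return PATTERN.sub(r"\1", line)

if __name__ == "__main__":
    hint = "        # uncomment below to create an instance of `Action`\n"
    call = "        # inst_req_only = self.make_instance(include_optional=False)\n"
    print(repr(uncomment(hint)))  # left as a comment
    print(repr(uncomment(call)))  # '# ' removed, indentation kept
```

Because the lookahead only inspects the following text, the replacement still removes exactly the "# " and nothing after it.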
Review
We do another git diff and verify that the results look exactly like we wanted:
git diff
@@ -47,8 +47,8 @@
 
     def testAction(self):
         """Test Action"""
-        # inst_req_only = self.make_instance(include_optional=False)
-        # inst_req_and_optional = self.make_instance(include_optional=True)
+        inst_req_only = self.make_instance(include_optional=False)
+        inst_req_and_optional = self.make_instance(include_optional=True)
 
 if __name__ == '__main__':
     unittest.main()
At this point we can save the diff to a file if needed (git diff > activate-tests.patch) and remove the local/temporary Git directory we have created in the directory tests/ (rm -rf .git).
The patch file can then be copied and applied to any tree of unmodified tests/ files. This would be done with a command such as patch -p1 < activate-tests.patch, run from within the directory tests/.
Quiz
A “diff” or a difference between two files always needs two states (old and new) to show the actual differences.
We have conveniently used Git for this purpose: for each file, git diff showed us a comparison between the last committed state in Git (in .git/) and the current/actual content of the file on disk.
But could we have used something else?
Yes. Instead of using Git and git diff to show diffs between the state saved in the temporary .git/ directory and the latest state on disk, we could have made a copy of the tests/ directory before editing, with a command such as cp -a tests tests,orig (run from the parent directory of tests/).
Then we could have checked for differences between all files in that directory and files in our changed/updated directory with a command such as diff -ruNP ../tests,orig/ ./ (run from within tests/).
(Note that diff shown here is a standalone program, and not an option/subcommand of git. In fact, diff is the original program; git diff only mimics the functionality and output of diff -u.)
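As a side note, the unified format is not tied to diff or Git. As a rough sketch, Python's standard difflib module can produce the same kind of output from two lists of lines. The helper name make_unified_diff and the file name test_action.py are assumptions made for this illustration only.

```python
import difflib

def make_unified_diff(old_lines, new_lines, name):
    """Return the unified diff between two versions of one file as a list of lines."""
    return list(
        difflib.unified_diff(
            old_lines,
            new_lines,
            fromfile=f"a/{name}",
            tofile=f"b/{name}",
        )
    )

if __name__ == "__main__":
    old = [
        "    def testAction(self):\n",
        '        """Test Action"""\n',
        "        # inst_req_only = self.make_instance(include_optional=False)\n",
    ]
    new = old[:2] + [
        "        inst_req_only = self.make_instance(include_optional=False)\n",
    ]
    # test_action.py is a hypothetical name used only in the diff header.
    print("".join(make_unified_diff(old, new, "test_action.py")), end="")
```

The output has the same shape as the diffs shown earlier: a header, an @@ hunk marker, context lines prefixed with a space, and the changed line as a - / + pair.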
If we used that approach, what would the undo procedure look like?
In that case, we could undo the changes by deleting our directory tests/ and copying it from tests,orig/ again (cd .. && rm -rf tests && cp -a tests,orig tests && cd tests).
Another creative option would be to produce the diff output and apply it, but in the reverse direction (from new to old, instead of from old to new). We would use a command such as diff -ruNP ../tests,orig/ ./ | patch -Rp1 for that.
Finally, is there an alternative way in which we could have avoided modifying the # coding: utf-8 lines?
Yes. If you recall from the explanations above, we have identified that # coding: utf-8 is always found at the beginning of the line, while all our intended edits are prefixed by some whitespace.
So our solution was to simply require that one or more whitespace characters be found before the #.
However, we could have also noticed that # coding: utf-8 is always the first line in the file.
So instead of the approach we used, we could have told Perl to only do the replacements if the current line number in the file is greater than 1 ($. > 1 in Perl notation, or $INPUT_LINE_NUMBER > 1 when use English is in effect).
We could have done that by adding an if that places a condition on the line number. One caveat: Perl does not reset $. when it moves on to the next file given on the command line, so we also need to close the input handle at the end of each file (close ARGV if eof) to restart the numbering: perl -pi -e's/^(\s*)# (?!uncom)/$1/g if $. > 1; close ARGV if eof' *.py
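For comparison, the same "skip the first line of every file" idea can be sketched with Python's standard fileinput module, whose inplace mode is a rough analogue of perl -pi. This is an illustration under assumptions, not a replacement for the oneliner; the function name uncomment_files is made up.

```python
import fileinput
import re

# Zero or more leading whitespace is fine here, because the header
# line is protected by the line-number check instead of the pattern.
PATTERN = re.compile(r"^(\s*)# (?!uncom)")

def uncomment_files(paths):
    """Rewrite each file in place, leaving the first line of every file untouched."""
    paths = list(paths)
    if not paths:
        # With no files, fileinput would fall back to sys.argv or stdin.
        return
    with fileinput.input(files=paths, inplace=True) as stream:
        for line in stream:
            # filelineno() restarts at 1 for each file; lineno() would not.
            if stream.filelineno() > 1:
                line = PATTERN.sub(r"\1", line)
            # In inplace mode, standard output is redirected into the file.
            print(line, end="")
```

It could be called as uncomment_files(sorted(glob.glob("*.py"))) from within tests/, after importing glob. The per-file counter filelineno() plays the role that $. plays in Perl once the numbering is restarted for each file.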
Automatic Links
The following links appear in the article:
1. AWK - https://en.wikipedia.org/wiki/AWK
2. Regular Expressions - https://en.wikipedia.org/wiki/Regular_expressions
3. Sed - https://en.wikipedia.org/wiki/Sed
4. OpenAPI Generator - https://openapi-generator.tech/
5. Negative Lookahead Assertion - https://perldoc.perl.org/perlre#Lookaround-Assertions
6. Unified Diff - https://www.gnu.org/software/diffutils/manual/html_node/Unified-Format.html
7. Perl - https://www.perl.org/