Changelog History

  • v2.3.1 Changes

    January 01, 2019
    • 🚀 POSSIBLE API CHANGE: this release fixes a bug when results names were attached to a MatchFirst or Or object containing an And object. Previously, a results name on an And object within an enclosing MatchFirst or Or could return just the first token in the And. Now, all the tokens matched by the And are correctly returned. This may result in subtle changes in the tokens returned if you have this condition in your pyparsing scripts.

    • 🆕 New staticmethod ParseException.explain() to help diagnose parse exceptions by showing the failing input line and the trace of ParserElements in the parser leading up to the exception. explain() returns a multiline string listing each element by name. (This is still an experimental method, and the method signature and format of the returned string may evolve over the next few releases.)

    Example:

      # define a parser to parse an integer followed by an
      # alphabetic word
      expr = pp.Word(pp.nums).setName("int") + pp.Word(pp.alphas).setName("word")
      try:
          # parse a string with a numeric second value instead of alpha
          expr.parseString("123 355")
      except pp.ParseException as pe:
          print(pp.ParseException.explain(pe))

    Prints:

      123 355
          ^
      ParseException: Expected word (at char 4), (line:1, col:5)
      __main__.ExplainExceptionTest
      pyparsing.And - {int word}
      pyparsing.Word - word

    explain() will accept any exception type and will list the function names and parse expressions in the stack trace. This is especially useful when an exception is raised in a parse action.

    Note: explain() is only supported under Python 3.

    • 🛠 Fix bug in dictOf which could match an empty sequence, making it infinitely loop if wrapped in a OneOrMore.

    • ➕ Added unicode sets to pyparsing_unicode for Latin-A and Latin-B ranges.

    • ➕ Added ability to define custom unicode sets as combinations of other sets using multiple inheritance.

      class Turkish_set(pp.pyparsing_unicode.Latin1, pp.pyparsing_unicode.LatinA):
          pass

      turkish_word = pp.Word(Turkish_set.alphas)

    • ⚡️ Updated state machine import examples, with state machine demos for:

      • traffic light
      • library book checkin/checkout
      • document review/approval

    In the traffic light example, you can use the custom 'statemachine' keyword to define the states for a traffic light, and have the state classes auto-generated for you:

      statemachine TrafficLightState:
          Red -> Green
          Green -> Yellow
          Yellow -> Red
    

    Similar for state machines with named transitions, like the library book state example:

      statemachine LibraryBookState:
          New -(shelve)-> Available
          Available -(reserve)-> OnHold
          OnHold -(release)-> Available
          Available -(checkout)-> CheckedOut
          CheckedOut -(checkin)-> Available
    

    Once the classes are defined, then additional Python code can reference those classes to add class attributes, instance methods, etc.

    See the examples in examples/statemachine

    • ➕ Added an example parser for the decaf language. This language is used in CS compiler classes in many colleges and universities.

    • 🛠 Fixup of docstrings to Sphinx format, inclusion of test files in the source package, and convert markdown to rst throughout the distribution, great job by Matěj Cepl!

    • Expanded the whitespace characters recognized by the White class to include all unicode defined spaces. Suggested in Issue #51 by rtkjbillo.

    • ➕ Added optional postParse argument to ParserElement.runTests() to add a custom callback to be called for test strings that parse successfully. Useful for running tests that do additional validation or processing on the parsed results. See updated chemicalFormulas.py example.
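
    As a rough sketch of how such a callback might look (this is not the chemicalFormulas.py example; the parser, the report_total function, and the seen_totals list are made up for illustration), assuming the callback receives the test string and the parsed results:

```python
import pyparsing as pp

# Parser for a whitespace-separated list of integers (converted to int).
integers = pp.OneOrMore(pp.pyparsing_common.integer)

seen_totals = []  # hypothetical bookkeeping list, only for this sketch


def report_total(test_string, parsed):
    # Called only for test strings that parse successfully.
    total = sum(parsed)
    seen_totals.append(total)
    return "total = {}".format(total)


success, results = integers.runTests(
    """
    1 2 3
    10 20
    """,
    postParse=report_total,
    printResults=False,  # keep the sketch quiet
)
```

    Here the callback does extra processing (summing the integers) on each successfully parsed test string.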

    • ✂ Removed distutils fallback in setup.py. If installing the package fails, please update to the latest version of setuptools. Plus overall project code cleanup (CRLFs, whitespace, imports, etc.), thanks Jon Dufresne!

    • 🛠 Fix bug in CaselessKeyword, to make its behavior consistent with Keyword(caseless=True). Fixes Issue #65 reported by telesphore.

  • v2.3.0 Changes

    October 01, 2018
    • 🆕 NEW SUPPORT FOR UNICODE CHARACTER RANGES

    This release introduces the pyparsing_unicode namespace class, defining a series of language character sets to simplify the definition of alphas, nums, alphanums, and printables in the following language sets:

      • Arabic
      • Chinese
      • Cyrillic
      • Devanagari
      • Greek
      • Hebrew
      • Japanese (including Kanji, Katakana, and Hiragana subsets)
      • Korean
      • Latin1 (includes 7 and 8-bit Latin characters)
      • Thai
      • CJK (combination of Chinese, Japanese, and Korean sets)

    For example, your code can define words using:

    korean_word = Word(pyparsing_unicode.Korean.alphas)
    

    See their use in the updated examples greetingInGreek.py and greetingInKorean.py.

    This namespace class also offers access to these sets using their unicode identifiers.

    • 📜 POSSIBLE API CHANGE: Fixed bug where a parse action that explicitly returned the input ParseResults could add another nesting level in the results if the current expression had a results name.

      vals = pp.OneOrMore(pp.pyparsing_common.integer)("int_values")
      
      def add_total(tokens):
          tokens['total'] = sum(tokens)
          return tokens  # this line can be removed
      
      vals.addParseAction(add_total)
      print(vals.parseString("244 23 13 2343").dump())
      

    Before the fix, this code would print (note the extra nesting level):

    [244, 23, 13, 2343]
    - int_values: [244, 23, 13, 2343]
      - int_values: [244, 23, 13, 2343]
      - total: 2623
    - total: 2623
    

    With the fix, this code now prints:

    [244, 23, 13, 2343]
    - int_values: [244, 23, 13, 2343]
    - total: 2623
    

    This fix will change the structure of ParseResults returned if a program defines a parse action that returns the tokens that were sent in. This is not necessary, and statements like "return tokens" in the example above can be safely deleted prior to upgrading to this release, in order to avoid the bug and get the new behavior.

    Reported by seron in Issue #22, nice catch!

    • 🛠 POSSIBLE API CHANGE: Fixed a related bug where a results name erroneously created a second level of hierarchy in the returned ParseResults. The intent for accumulating results names into ParseResults is that, in the absence of Group'ing, all names get merged into a common namespace. This allows us to write:

      key_value_expr = (Word(alphas)("key") + '=' + Word(nums)("value"))
      result = key_value_expr.parseString("a = 100")

    and have result structured as {"key": "a", "value": "100"} instead of [{"key": "a"}, {"value": "100"}].

    However, if a named expression is used in a higher-level non-Group expression that also has a name, a false sub-level would be created in the namespace:

        num = pp.Word(pp.nums)
        num_pair = ("[" + (num("A") + num("B"))("values") + "]")
        U = num_pair.parseString("[ 10 20 ]")
        print(U.dump())
    

    Since there is no grouping, "A", "B", and "values" should all appear at the same level in the results, as:

        ['[', '10', '20', ']']
        - A: '10'
        - B: '20'
        - values: ['10', '20']
    

    Instead, an extra level of "A" and "B" show up under "values":

        ['[', '10', '20', ']']
        - A: '10'
        - B: '20'
        - values: ['10', '20']
          - A: '10'
          - B: '20'
    

    This bug has been fixed. Now, if this hierarchy is desired, then a Group should be added:

        num_pair = ("[" + pp.Group(num("A") + num("B"))("values") + "]")
    

    Giving:

        ['[', ['10', '20'], ']']
        - values: ['10', '20']
          - A: '10'
          - B: '20'
    

    But in no case should "A" and "B" appear in multiple levels. This bug-fix fixes that.

    If you have current code which relies on this behavior, then add or remove Groups as necessary to get your intended results structure.

    Reported by Athanasios Anastasiou.

    • 📜 IndexErrors raised in parse actions will get explicitly reraised as ParseExceptions that wrap the original IndexError. Since IndexError sometimes occurs as part of pyparsing's normal parsing logic, IndexErrors that are raised during a parse action may have gotten silently reinterpreted as parsing errors. To retain the information from the IndexError, these exceptions will now be raised as ParseExceptions that reference the original IndexError. This wrapping will only be visible when run under Python3, since it emulates "raise ... from ..." syntax.

    Addresses Issue #4, reported by guswns0528.
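
    A small sketch of what this looks like from calling code, assuming Python 3 and a deliberately broken parse action (broken_action is a made-up name); the original IndexError should be reachable through the exception's __cause__ attribute:

```python
import pyparsing as pp


def broken_action(tokens):
    # Simulate a bug in user code: index into an empty list.
    return [][0]


word = pp.Word(pp.alphas).setParseAction(broken_action)

wrapped = None
try:
    word.parseString("hello")
except pp.ParseException as pe:
    # "raise ... from ..." stores the original exception in __cause__.
    wrapped = pe.__cause__

print(type(wrapped).__name__)
```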

    • ➕ Added Char class to simplify defining expressions of a single character. (Char("abc") is equivalent to Word("abc", exact=1))
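
    A minimal illustration of that equivalence (the vowel set and variable names are chosen only for this sketch):

```python
import pyparsing as pp

vowel = pp.Char("aeiou")
same_thing = pp.Word("aeiou", exact=1)

# Both expressions match exactly one character from the set.
char_result = vowel.parseString("e").asList()
word_result = same_thing.parseString("e").asList()

# Repetition yields one token per character.
each_vowel = pp.OneOrMore(vowel).parseString("aei").asList()
```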

    • ➕ Added class PrecededBy to perform lookbehind tests. PrecededBy is used in the same way as FollowedBy, passing in an expression that must occur just prior to the current parse location.

    For fixed-length expressions like a Literal, Keyword, Char, or a Word with an exact or maxLen length given, PrecededBy(expr) is sufficient. For varying length expressions like a Word with no given maximum length, PrecededBy must be constructed with an integer retreat argument, as in PrecededBy(Word(alphas, nums), retreat=10), to specify the maximum number of characters pyparsing must look backward to make a match. pyparsing will check all the values from 1 up to retreat characters back from the current parse location.

    When stepping backwards through the input string, PrecededBy does not skip over whitespace.

    PrecededBy can be created with a results name so that, even though it always returns an empty parse result, the result can include named results.

    Idea first suggested in Issue #30 by Freakwill.
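
    As a sketch of the fixed-length case, the following picks out identifiers only when they directly follow a "$" marker. The input string and the "$" convention are invented for illustration:

```python
import pyparsing as pp

identifier = pp.pyparsing_common.identifier

# "$" is a fixed-length literal, so no retreat argument is needed.
str_var = pp.PrecededBy("$") + identifier

# "count" follows "#", so only "name" and "title" should qualify.
matches = str_var.searchString("#count $name $title").asList()
print(matches)
```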

    • ⚡️ Updated FollowedBy to accept expressions that contain named results, so that results names defined in the lookahead expression will be returned, even though FollowedBy always returns an empty list. Inspired by the same feature implemented in PrecededBy.
  • v2.2.2 Changes

    September 01, 2018
    • 🛠 Fixed bug in SkipTo: if a SkipTo expression was skipping to an expression that returned a list (such as an And), and the SkipTo was saved as a named result, the named result could be saved as a ParseResults. It should always be saved as a string. Issue #28, reported by seron.

    • Added simple_unit_tests.py, as a collection of easy-to-follow unit tests for various classes and features of the pyparsing library. Primary intent is more to be instructional than actually rigorous testing. Complex tests can still be added in the unitTests.py file.

    • 🆕 New features added to the Regex class:

      • optional asGroupList parameter, returns all the capture groups as a list
      • optional asMatch parameter, returns the raw re.match result
      • new sub(repl) method, which adds a parse action calling re.sub(pattern, repl, parsed_result). Simplifies creating Regex expressions to be used with transformString. Like re.sub, repl may be an ordinary string (similar to using pyparsing's replaceWith), or may contain references to capture groups by group number, or may be a callable that takes an re match group and returns a string.

      For instance:

        expr = pp.Regex(r"([Hh]\d):\s*(.*)").sub(r"<\1>\2</\1>")
        expr.transformString("h1: This is the title")

      will return "<h1>This is the title</h1>"

    • 🛠 Fixed omission of LICENSE file in source tarball, also added CODE_OF_CONDUCT.md per GitHub community standards.

  • v2.2.1 Changes

    September 01, 2018
    • 📜 Applied changes necessary to migrate hosting of pyparsing source over to GitHub. Many thanks for help and contributions from hugovk, jdufresne, and cngkaygusuz among others through this transition, sorry it took me so long!

    • 🛠 Fixed import of collections.abc to address DeprecationWarnings in Python 3.7.

    • ⚡️ Updated oc.py example to support function calls in arithmetic expressions; fixed regex for '==' operator; and added packrat parsing. Raised on the pyparsing wiki by Boris Marin, thanks!

    • 📜 Fixed bug in select_parser.py example, group_by_terms was not reported. Reported on SF bugs by Adam Groszer, thanks Adam!

    • ➕ Added "Getting Started" section to the module docstring, to guide new users to the most common starting points in pyparsing's API.

    • 🛠 Fixed bug in Literal and Keyword classes, which erroneously raised IndexError instead of ParseException.

  • v2.2.0 Changes

    March 01, 2017
    • ⬆️ Bumped minor version number to reflect compatibility issues with OneOrMore and ZeroOrMore bugfixes in 2.1.10. (2.1.10 fixed a bug that was introduced in 2.1.4, but the fix could break code written against 2.1.4 - 2.1.9.)

    • ⚡️ Updated setup.py to address recursive import problems now that pyparsing is part of 'packaging' (used by setuptools). Patch submitted by Joshua Root, much thanks!

    • 🛠 Fixed KeyError issue reported by Yann Bizeul when using packrat parsing in the Graphite time series database, thanks Yann!

    • 🛠 Fixed incorrect usages of '\' in literals, as described in https://docs.python.org/3/whatsnew/3.6.html#deprecated-python-behavior Patch submitted by Ville Skyttä - thanks!

    • Minor internal change when using '-' operator, to be compatible with ParserElement.streamline() method.

    • 📜 Expanded infixNotation to accept a list or tuple of parse actions to attach to an operation.

    • 🆕 New unit test added for dill support for storing pyparsing parsers. Ordinary Python pickle can be used to pickle pyparsing parsers as long as they do not use any parse actions. The 'dill' module is an extension to pickle which does support pickling of attached parse actions.

  • v2.1.10 Changes

    October 01, 2016
    • 🛠 Fixed bug in reporting named parse results for ZeroOrMore expressions, thanks Ethan Nash for reporting this!

    • 🛠 Fixed behavior of LineStart to be much more predictable. LineStart can now be used to detect if the next parse position is col 1, factoring in potential leading whitespace (which would cause LineStart to fail). Also fixed a bug in col, which is used in LineStart, where '\n's were erroneously considered to be column 1.

    • ➕ Added support for multiline test strings in runTests.

    • 🛠 Fixed bug in ParseResults.dump when keys were not strings. Also changed display of string values to show them in quotes, to help distinguish parsed numeric strings from parsed integers that have been converted to Python ints.

  • v2.1.9 Changes

    September 01, 2016
    • ➕ Added class CloseMatch, a variation on Literal which matches "close" matches, that is, strings with at most 'n' mismatching characters.
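
    A brief sketch of CloseMatch in use, assuming the default tolerance of a single mismatch and that the result exposes the mismatch positions under the name "mismatches" (the DNA-like strings are only sample data):

```python
import pyparsing as pp

pattern = pp.CloseMatch("ATCATCGAATGGA")

# One character differs from the pattern (the "X" at index 9).
result = pattern.parseString("ATCATCGAAXGGA")
matched_text = result[0]
mismatch_positions = list(result["mismatches"])

# Two differing characters exceed the default tolerance.
try:
    pattern.parseString("ATCAXCGAAXGGA")
    too_many_rejected = False
except pp.ParseException:
    too_many_rejected = True
```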

    • 🛠 Fixed bug in Keyword.setDefaultKeywordChars(), reported by Kobayashi Shinji - nice catch, thanks!

    • 📜 Minor API change in pyparsing_common. Renamed some of the common expressions to PEP8 format (to be consistent with the other pyparsing_common expressions):

      • signedInteger -> signed_integer
      • sciReal -> sci_real

    Also, in trying to stem the API bloat of pyparsing, I've copied some of the global expressions and helper parse actions into pyparsing_common, with the originals to be deprecated and removed in a future release:

      • commaSeparatedList -> pyparsing_common.comma_separated_list
      • upcaseTokens -> pyparsing_common.upcaseTokens
      • downcaseTokens -> pyparsing_common.downcaseTokens

    (I don't expect any other expressions, like the comment expressions, quotedString, or the Word-helping strings like alphas, nums, etc. to migrate to pyparsing_common - they are just too pervasive. As for the PEP8 vs camelCase naming, all the expressions are PEP8, while the parse actions in pyparsing_common are still camelCase. It's a small step - when pyparsing 3.0 comes around, everything will change to PEP8 snake case.)
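
    For code being updated to the new names, usage might look roughly like this (the ppc alias and the sample inputs are just conventions for this sketch):

```python
import pyparsing as pp

ppc = pp.pyparsing_common

# Renamed expressions (PEP8 style).
signed = ppc.signed_integer.parseString("-42")[0]
sci = ppc.sci_real.parseString("1e3")[0]

# Expressions and parse actions now also available in pyparsing_common.
items = ppc.comma_separated_list.parseString("a,b,c").asList()
shout = pp.Word(pp.alphas).setParseAction(ppc.upcaseTokens).parseString("hello")[0]
```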

    • 🛠 Fixed Python3 compatibility bug when using dict keys() and values() in ParseResults.getName().

    • ✅ After some prodding, I've reworked the unitTests.py file for pyparsing over the past few releases. It uses some variations on unittest to handle my testing style. The test now:

      • auto-discovers its test classes (while maintaining their order of definition)
      • suppresses voluminous 'print' output for tests that pass

  • v2.1.8 Changes

    August 01, 2016
    • Fixed issue in the optimization to _trim_arity, when the full stacktrace is retrieved to determine if a TypeError is raised in pyparsing or in the caller's parse action. Code was traversing the full stacktrace, and potentially encountering UnicodeDecodeError.

    • 🛠 Fixed bug in ParserElement.inlineLiteralsUsing, causing infinite loop with Suppress.

    • 🛠 Fixed bug in Each, when merging named results from multiple expressions in a ZeroOrMore or OneOrMore. Also fixed bug when ZeroOrMore expressions were erroneously treated as required expressions in an Each expression.

    • ➕ Added a few more inline doc examples.

    • 👌 Improved use of runTests in several example scripts.

  • v2.1.7 Changes

    August 01, 2016
    • 🛠 Fixed regression reported by Andrea Censi (surfaced in PyContracts tests) when using ParseSyntaxExceptions (raised when using operator '-') with packrat parsing.

    • Minor fix to oneOf, to accept all iterables, not just space-delimited strings and lists. (If you have a list or set of strings, it is not necessary to concat them using ' '.join to pass them to oneOf, oneOf will accept the list or set or generator directly.)
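
    To illustrate, a list, a set, and a generator can each be passed directly (the color names are arbitrary sample data):

```python
import pyparsing as pp

colors_from_list = pp.oneOf(["red", "green", "blue"])
colors_from_set = pp.oneOf({"red", "green", "blue"})
colors_from_gen = pp.oneOf(c.lower() for c in ("RED", "GREEN", "BLUE"))

# Each expression should accept the same alternatives.
results = [
    expr.parseString("green")[0]
    for expr in (colors_from_list, colors_from_set, colors_from_gen)
]
```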

  • v2.1.6 Changes

    August 01, 2016
    • ⬆️ Major packrat upgrade, inspired by patch provided by Tal Einat - many, many, thanks to Tal for working on this! Tal's tests show faster parsing performance (2X in some tests), and memory reduction from 3GB down to ~100MB! Requires no changes to existing code using packratting. (Uses OrderedDict, available in Python 2.7 and later. For Python 2.6 users, will attempt to import from ordereddict backport. If not present, will implement pure-Python Fifo dict.)

    • 👍 Minor API change - to better distinguish between the flexible numeric types defined in pyparsing_common, I've changed "numeric" (which parsed numbers of different types and returned int for ints, float for floats, etc.) and "number" (which parsed numbers of int or float type, and returned all floats) to "number" and "fnumber" respectively. I hope the "f" prefix of "fnumber" will be a better indicator of its internal conversion of parsed values to floats, while the generic "number" is similar to the flexible number syntax in other languages. Also fixed a bug in pyparsing_common.numeric (now renamed to pyparsing_common.number), integers were parsed and returned as floats instead of being retained as ints.
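
    The difference between the two renamed expressions can be sketched as follows (sample inputs chosen for illustration):

```python
import pyparsing as pp

ppc = pp.pyparsing_common

# "number" keeps the natural type of what was parsed.
as_int = ppc.number.parseString("42")[0]
as_float = ppc.number.parseString("3.5")[0]

# "fnumber" converts every parsed value to float.
always_float = ppc.fnumber.parseString("42")[0]

print(type(as_int).__name__, type(as_float).__name__, type(always_float).__name__)
```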

    • 🛠 Fixed bug in upcaseTokens and downcaseTokens introduced in 2.1.5, when the parse action was used in conjunction with results names. Reported by Steven Arcangeli from the dql project, thanks for your patience, Steven!

    • 👀 Major change to docs! After seeing some comments on reddit about general issue with docs of Python modules, and thinking that I'm a little overdue in doing some doc tuneup on pyparsing, I decided to follow the suggestions of the redditor and add more inline examples to the pyparsing reference documentation. I hope this addition will clarify some of the more common questions people have, especially when first starting with pyparsing/Python.

    • 🗄 Deprecated ParseResults.asXML. I've never been too happy with this method, and it usually forces some unnatural code in the parsers in order to get decent tag names. The amount of guesswork that asXML has to do to try to match names with values should have been a red flag from day one. If you are using asXML, you will need to implement your own ParseResults->XML serialization. Or consider migrating to a more current format such as JSON (which is very easy to do: results_as_json = json.dumps(parse_result.asDict())). Hopefully, when I remove this code in a future version, I'll also be able to simplify some of the craziness in ParseResults, which IIRC was only there to try to make asXML work.
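
    A minimal sketch of the JSON route mentioned above, using a small key/value parser invented for illustration (sort_keys is added only to make the output order predictable):

```python
import json

import pyparsing as pp

# Named results become the keys of the dict returned by asDict().
key_value = pp.Word(pp.alphas)("key") + pp.Suppress("=") + pp.Word(pp.nums)("value")

parse_result = key_value.parseString("a = 100")
results_as_json = json.dumps(parse_result.asDict(), sort_keys=True)
print(results_as_json)
```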

    • ⚡️ Updated traceParseAction parse action decorator to show the repr of the input and output tokens, instead of the str format, since str has been simplified to just show the token list content.

    (The change to ParseResults.__str__ occurred in pyparsing 2.0.4, but it seems that didn't make it into the release notes - sorry! Too many users, especially beginners, were confused by the "([token_list], {names_dict})" str format for ParseResults, thinking they were getting a tuple containing a list and a dict. The full form can be seen if using repr().)

    For tracing tokens in and out of parse actions, the more complete repr form provides important information when debugging parse actions.

  • v2.1.5 Changes

    June, 2016

    • ➕ Added ParserElement.split() generator method, similar to re.split(). Includes optional arguments maxsplit (to limit the number of splits), and includeSeparators (to include the separating matched text in the returned output, default=False).
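
    A short sketch of split() and its two optional arguments, using a comma literal as the separator expression (chosen only for illustration):

```python
import pyparsing as pp

comma = pp.Literal(",")

# split() is a generator, so wrap it in list() to see all the pieces.
plain = list(comma.split("a,b,c"))
with_seps = list(comma.split("a,b,c", includeSeparators=True))
limited = list(comma.split("a,b,c", maxsplit=1))

print(plain, with_seps, limited)
```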

    • ➕ Added a new parse action construction helper tokenMap, which will apply a function and optional arguments to each element in a ParseResults. So this parse action:

      def lowercase_all(tokens):
          return [str(t).lower() for t in tokens]
      OneOrMore(Word(alphas)).setParseAction(lowercase_all)

    can now be written:

      OneOrMore(Word(alphas)).setParseAction(tokenMap(str.lower))
    

    Also simplifies writing conversion parse actions like:

      integer = Word(nums).setParseAction(lambda t: int(t[0]))
    

    to just:

      integer = Word(nums).setParseAction(tokenMap(int))
    

    If additional arguments are necessary, they can be included in the call to tokenMap, as in:

      hex_integer = Word(hexnums).setParseAction(tokenMap(int, 16))
    
    • ➕ Added more expressions to pyparsing_common:

      • IPv4 and IPv6 addresses (including long, short, and mixed forms of IPv6)
      • MAC address
      • ISO8601 date and date time strings (with named fields for year, month, etc.)
      • UUID (xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)
      • hex integer (returned as int)
      • fraction (integer '/' integer, returned as float)
      • mixed integer (integer '-' fraction, or just fraction, returned as float)
      • stripHTMLTags (parse action to remove tags from HTML source)
      • parse action helpers convertToDate and convertToDatetime to do custom parse time conversions of parsed ISO8601 strings
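
    A few of these can be tried out as in the sketch below. The attribute names hex_integer, fraction, mixed_integer, and iso8601_date are assumed to be the pyparsing_common names for the expressions listed above, and the inputs are sample data:

```python
import datetime

import pyparsing as pp

ppc = pp.pyparsing_common

hex_value = ppc.hex_integer.parseString("FF")[0]    # converted to int
half = ppc.fraction.parseString("1/2")[0]           # converted to float
mixed = ppc.mixed_integer.parseString("1-3/4")[0]   # integer plus fraction

# ISO8601 dates carry named fields such as "year".
date_parts = ppc.iso8601_date.parseString("1999-12-31")

# convertToDate() builds a parse action that returns a datetime.date.
date_expr = ppc.iso8601_date.copy().setParseAction(ppc.convertToDate())
as_date = date_expr.parseString("1999-12-31")[0]
```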

    • ✅ runTests now returns a two-tuple: success if all tests succeed, and an output list of each test and its output lines.

    • ➕ Added failureTests argument (default=False) to runTests, so that tests can be run that are expected failures, and runTests' success value will return True only if all tests fail as expected. Also, parseAll now defaults to True.
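
    The two-tuple return value and the failureTests flag might be used together roughly like this (the integer parser and test strings are sample data; printResults=False just suppresses the printed report):

```python
import pyparsing as pp

integer = pp.pyparsing_common.integer

# Tests that are expected to parse.
ok, ok_report = integer.runTests(
    """
    100
    42
    """,
    printResults=False,
)

# Tests that are expected to fail; "4x2" fails because parseAll is on.
expected_failures, fail_report = integer.runTests(
    """
    abc
    4x2
    """,
    failureTests=True,
    printResults=False,
)

print(ok, expected_failures)
```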

    • 🆕 New example numerics.py, shows samples of parsing integer and real numbers using locale-dependent formats:

      4.294.967.295,000
      4 294 967 295,000
      4,294,967,295.000