I've started looking at the Syntax Parsing Framework and I have a few questions.
Regarding the EBNF format, I'm quite lost. The only sample I found for 12.2 was the XML one, and the ones from the CTP no longer work with 12.2. What would be the minimal EBNF needed to parse a couple of words in a sentence, say, Hello and World?
The sample directs you to create a cs file from an EBNF file. Would it be possible to load an EBNF file directly without creating a class first? I ask this because our grammar is based on a series of what we call "fields" and users can add their own to this list.
Thanks
Guy B.
Hey Guy, thanks for your feedback on the CTP version of the syntax parsing framework. I’m sorry that you had a problem with the samples. Which ones were giving you trouble? There was a bit of code churn before the CTP went out the door, and it’s possible that something which was working got broken accidentally.
As for your questions, I believe the attached file answers both of them (but let me know if it does not). The file dynamically loads an EBNF grammar definition (a constant string here, but it could easily be generated based on customer preferences) and creates a CustomLanguage instance representing it.
The grammar itself has a start symbol named Document, which represents zero or more Sentence symbols. Each Sentence represents one or more Phrase symbols followed by a sentence-ending punctuation symbol, and multiple Phrase instances within a Sentence are separated by other types of punctuation. A Phrase represents one or more words. I have also included a short recursive routine to print the final parse tree to the debug window. Based on the input “Hello World! This sentence has multiple phrases; they are separated by a semicolon.”, it prints the following:
Document
  Sentence
    Phrase
      WordToken: Hello
      WordToken: World
    SentenceEndingPunctuation
      ExclamationPointToken: !
  Sentence
    Phrase
      WordToken: This
      WordToken: sentence
      WordToken: has
      WordToken: multiple
      WordToken: phrases
    PhraseSeparatingPunctuation
      SemicolonToken: ;
    Phrase
      WordToken: they
      WordToken: are
      WordToken: separated
      WordToken: by
      WordToken: a
      WordToken: semicolon
    SentenceEndingPunctuation
      DotToken: .
$:
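For readers who want to see the shape of this grammar without the framework installed, here is a rough sketch in plain Python. It does not use the Infragistics API or its EBNF format. It is a hand-written recursive-descent parser for the grammar as described above, plus a short recursive routine that formats the tree. The node names are taken from the output above; the token regexes, the CommaToken terminal, and every function name are assumptions made for illustration only.

```python
import re

# Token kinds are named after the nodes in the tree above.
# The regexes and CommaToken are assumptions, not the framework's definitions.
TOKEN_SPEC = [
    ("WordToken", r"[A-Za-z]+"),
    ("ExclamationPointToken", r"!"),
    ("DotToken", r"\."),
    ("SemicolonToken", r";"),
    ("CommaToken", r","),
]
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))
SENTENCE_END = {"ExclamationPointToken", "DotToken"}
PHRASE_SEP = {"SemicolonToken", "CommaToken"}


def tokenize(text):
    """Split text into (kind, text) pairs, skipping whitespace."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


def parse_phrase(tokens, pos):
    """Phrase: one or more words."""
    start = pos
    while pos < len(tokens) and tokens[pos][0] == "WordToken":
        pos += 1
    if pos == start:
        raise SyntaxError("expected a word")
    return ("Phrase", tokens[start:pos]), pos


def parse_sentence(tokens, pos):
    """Sentence: phrases separated by punctuation, then ending punctuation."""
    node, pos = parse_phrase(tokens, pos)
    children = [node]
    while pos < len(tokens) and tokens[pos][0] in PHRASE_SEP:
        children.append(("PhraseSeparatingPunctuation", [tokens[pos]]))
        node, pos = parse_phrase(tokens, pos + 1)
        children.append(node)
    if pos >= len(tokens) or tokens[pos][0] not in SENTENCE_END:
        raise SyntaxError("expected sentence-ending punctuation")
    children.append(("SentenceEndingPunctuation", [tokens[pos]]))
    return ("Sentence", children), pos + 1


def parse_document(tokens):
    """Document: zero or more sentences."""
    pos, children = 0, []
    while pos < len(tokens):
        node, pos = parse_sentence(tokens, pos)
        children.append(node)
    return ("Document", children)


def format_tree(node, depth=0):
    """Recursively render a node as indented lines; leaves show their text."""
    name, payload = node
    indent = "  " * depth
    if isinstance(payload, str):  # a token leaf: (kind, text)
        return [f"{indent}{name}: {payload}"]
    lines = [indent + name]
    for child in payload:
        lines.extend(format_tree(child, depth + 1))
    return lines


if __name__ == "__main__":
    sample = ("Hello World! This sentence has multiple phrases; "
              "they are separated by a semicolon.")
    print("\n".join(format_tree(parse_document(tokenize(sample)))))
```

Each parse function corresponds to one non-terminal in the description, which is roughly what a grammar rule expresses declaratively. A real grammar-driven parser builds this logic from the rules instead of having it written by hand.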
If you have any questions on the grammar or anything else, let me know and I will try to answer them as best as possible.
Also, I should warn you that the format of the special sequences in the EBNF format has not been finalized and may change when the parsing framework is brought to RTM status.
Another thing to note is that you can also create a Grammar instance manually by populating its terminal and non-terminal symbol collections with objects representing the symbols. This removes the need for EBNF files, which are just one way to provide grammatical information to the parsing framework. However, I can tell you that we have already made some breaking changes in this area, so again, keep that in mind.
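To illustrate the general idea of building a grammar from objects rather than from a file, here is a small sketch in plain Python. It is not the framework's Grammar class: the Grammar dataclass, its methods, and build_field_grammar are all hypothetical names made up for this example. It shows how a user-supplied list of "fields" could be turned into terminals and rules at run time, and how the same data could be rendered as ISO-style EBNF text, which may well differ from the format the framework accepts.

```python
from dataclasses import dataclass, field


@dataclass
class Grammar:
    """Hypothetical in-memory grammar: not the Infragistics Grammar type."""
    start: str
    terminals: dict = field(default_factory=dict)      # name -> literal text
    non_terminals: dict = field(default_factory=dict)  # name -> alternatives

    def add_terminal(self, name, literal):
        self.terminals[name] = literal

    def add_rule(self, name, *alternatives):
        # Each alternative is a sequence of symbol names.
        rules = self.non_terminals.setdefault(name, [])
        rules.extend(list(alt) for alt in alternatives)

    def undefined_symbols(self):
        """Names used on a right-hand side but never defined."""
        defined = set(self.terminals) | set(self.non_terminals)
        used = {sym
                for alts in self.non_terminals.values()
                for alt in alts
                for sym in alt}
        return sorted(used - defined)

    def to_ebnf(self):
        """Render as ISO-style EBNF text (literals must not contain quotes)."""
        lines = []
        for name, alts in self.non_terminals.items():
            body = " | ".join(", ".join(alt) for alt in alts)
            lines.append(f"{name} = {body} ;")
        for name, literal in self.terminals.items():
            lines.append(f'{name} = "{literal}" ;')
        return "\n".join(lines)


def build_field_grammar(field_names):
    """Create one terminal per user-defined field and a Field rule over them."""
    grammar = Grammar(start="Field")
    for field_name in field_names:
        token = field_name + "Token"
        grammar.add_terminal(token, field_name)
        grammar.add_rule("Field", [token])
    return grammar


if __name__ == "__main__":
    grammar = build_field_grammar(["Name", "Age"])
    print(grammar.to_ebnf())
```

Checking undefined_symbols() before handing a generated grammar to a parser is a cheap way to catch a user-defined field that is referenced in a rule but was never registered.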
And finally, if you haven’t already, you might want to check out my blog where I have been discussing the parsing framework: http://ko.infragistics.com/community/blogs/mike_dour/
Hi Mike. Thanks for the quick response.
I was under the impression that the Syntax Parsing Framework was no longer CTP with the 12.2 release. Am I wrong?
The samples that I had problems with were the ones posted in other threads in this forum. They no longer work with 12.2.
Thanks for your sample. With it I was able to achieve what I wanted to try out. To be honest, the EBNF syntax is quite daunting (a super regex on steroids) and I found very little help on the Web. While having Infragistics use a standard like this is fantastic, I think a few introductory EBNF articles, like the ones you posted on your blog, would help Infragistics clients.
The original intent was to release the parsing framework in 12.2, but as the 12.2 release was being finalized we decided to ship it as a CTP again because there were some aspects, such as error reporting, that did not meet the quality level we expect, and some essential features, such as grammar ambiguity detection, did not get implemented in time.
There were many changes from the 12.1 to 12.2 CTPs, so it’s possible that is why the samples you’re using no longer work. Try using the samples included with the installed product. Those should work.
Yes, I agree that EBNF can be a little complex for people who are new to it. I do intend to write a blog post or two introducing EBNF and context-free grammars, but I wanted to stay away from specifics relating to our EBNF format until it has been finalized. I don’t want to lead people in the wrong direction. For now, you may want to check out the articles below, and if you have any questions, feel free to ask on this forum and I will try to answer as best I can.
http://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_Form
http://en.wikipedia.org/wiki/Context-free_grammar