Qbasicnews.com
May 26, 2020, 05:29:36 PM *
Welcome, Guest. Please login or register.

Login with username, password and session length
News: Back to Qbasicnews.com | QB Online Help | FAQ | Chat | All Basic Code | QB Knowledge Base
 
   Home   Help Search Login Register  
Pages: [1]
  Print  
Author Topic: Parsing Source to Tokens for Interpreter  (Read 2356 times)
wizardlife
Na_th_an
*****
Posts: 1456


WWW
« on: February 14, 2005, 12:50:14 AM »

Hi again everyone -- I need to borrow someone's brain for a bit.

This is actually a VB question, but I rather think that more eyeballs see this forum than the General Programming one buried all the way down at the bottom.

Okay, it's a parser for C-style code. I already have it breaking the source into tokens as

- Numeric Constants
- Identifiers
- Operators
- Parenthesis
- Braces
- Semicolons

The tokens are stored in a Collection Object. The second stage introduces the CodeBlock token, which is simply a token containing a pointer to a further block of tokens. Any braces are consumed and the tokens inside them are moved into sub-collections, forming a tree.

What I'm having trouble with is what I percieve to be the next phase -- figure out where the expressions are and turn them into a similar nested-tree approach, where each node basically contains two pointers to subnodes, and the operator that connects them.

I'd like it to recognize order of operations, but I'm finding it all rather overwhelming. Without a lot of computer science background or formal training, is there a good site that explains the logic process for turning these linear tokens into a meaningful tree?

A quick Google just revealed a bunch of philisophical guff about RPN. Anyone want to recommend a practical resource?

Thanks!

Mike

ps. The language that I'm dealing with doesn't need to account for string constants or compiler directives (eg, macros).
Logged

Mac
Senior Member
**
Posts: 243


WWW
FSA
« Reply #1 on: February 14, 2005, 01:52:15 AM »

You might try the Finite State Automata model. See example:
http://www.network54.com/Forum/message?forumid=178387&messageid=1096906004

Mac
Logged
wizardlife
Na_th_an
*****
Posts: 1456


WWW
« Reply #2 on: February 15, 2005, 10:51:41 PM »

Quote from: "Mac"
You might try the Finite State Automata model. See example:
http://www.network54.com/Forum/message?forumid=178387&messageid=1096906004

Mac


Hmm... Thanks for the example. I find it rather scary, and it still doesn't properly observe order of operations.

I found an actual VB one, that I believe uses a similar state scheme, and does observe order of op, but it's equally incomprehensible. I'd really like this to integrate cleanly into my current framework, so I just need to buckle down and figure out what on earth it is that it does.

Anyhow, thanks!
Logged

LooseCaboose
I hold this place together
*****
Posts: 981



« Reply #3 on: February 16, 2005, 08:38:38 AM »

The dragon book. Compilers: Principles, techniques and tools by Alfred V. Aho, Ravi Sethi and Jeffrey D. Ullman. ISBN: 0-201-10088-6

Quote from: "wizardlife"

and it still doesn't properly observe order of operations


Usually you construct the grammar to deal with order of operations, for example:
Code:

expr ::= term | expr + term | expr - term

term ::= factor | term * factor | term / factor

factor ::= id | num


When you parse an expression like:
Code:

5 + 3 * 4


It is computed as:
Code:

5 + (3 * 4)
Logged

esus saves.... Passes to Moses, shoots, he scores!
na_th_an
*/-\*
*****
Posts: 8244



WWW
« Reply #4 on: February 16, 2005, 09:17:30 AM »

Aho, Sethi, Ullman... What huge nightmares they gave me at college Cheesy but yeah, that's the very source to go.

How will you be building the grammar? I've found that the easiest way to do it is (when you are new to this thingo) building a LL(1) authomata.
Logged

SCUMM (the band) on Myspace!
ComputerEmuzone Games Studio
underBASIC, homegrown musicians
[img]http://www.ojodepez-fanzine.net/almacen/yoghourtslover.png[/i
wizardlife
Na_th_an
*****
Posts: 1456


WWW
« Reply #5 on: February 16, 2005, 05:13:26 PM »

Quote from: "LooseCaboose"
Usually you construct the grammar to deal with order of operations...


Well, I'm not using any fancy lex/yacc/[insert grammar tool here] stuff. It's just home-cooked tokenizing. I considered trying to use one of those established tools by wrapping it in an activeX control and dropping it on the form, but in the end decided that it was overkill given the task at hand.

LL(1) = ??

Since I'm obviously missing key theory here, I think I'll do my best to understand the existing code, but end up basically adapting it to my needs with minimal modification. I'll post a link when I get home from work.
Logged

LooseCaboose
I hold this place together
*****
Posts: 981



« Reply #6 on: February 16, 2005, 09:00:56 PM »

Quote from: "wizardlife"

Well, I'm not using any fancy lex/yacc/[insert grammar tool here] stuff.


Regardless of what tools you use to construct a lexer/parser with, you need a grammar, it should be the first thing you do. The grammar I gave you allows you to set operator precedences for addition, subtraction, multiplication and division. After tokenisations (lexer) you can parse the grammar I gave you (roughly, see below) with a simple recursive descent parser:

Code:

sub expr
  term

  while current = SYM_PLUS or current = SYM_MINUS
    term
  wend
end sub

sub term
  factor

  while current = SYM_TIMES or current = SYM_DIVIDE
    factor
  wend
end sub

sub factor
  if current <> SYM_ID and current <> SYM_NUM then
    print_error "Expected identifier or constant"
  end if
end sub


LL(1) stands for Left to right parsed, producing a Leftmost derivation and using 1 token for lookahead. This means that you parse the grammar from left to right, and that the leftmost non-terminal will be replaced at each step. Lookahead is the number of tokens you need to be able to see to correctly parse the grammar, with 1 lookahead you only need to know the current token.

The grammar that I gave you isn't actually LL(1) because it contains left recursion, but you can rewrite left recursive grammars so they can be parsed by an LL(1) parser.

You are going to need at least a little bit of grammar theory to be able to do this, and IMHO you are better off with a book than some web tutorial. You need to construct an LL(1) grammar for the language you are trying to parse, then implementing a lexer and parser are easy.
Logged

esus saves.... Passes to Moses, shoots, he scores!
Blitz
I hold this place together
*****
Posts: 853



WWW
« Reply #7 on: February 17, 2005, 02:50:56 PM »

A recursive descent top down parser is the way to go if your coding by hand. Using state tables will be a bitch when you want o debug. Try getting your language down in EBNF form then translating it to code. That part is a peice of cake.

simple expression parser:

Code:
expr        := addexpr
addexpr     := mulexpr ( ( '+' | '-' ) mulexpr )*
mulexpr     := parexpr ( ( '*' | '/' ) parexpr )*
parexpr     := '(' expr ')'
             | atom
atom        := numlit
             | id


Try to convert that to recursive code, peice of cake.  * means 0 or more, | means or
Logged

oship me and i will give you lots of guurrls and beeea
wizardlife
Na_th_an
*****
Posts: 1456


WWW
« Reply #8 on: February 17, 2005, 07:54:32 PM »

Quote
You are going to need at least a little bit of grammar theory to be able to do this, and IMHO you are better off with a book than some web tutorial. You need to construct an LL(1) grammar for the language you are trying to parse, then implementing a lexer and parser are easy.

Honestly, that is just way beyond the scope the project. I've got state-machine code that does what I need it to do, so I've just bolted that on to the rest of the parser. This program just isn't complicated enough to warrant such a sophisticated approach. It doesn't need to be extensible in any kind of way; they're just simple 5-10 line scripts that are executed at run-time. It's not a compiler, just an extremely simpleminded interpreter.

The theory interests me, but there are other aspects of the program that require my attention. If it was a personal project, I'd feel differently.

But thanks for all your help, everyone.


btw, neat to look around here again, haven't been by in ages. Looks like the regular crop of homegrown OSes in the Projects forum and Do-My-Homework threads in the Help forum, lol.
Logged

Pages: [1]
  Print  
 
Jump to:  

Powered by MySQL Powered by PHP Powered by SMF 1.1.21 | SMF © 2015, Simple Machines Valid XHTML 1.0! Valid CSS!