Tokenizer in QB64
#21
100% agree with mdijkens and RhoSigma. All through yesterday's discussion I kept caveating that the code is poorly written, but I didn't have time to dig in and clean it up because I wasn't clear on all its goals. So I went the quick-fix route and made a wrapper for Asc() once I learned that Asc("") just returns 0 in Oxygen Basic, where Aurel's code came from. Thanks to RhoSigma I learned it would have worked for Chr$(0), but who uses that?

But Paul Does' solution was more elegant: just quit the function once you have no string left to check. No wrapper needed, and much more efficient. I also liked his way of checking for an empty string better than what I used first.
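
For anyone following along, here is a rough sketch of the two approaches side by side. It is written in Python rather than QB64 purely for illustration, and `safe_asc`, `scan_with_wrapper`, and `scan_with_early_exit` are made-up names, not anything from Aurel's or Paul's code. Python's `ord("")` raises an error, which stands in here for Asc("") failing in QB64.

```python
def safe_asc(s):
    """Wrapper idea: return the code of the first character, or 0 for an empty string."""
    return ord(s[0]) if s else 0


def scan_with_wrapper(source):
    """Quick fix: the loop may read past the end, and the wrapper absorbs it."""
    codes = []
    pos = 0
    while True:
        # Slicing past the end yields "", which safe_asc turns into 0.
        code = safe_asc(source[pos:pos + 1])
        if code == 0:
            break
        codes.append(code)
        pos += 1
    return codes


def scan_with_early_exit(source):
    """Cleaner idea: stop as soon as there is no string left to check."""
    codes = []
    pos = 0
    while pos < len(source):
        codes.append(ord(source[pos]))
        pos += 1
    return codes
```

Both functions give the same result on ordinary text. The second one never asks for the code of an empty string, so it needs no wrapper at all.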

If all this does is run through a string, classify the characters, and count the types, it should be a piece of cake to clean up. It might even have taken less time than yakking about quick fixes! If this is for QB64, then we don't even need to worry about left and right brackets.
b = b + ...
#22
This is a pretty hot topic right now, it would seem. This tokenizer... is it similar to my tokenize() function that I used to try peddling around? I used strtok() to drive my stuff.
Tread on those who tread on you
#23
@SpriggsySpriggs I am unfamiliar with token systems. Have I summarized the goal correctly:
read a string and count the chars of interest into a (Shared?) array?

So the input is a string plus the chars of interest, and the output is an array of chars and their counts. We don't even have to share this!
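
A minimal sketch of that input/output idea, assuming the goal really is just counting. It is in Python only for illustration, and `count_chars` is a hypothetical name. The counts come back as a return value, so nothing has to be shared.

```python
def count_chars(source, chars_of_interest):
    """Count how often each char of interest occurs in source."""
    # Start every char of interest at zero so absent chars still show up.
    counts = {ch: 0 for ch in chars_of_interest}
    for ch in source:
        if ch in counts:
            counts[ch] += 1
    return counts


if __name__ == "__main__":
    print(count_chars("a = (b + c) * (d - 1)", "()+-*="))
```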

BTW, a lot of code-savvy guys seem to frown upon sharing?

I think I prefer sharing, especially with arrays, which we can't return from functions.
I don't know the overhead involved under the hood in passing an array to a routine.
Is it worth adding another parameter to the user-defined routine?

Seems to me the sharing question is first thing to resolve in cleaning up this code.

Also, Aurel makes a huge list of constants, one for each char. Why not make one string of chars, where each char's position in the string is its index number? So if the first char is (
then CharList$ = "("

That works only if single chars not longer words but I don't think he uses words (more than 1 char).
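
Here is a small sketch of the position-as-index idea, in Python for illustration only. The contents of `CHAR_LIST` and the name `char_index` are assumptions, not taken from Aurel's code. The result is 1-based with 0 meaning "not found", similar to what InStr would give in QB64.

```python
# Hypothetical list of single chars of interest; "(" comes first, as in the post.
CHAR_LIST = "()+-*/="


def char_index(ch):
    """Return the 1-based position of ch in CHAR_LIST, or 0 if it is not listed."""
    if len(ch) != 1:
        return 0  # only single characters can be looked up this way
    return CHAR_LIST.find(ch) + 1  # find() returns -1 when absent, so this gives 0
```

With this, `char_index("(")` is 1 and an unlisted char gives 0, so the index can address a counts array directly and index 0 is free for "everything else".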
b = b + ...
#24
My function was very different, then. I was splitting a string using a list of provided characters and then storing the pieces in an array that you pass in to fill. I didn't do any sharing and didn't do any counting. Mine was basically nothing but a wrapper for strtok().
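
Roughly the kind of thing described, sketched in Python for illustration. This is not the original tokenize() and does not call strtok(); `split_into` is a hypothetical name. It only imitates the behaviour of splitting on any of the given characters and filling a list the caller passes in.

```python
def split_into(source, delimiters, out):
    """Split source on any char in delimiters, fill out with the pieces, return the count."""
    out.clear()  # the caller's list is reused, not shared globally
    piece = ""
    for ch in source:
        if ch in delimiters:
            # Like strtok(), runs of delimiters produce no empty pieces.
            if piece:
                out.append(piece)
                piece = ""
        else:
            piece += ch
    if piece:
        out.append(piece)
    return len(out)


if __name__ == "__main__":
    parts = []
    n = split_into("PRINT a, b;c", " ,;", parts)
    print(n, parts)
```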
Tread on those who tread on you
#25
Quote:100% agree with mdijkens and RhoSigma
Heh, of course I don't agree with them, because
mdijkens tends to patronize like a Python programmer
and RhoSigma is just mocking (also, I don't use QB64 very often),
and they don't know the origin of the code.
The original was written in Oxygen Basic by me (and only me),
which is in fact assembler
and as such produces more efficient code, read: more speed with IFs
than with a SELECT CASE structure.
Also, I don't know which approach is faster in QB64.
I haven't tested it yet with a large amount of code,
for example with Erik's S.I.C.K interpreter, which is large enough.
#26
Quote:Seems to me the sharing question is first thing to resolve in cleaning up this code.
What you call sharing, I call GLOBAL.
Because the whole tokenizer is part of the interpreter, the arrays must be GLOBAL.
The tokenizer function is called in the main part of the program
when the source is loaded from a file or, in this TESTING case, just given as a string.
Every char in the source must be processed, not just the chars of interest.
#27
Quote:That works only if single chars not longer words but I don't think he uses words (more than 1 char).
What you call WORDS, I call a TOKEN.
When the tokenizer recognizes a word as IDENT, a token is created.
#28
Yes, I found that Erik's SIC64 has

24080 lines of code... now that is a file!

But I will start with something smaller, like Ed's subQBInterpreter code,
about 3000 lines.


#29
OK Aurel, so basically your tokenizer goes through a string of characters and counts the special ones, and the characters of interest are all of length 1. Correct?
b = b + ...
#30
Yes, each char must be "tested", and of course the length of a char is 1.

PRINT ...

P
R
I
N
T

gives token PRINT -- token type IDENT  -> tokTyp = 1
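
The P, R, I, N, T example can be sketched like this. It is in Python for illustration only and is not the Oxygen Basic or QB64 source. `TOK_IDENT = 1` follows tokTyp = 1 above. `TOK_OTHER`, `scan_tokens`, and the rule that an identifier is a run of letters are assumptions made for the sketch.

```python
TOK_IDENT = 1  # from the example above: token type IDENT -> tokTyp = 1
TOK_OTHER = 0  # hypothetical catch-all type for any other single char


def scan_tokens(source):
    """Test every char; group runs of letters into one IDENT token."""
    tokens = []
    pos = 0
    while pos < len(source):
        ch = source[pos]
        if ch.isalpha():
            # Collect consecutive letters (assumed identifier rule) into one token.
            start = pos
            while pos < len(source) and source[pos].isalpha():
                pos += 1
            tokens.append((source[start:pos], TOK_IDENT))
        elif ch.isspace():
            pos += 1  # whitespace only separates tokens
        else:
            tokens.append((ch, TOK_OTHER))
            pos += 1
    return tokens


if __name__ == "__main__":
    print(scan_tokens("PRINT"))
```

Every char is still tested one at a time, but the five letters end up as the single token PRINT with type 1.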