Testing for extended unicode?
#7
If your goal is just to know whether the text is English, or more precisely plain 7-bit ASCII (0-127), then you just need to identify the UTF-8 sequence markers. In valid UTF-8, every byte that is not part of a multi-byte sequence is automatically pure ASCII.

I use such a check in the code that renders the Wiki help text in the IDE; it basically works as follows:
Code:
'UTF-8 handling
text$ = "whatever you get from your input"
FOR currPos% = 1 TO LEN(text$)
    seq$ = MID$(text$, currPos%, 4) 'get next 4 chars (fewer than 4 at the end of text$)
    seq$ = seq$ + SPACE$(4 - LEN(seq$)) 'fill missing chars with space (safety for ASC())
    IF ((ASC(seq$, 1) AND &HE0~%%) = 192) AND ((ASC(seq$, 2) AND &HC0~%%) = 128) THEN
        '2-byte UTF-8 (lead byte 110xxxxx + 1 continuation byte 10xxxxxx)
        currPos% = currPos% + 1 'skip the continuation byte
    ELSEIF ((ASC(seq$, 1) AND &HF0~%%) = 224) AND ((ASC(seq$, 2) AND &HC0~%%) = 128) AND ((ASC(seq$, 3) AND &HC0~%%) = 128) THEN
        '3-byte UTF-8 (lead byte 1110xxxx + 2 continuation bytes)
        currPos% = currPos% + 2 'skip the continuation bytes
    ELSEIF ((ASC(seq$, 1) AND &HF8~%%) = 240) AND ((ASC(seq$, 2) AND &HC0~%%) = 128) AND ((ASC(seq$, 3) AND &HC0~%%) = 128) AND ((ASC(seq$, 4) AND &HC0~%%) = 128) THEN
        '4-byte UTF-8 (lead byte 11110xxx + 3 continuation bytes)
        currPos% = currPos% + 3 'skip the continuation bytes
    ELSE
        '1st char of seq$ = regular ASCII (or a stray byte >= 128 if the input is not valid UTF-8)
    END IF
NEXT
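
If all you need is a yes/no answer rather than the individual sequences, a shorter check may be enough: every byte of a multi-byte UTF-8 sequence has its high bit set, so a string is pure 7-bit ASCII exactly when no byte is above 127. A minimal sketch follows; the name IsPureAscii% is made up for illustration and is not taken from the IDE code:

```qb64
'Usage: -1 means pure 7-bit ASCII, 0 means at least one byte >= 128
PRINT IsPureAscii%("plain text") 'prints -1
PRINT IsPureAscii%("caf" + CHR$(195) + CHR$(169)) 'prints 0 ("café" as UTF-8 bytes)

'Hypothetical helper (an assumption, not from the IDE source):
'returns -1 (true) if text$ holds only 7-bit ASCII bytes, 0 (false) otherwise.
FUNCTION IsPureAscii% (text$)
    DIM i AS LONG
    IsPureAscii% = -1 'assume ASCII until a high byte shows up
    FOR i = 1 TO LEN(text$)
        IF ASC(text$, i) > 127 THEN
            IsPureAscii% = 0 'lead or continuation byte of a UTF-8 sequence
            EXIT FUNCTION
        END IF
    NEXT
END FUNCTION
```

Unlike the loop above, this does not tell you whether the high bytes form valid UTF-8; it only tells you that something other than plain ASCII is present.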