Need help capturing Unicode directory names
#21
(7 hours ago) doppler wrote: Thanks, Steve, for coming up with a QB64pe-only solution. I realize now that it's the console window staying open that lets the code page be found (or changed) by multiple SHELL calls. I will experiment with different Unicode pages in the console window to see if they can be identified separately. I suspect they can.

This should be referenced in the wiki, as footnotes on the SHELL and $CONSOLE:ONLY pages. I can't be the only one stumped by this.

Thanks again.

My external solution works too, but not as elegantly as yours. I hate having to rely on third-party programs.
If it can't be done with QB64pe, then keep bashing until it can.

With the CHCP 65001 page (65001 is the Windows code page number for UTF-8), you shouldn't have to worry about "different Unicode pages". The whole point of Unicode is basically to make every character available to you at the same time. With ASCII you only have 128 characters available, and with an 8-bit ANSI code page you have 256. Almost all of those code pages share the same first 128 (ASCII) characters, and you swap code pages to change which characters occupy the upper range (byte values 128 to 255). Unicode doesn't have that limit, so it isn't something we have to fret over so much. Just set the console/terminal to CHCP 65001 and you're good to go. (As long as your font has the characters you're looking for. Not every Unicode font covers every script.)
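To make the code page point concrete, here is a small sketch in Python (not QB64, and not how QB64pe handles code pages internally). It decodes the same byte under three common 8-bit code pages and checks whether a mixed-script folder name fits in each one. The code page choices (cp437, cp1252, cp1251) and the sample name are illustrative assumptions.

```python
# Sketch: one byte means different things under different 8-bit code pages,
# while UTF-8 (Windows code page 65001) can hold all of those characters at once.

# Illustrative choices: OEM US, Western European ANSI, Cyrillic ANSI.
LEGACY_CODE_PAGES = ("cp437", "cp1252", "cp1251")


def decode_byte(value: int, codepage: str) -> str:
    """Interpret a single byte value under the given code page."""
    return bytes([value]).decode(codepage)


def fits_in(text: str, encoding: str) -> bool:
    """True if every character of text can be encoded in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    # Bytes below 128 are plain ASCII and agree across all three code pages.
    print([decode_byte(0x41, cp) for cp in LEGACY_CODE_PAGES])
    # Byte 0xE9 is in the upper range, so its meaning depends on the code page.
    print([decode_byte(0xE9, cp) for cp in LEGACY_CODE_PAGES])
    # A hypothetical folder name mixing Greek, Latin and Cyrillic letters.
    name = "Θé-й folder"
    print({cp: fits_in(name, cp) for cp in LEGACY_CODE_PAGES + ("utf-8",)})
```

None of the single-byte code pages can hold the whole name, but UTF-8 can, which is why switching the console to 65001 removes the need to juggle pages.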

I'm glad this works for you.

Note that you probably don't need a $CONSOLE:ONLY line to get it working. $CONSOLE followed by _CONSOLE ON would probably work just as well, since you'd still have the same persistent console to make changes to. The problem with using SHELL without any console is that each command gets a console of its own: a console opens, the command runs and finishes, and then the console closes, so changes such as a CHCP setting aren't necessarily still in effect for the next SHELL you issue.

Using $CONSOLE:ONLY, or $CONSOLE with _CONSOLE ON, should keep everything in the same console and let you make the changes you need to get the information you're looking for back from DIR.
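The persistence issue can be illustrated by analogy with a short Python sketch. It is an assumption-laden stand-in, not QB64 code: it uses POSIX `sh` instead of `cmd.exe`, and a shell variable named `FOO` (hypothetical) instead of the console code page. On Windows the CHCP setting belongs to the console rather than to the shell process, so the sketch only shows the general idea that state set by one short-lived command is gone when the next one starts, while a single long-lived session keeps it.

```python
import os
import subprocess

# Copy of our environment without FOO, so the demo starts from a known state.
CLEAN_ENV = {k: v for k, v in os.environ.items() if k != "FOO"}


def separate_shells() -> str:
    """Two commands, two short-lived shells (like SHELL with no console kept open)."""
    # The first shell sets FOO and then exits, taking FOO with it.
    subprocess.run("FOO=hello", shell=True, env=CLEAN_ENV, check=True)
    # The second shell starts fresh and never sees FOO.
    result = subprocess.run('echo "$FOO"', shell=True, env=CLEAN_ENV,
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


def one_persistent_shell() -> str:
    """Both commands go to one long-lived shell (like a console that stays open)."""
    proc = subprocess.Popen(["sh"], stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                            env=CLEAN_ENV, text=True)
    # Closing stdin after both lines lets sh exit once it has run them.
    out, _ = proc.communicate('FOO=hello\necho "$FOO"\n')
    return out.strip()


if __name__ == "__main__":
    print(repr(separate_shells()))       # the setting was lost
    print(repr(one_persistent_shell()))  # the setting survived
```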
#22
I read that every Unicode text should have a BOM at the beginning to identify its encoding, and that the code page should probably be set accordingly. I'm trying to find out more about it now.


#23
(3 hours ago) Petr wrote: I read that every Unicode text should have a BOM at the beginning to identify its encoding, and that the code page should probably be set accordingly. I'm trying to find out more about it now.
Get the program Hexplorer here: https://sourceforge.net/projects/hexplor...t/download
It's from 2018, old, and still works damn good. Watch out: you can modify file content with it.

To understand the Unicode ID bytes (the byte order mark): https://en.wikipedia.org/wiki/Byte_order_mark
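As a rough companion to the byte order mark article, here is a minimal Python sketch (not QB64) that checks the first bytes of a file's contents against the standard BOMs. Keep in mind that a BOM is optional: the Unicode standard neither requires nor recommends one for UTF-8, so a file without a BOM is not necessarily non-Unicode. The fallback to UTF-8 when no BOM is found is an assumption of this sketch.

```python
import codecs
from typing import Optional, Tuple

# Longer BOMs first: the UTF-32 LE mark (FF FE 00 00) begins with the UTF-16 LE mark (FF FE).
KNOWN_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def detect_bom(data: bytes) -> Tuple[Optional[str], int]:
    """Return (encoding, BOM length), or (None, 0) if no known BOM is present."""
    for bom, encoding in KNOWN_BOMS:
        if data.startswith(bom):
            return encoding, len(bom)
    return None, 0


def decode_text(data: bytes, fallback: str = "utf-8") -> str:
    """Decode using the BOM if there is one, otherwise assume the fallback encoding."""
    encoding, skip = detect_bom(data)
    return data[skip:].decode(encoding or fallback)


if __name__ == "__main__":
    sample = codecs.BOM_UTF16_LE + "é".encode("utf-16-le")
    print(detect_bom(sample), decode_text(sample))
```

Checking the longer UTF-32 marks first matters: a UTF-32 LE file would otherwise be misread as UTF-16 LE. Even so, a UTF-16 LE file whose first character is U+0000 looks identical to a UTF-32 LE BOM, so BOM sniffing is a heuristic rather than a guarantee.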

Good luck