Need help capturing Unicode directory names
#1
I am using Shell _Hide "dir /o/b > thelist4.txt" to capture the contents of a directory.  The goal is to rename and move files.

What explorer sees is "Vol.09 Ch.0041 (en) [一人の新しい Scan]"
What the shell capture sees is "Vol.09 Ch.0041 (en) [?????? Scan]"

I know QB64PE can handle the Unicode names just fine as strings.  I just need to capture the true Unicode name.  The "dir" command line is not up to the task.
Every once in a while I get the urge to find an answer.  Now is one of those times.

To put a twist on the request: it needs to be automated, meaning no dropping out to run a program and capture the content manually.
Ideas?
Thanks
#2
The first thing to check is that your console is set up with a Unicode font (say, Lucida Console).  If the font doesn't support Unicode, you won't get much further than that.

Second, set the console properties to UTF-8 (code page 65001).

Now you're set to do a simple SHELL "dir > filelist.txt" and get those unicode characters.

You may be able to do this with a batch file, if your console is set with the proper font.

Code:
CHCP 65001
DIR > MyFileName.TXT

Save the above as your batch file. The CHCP 65001 line switches the console to the UTF-8 code page, which DIR will then use when it creates your MyFileName.TXT file.
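
To illustrate why the code page matters, here is a minimal Python sketch (not QB64 code, and not what cmd.exe literally runs). When text is converted to a code page that lacks a character, that character is replaced by "?". The sketch assumes the console was using code page 437, the usual US English OEM default; the thread does not say which code page was actually active. The helper name render_in_code_page is hypothetical.

```python
def render_in_code_page(name: str, code_page: str) -> str:
    """Round-trip a name through a code page.

    Characters the code page cannot represent are replaced with '?'.
    """
    return name.encode(code_page, errors="replace").decode(code_page)


if __name__ == "__main__":
    name = "Vol.09 Ch.0041 (en) [一人の新しい Scan]"

    # Assumption: cp437 stands in for the console's default OEM code page.
    print(render_in_code_page(name, "cp437"))

    # UTF-8 (code page 65001) can represent every character, so nothing is lost.
    print(render_in_code_page(name, "utf-8"))
```

Under that assumption, the cp437 round trip turns each of the six Japanese characters into a question mark, which matches what the shell capture in post #1 shows, while the UTF-8 round trip returns the name unchanged.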
#3
[Image: image.png]

The setting shown in the image above may be another quick and easy solution for you, if you're on a Windows machine.

To enable the new UTF-8 option in Windows settings:

Go to the language settings, click Administrative language settings, then Change system locale… and tick the Beta: Use Unicode UTF-8 for worldwide language support option.

You can basically see what you need to look for in the image above. Once that little box is ticked, it might solve the issue all by itself, if your shell is shelling to PowerShell instead of CMD. I don't know what OS you're using, so I thought I'd include it as an option as well.
#4
(Note:  The setting change from the post above may require a system reboot to take effect.)
#5
Total Commander sees the Unicode characters. PowerShell and dir do not.  I did all of the above and was able to change the cmd prompt shortcuts on my desktop to use the Lucida Console font.  None of the shell captures worked.  Even the cmd windows opened from the desktop shortcuts did not show the special characters.

I did find that every invocation of cmd.exe (and maybe PowerShell) uses its launch location or shortcut to determine which font is used.  For example, cmd.exe started from the Windows directory uses the default font.  Change it to Lucida and it loads Lucida on the next run.  But launch an unmodified desktop shortcut and it's back to the default.

I found that using Lucida, or even UTF-8, is not the problem.  The correct code page must be in use before the character is displayed.  Total Commander seems to realize this and adjusts its display, even for Korean, Chinese, Japanese, English, etc., all in one directory.
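
One likely explanation (an assumption; the thread does not confirm how Total Commander works) is that Windows stores file names as UTF-16, and a program that calls the wide-character file APIs receives them without any code page conversion. As a sketch of the same idea outside QB64, Python's os.scandir takes that route on Windows, so a small helper script could write a bare, sorted listing to a UTF-8 file for the QB64PE program to read. The function name write_listing is hypothetical, and having Python installed is an assumption.

```python
import os


def write_listing(directory: str, out_path: str) -> list[str]:
    """Write the bare entry names of `directory` to a UTF-8 file, one per line."""
    # os.scandir yields str names. On Windows it goes through the
    # wide-character API, so names never pass through the console code page.
    with os.scandir(directory) as entries:
        # Plain code-point sort; this only approximates the order of "dir /o".
        names = sorted(entry.name for entry in entries)

    # Write UTF-8 explicitly instead of relying on the system default encoding.
    with open(out_path, "w", encoding="utf-8", newline="\n") as out:
        for name in names:
            out.write(name + "\n")
    return names
```

A call such as write_listing(".", "thelist4.txt") would stand in for the "dir /o/b > thelist4.txt" capture. The program reading the file still has to treat its contents as UTF-8.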

This is one stubborn problem.

----------------- investigation of registry------------
The default font is defined by keys in Computer\HKEY_CURRENT_USER\Console
The string value named FaceName holds the default font.

Each subkey under Console is unique to its start location.  I can have any number of cmd.exe shortcuts defined, each with a different font/code page.

Still a stubborn problem.
#6
Have you tried the abbreviated DIR listing, DIR /x, which shows the short 8.3 names, or better yet, using DirEntry.h? In my language, this will print a meaningless name (without using _MapUnicode), but the name can still be used to access the file on disk.


#7
Can you zip up a file in an archive and name it something like test.7z?  (To make certain the name is maintained 100% and not altered by the forums or downloading.)

It's hard to test various things without a suitable file to experiment on.
#8
This is a time where Win32 would come in handy for displaying such characters. I have code somewhere for displaying Asian characters. Where it is...who knows?

EDIT: DIR /X was my next suggestion. Short names work just fine.
Tread on those who tread on you

#9
Yeah, so direntry.h can't handle Chinese. How did I figure that out without knowing Chinese? I entered the name in Czech in translate.google.com, had it translated into Chinese, copied the Chinese expression, renamed the JPG file to the Chinese expression, added the extension .JPG, and tried to run it in my program that I'm currently developing. The result that Direntry.h returned is ??.jpg, which is not a valid file name.


#10
Now, if all you need is the ability to read a file list that contains Unicode, then I can help. I have code for an Open File dialog that uses Unicode.
