MemFile System
#11
I've got much better word lists and dictionaries, if you need something like that. The reason I tend to use the one I chose here is simply its sheer size and number of entries. It makes a good baseline for timed tests to see how long it takes to load and process something. For just pure *words*, I'd suggest downloading and using the Official Scrabble Dictionary.txt.
Reply
#12
(08-27-2022, 06:37 AM)SMcNeill Wrote: I've got much better word lists and dictionaries, if you need something like that.  The reason I tend to just use the one I chose here is simply due to its sheer size and number of entries.  It makes a good baseline for timed tests to see how long it takes to load and process something.  For just pure *words*, I'd suggest to just download and use the Official Scrabble Dictionary.txt.  Wink

I have, and I am. This one just caught my attention, as the Scrabble one is about 280,000 words. I sub-divided the Scrabble list into 26 files so I can load the appropriate file when checking words, to save search time.
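For anyone curious, the split described above can be sketched roughly like this in QB64. The file names here ("scrabble.txt", "A.txt" .. "Z.txt") are my own invention, not from the actual post:

Code: (Select All)
    'Rough sketch: read the full word list once and append each word
    'to A.txt .. Z.txt based on its first letter.
    Dim word As String, letter As String
    Open "scrabble.txt" For Input As #1
    Do Until EOF(1)
        Line Input #1, word
        letter = UCase$(Left$(word, 1))
        If letter >= "A" And letter <= "Z" Then
            Open letter + ".txt" For Append As #2
            Print #2, word
            Close #2
        End If
    Loop
    Close #1

Opening and closing the output file for every word is slow; for a one-time preprocessing pass it hardly matters, but keeping all 26 handles open would be faster if you ran this often.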
Of all the places on Earth, and all the planets in the Universe, I'd rather live here (Perth, W.A.)
Please visit my Website at: http://oldendayskids.blogspot.com/
Reply
#13
Something I just noticed with this is that if you are assuming that the line endings are CHR$(13) + CHR$(10) ("\r\n") then that might not work with a file that has UNIX line endings, which I think is just CHR$(10) ("\n"). You might want to split on just CHR$(10) and then check for CHR$(13) existing after the split. If it does, you can just delete those. A foolproof way that I split a file up is by using my tokenize function, which uses strtok. It takes a list of characters to split on and it works just fine regardless of the file having UNIX or Windows line endings.
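The split-on-LF-then-strip-CR idea reads something like this as a minimal QB64 sketch (the function name GetLine$ and its signature are my own, not Spriggsy's actual tokenize function):

Code: (Select All)
    'Minimal sketch: split on CHR$(10) only, then trim a trailing
    'CHR$(13) from each piece if one exists. Works for both
    'Windows (CRLF) and Unix (LF) files. pos is passed by reference
    'and advances past each line as it's consumed.
    Function GetLine$ (buffer As String, pos As Long)
        Dim nl As Long, piece As String
        nl = InStr(pos, buffer, Chr$(10))
        If nl = 0 Then nl = Len(buffer) + 1 'last line, no trailing LF
        piece = Mid$(buffer, pos, nl - pos)
        If Right$(piece, 1) = Chr$(13) Then piece = Left$(piece, Len(piece) - 1)
        pos = nl + 1 'advance past the LF
        GetLine$ = piece
    End Function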
Tread on those who tread on you

Reply
#14
(08-31-2022, 02:14 PM)Spriggsy Wrote: Something I just noticed with this is that if you are assuming that the line endings are CHR$(13) + CHR$(10) ("\r\n") then that might not work with a file that has UNIX line endings, which I think is just CHR$(10) ("\n"). You might want to split on just CHR$(10) and then check for CHR$(13) existing after the split. If it does, you can just delete those. A foolproof way that I split a file up is by using my tokenize function, which uses strtok. It takes a list of characters to split on and it works just fine regardless of the file having UNIX or Windows line endings.


Code: (Select All)
    'Auto-detect the file's line-ending style.
    'The file is already in temp$, so just search it with InStr.
    'Check CRLF first: a CRLF file also contains lone CHR$(10)s,
    'so the order of these tests matters.
    If InStr(temp$, Chr$(13) + Chr$(10)) Then
        MemFile(i).CRLF = Chr$(13) + Chr$(10) 'Windows (CRLF)
    ElseIf InStr(temp$, Chr$(10)) Then
        MemFile(i).CRLF = Chr$(10) 'Unix (LF)
    ElseIf InStr(temp$, Chr$(13)) Then
        MemFile(i).CRLF = Chr$(13) 'classic Mac (CR)
    Else
        Error 5: Exit Function 'no line endings found at all
    End If


It searches your file to see what type of line endings you have in it. Unless you have mixed endings (like some lines ending with CHR$(10) and others ending with CHR$(13)), it'll work automagically for you. If you have mixed endings, you'll probably need to write a routine to normalize to one format or the other before making use of these functions. I didn't want to tie up the INPUT times by having them do a series of IF checks to see if you have a 10, 13, or 13+10 set of endings on each line. I was going a little more for speed and efficiency, which should work for 99.9% of files, rather than for the flexibility to make certain we can read every mixed-ending file out there.
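A normalization routine like the one mentioned above could look something like this. This is my own sketch, not part of the MemFile library; it rewrites CRLF pairs first, then any stray lone CRs, so everything ends up as plain LF:

Code: (Select All)
    'Hypothetical helper: normalize mixed endings to a single LF
    'before handing the buffer to the MemFile routines.
    Function NormalizeLF$ (temp$)
        Dim t As String, p As Long
        t = temp$
        'Replace CRLF pairs first, so their CRs aren't mistaken
        'for lone classic-Mac endings in the second pass.
        Do
            p = InStr(t, Chr$(13) + Chr$(10))
            If p = 0 Then Exit Do
            t = Left$(t, p - 1) + Chr$(10) + Mid$(t, p + 2)
        Loop
        'Any CR still left is a lone ending; overwrite it in place.
        Do
            p = InStr(t, Chr$(13))
            If p = 0 Then Exit Do
            Mid$(t, p, 1) = Chr$(10)
        Loop
        NormalizeLF$ = t
    End Function

You'd only pay this cost once per file, at load time, so the per-line INPUT routines stay as fast as before.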
Reply
#15
(08-31-2022, 02:14 PM)Spriggsy Wrote: Something I just noticed with this is that if you are assuming that the line endings are CHR$(13) + CHR$(10) ("\r\n") then that might not work with a file that has UNIX line endings, which I think is just CHR$(10) ("\n"). You might want to split on just CHR$(10) and then check for CHR$(13) existing after the split. If it does, you can just delete those. A foolproof way that I split a file up is by using my tokenize function, which uses strtok. It takes a list of characters to split on and it works just fine regardless of the file having UNIX or Windows line endings.

If you want a sort of "luxury version" of this, you may take the Simplebuffer System (read the docs) from my Libraries Collection.

It's basically the same thing, but built on a string array rather than _MEM. It also has a lot more functionality and is able to handle the line endings as mentioned above.
Reply