Splitting REALLY *REALLY* long lines into single character + CR + LF
#1
I am trying to work with a text file of Pi that is a single line of one *billion* digits. QB64 Phoenix doesn't seem to like putting this one long line into a string variable: it just sits there with one CPU core maxed out. It might finish eventually, but not in any reasonable time frame.

Is there an easy way using Linux to convert this long line into a file in the following format?

3
.
1
4
1
5
9

etc?

That should make it much easier to deal with.
While we're at it, what is the largest file that can be opened for reading?
#2
https://stuff.mit.edu/afs/sipb/contrib/pi/    <-- You can get a billion digits of Pi from here.

A billion digits loads in about 0.3 seconds for me.

Code:
t# = Timer(0.001) ' start time, to the millisecond
a$ = _ReadFile$("z:\pi-billion.txt") ' whole file into one STRING
Print Len(a$) ' number of bytes read
Print Left$(a$, 1000) ' first 1000 characters
Print
Print Using "##.### seconds to read file."; Timer - t#
#3
I wouldn't bother trying to put this into a `STRING`; QB64 simply does not handle `STRING`s of that size very well (or at all, in some cases). There are ways you can make it happen, but you won't be able to do anything non-trivial with it, because most `STRING` operations result in the `STRING` getting copied, which is a big problem if it's a 1GB string.

A better idea is to create a 1GB `_MEM` and handle the data that way. Doing anything with the `_MEM` after you read the data is of course trickier than using a `STRING`, but ultimately you were never going to be able to use the regular `STRING` functions anyway.
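For the `_MEM` route, here is a minimal sketch. It assumes QB64 Phoenix Edition (64-bit) and a file named `pi-billion.txt` (a placeholder) that begins with `3.` and has no leading spaces. The sketch reserves one `_MEMNEW` block the size of the file, fills it 1 MB at a time through a small byte array and `_MEMCOPY`, then reads a single digit back with `_MEMGET`. The chunk size and variable names are arbitrary choices for illustration.

```qb64
' Sketch: load a huge text file into a _MEM block instead of a STRING.
' Assumes pi-billion.txt (placeholder name) exists and starts with "3."
DIM fileName AS STRING, f AS LONG, n AS LONG
DIM fileLen AS _INTEGER64, done AS _INTEGER64, k AS _INTEGER64
DIM piMem AS _MEM, bufMem AS _MEM
CONST CHUNK = 1048576 ' copy 1 MB per pass

fileName = "pi-billion.txt"
IF NOT _FILEEXISTS(fileName) THEN PRINT "Can't find "; fileName: END

f = FREEFILE
OPEN fileName FOR BINARY AS #f
fileLen = LOF(f)
piMem = _MEMNEW(fileLen) ' one block the size of the whole file
IF piMem.SIZE = 0 THEN PRINT "Could not allocate"; fileLen; "bytes": END

DO WHILE done < fileLen
    n = CHUNK
    IF fileLen - done < n THEN n = fileLen - done ' the last pass may be short
    REDIM buf(0 TO n - 1) AS _UNSIGNED _BYTE
    GET #f, done + 1, buf() ' BINARY positions start at 1
    bufMem = _MEM(buf())
    _MEMCOPY bufMem, bufMem.OFFSET, n TO piMem, piMem.OFFSET + done
    _MEMFREE bufMem
    done = done + n
LOOP
CLOSE #f

' Offset 0 holds "3" and offset 1 holds ".", so digit k after the point is at offset k + 1
k = 1
PRINT "Digit"; k; "after the point is"; _MEMGET(piMem, piMem.OFFSET + k + 1, _UNSIGNED _BYTE) - 48

_MEMFREE piMem ' release the block when finished
```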

Alternatively you could consider whether you actually need the whole thing in memory at one time. If you can adapt what you're doing to act on chunks of Pi at a time then you can process it without ever needing a 1GB `STRING`.
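Here is a minimal sketch of the chunked approach, again assuming the placeholder file name `pi-billion.txt`. Tallying how often each digit appears is just a stand-in task. The point is that only one 16 MB piece of the file is ever held in a `STRING`, and `ASC(chunk, i)` reads a character code without creating a new string the way `MID$` does.

```qb64
' Sketch: tally digits 0-9 while holding only one chunk of the file in memory.
' Assumes pi-billion.txt (placeholder name) is plain ASCII text.
DIM fileName AS STRING, chunk AS STRING
DIM f AS LONG, n AS LONG, i AS LONG, c AS LONG
DIM fileLen AS _INTEGER64, done AS _INTEGER64
DIM tally(0 TO 9) AS _INTEGER64
CONST CHUNK_SIZE = 16777216 ' 16 MB per read; adjust to taste

fileName = "pi-billion.txt"
IF NOT _FILEEXISTS(fileName) THEN PRINT "Can't find "; fileName: END

f = FREEFILE
OPEN fileName FOR BINARY AS #f
fileLen = LOF(f)
DO WHILE done < fileLen
    n = CHUNK_SIZE
    IF fileLen - done < n THEN n = fileLen - done
    chunk = SPACE$(n) ' GET fills exactly LEN(chunk) bytes
    GET #f, done + 1, chunk
    FOR i = 1 TO n
        c = ASC(chunk, i) ' character code at position i, no substring copy
        IF c >= 48 AND c <= 57 THEN tally(c - 48) = tally(c - 48) + 1
    NEXT
    done = done + n
LOOP
CLOSE #f

FOR i = 0 TO 9
    PRINT i; tally(i)
NEXT
```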
#4
(04-22-2025, 04:34 AM)Circlotron Wrote: Is there an easy way using Linux to convert this long line into a file in the following format?

3
.
1
4
1
5
9

etc?

Use the fold command?

For 1 character per line: fold -w1 file1.txt > file2.txt
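If you'd rather do the conversion in QB64, a minimal equivalent might look like the following sketch. `fold` ends each line with a bare LF, whereas the thread title asks for CR + LF. This version therefore writes a CR (13) and an LF (10) after every character, including the decimal point, and drops any line ending already present in the input. Both file names are placeholders. The output will be roughly three times the size of the input, so a billion digits becomes about 3 GB.

```qb64
' Sketch: QB64 counterpart of "fold -w1", but with CR+LF after each character.
' Reads the input in chunks so no giant STRING is ever built.
' pi-billion.txt (input) and pi-lines.txt (output) are placeholder names.
DIM inName AS STRING, outName AS STRING, chunk AS STRING, outBuf AS STRING, part AS STRING
DIM fIn AS LONG, fOut AS LONG, n AS LONG, i AS LONG, j AS LONG, c AS LONG
DIM inLen AS _INTEGER64, done AS _INTEGER64
CONST CHUNK_SIZE = 4194304 ' 4 MB of input -> up to 12 MB of output per pass

inName = "pi-billion.txt"
outName = "pi-lines.txt"
IF NOT _FILEEXISTS(inName) THEN PRINT "Can't find "; inName: END

fOut = FREEFILE
OPEN outName FOR OUTPUT AS #fOut: CLOSE #fOut ' create or empty the output file
OPEN outName FOR BINARY AS #fOut
fIn = FREEFILE
OPEN inName FOR BINARY AS #fIn
inLen = LOF(fIn)

DO WHILE done < inLen
    n = CHUNK_SIZE
    IF inLen - done < n THEN n = inLen - done
    chunk = SPACE$(n)
    GET #fIn, done + 1, chunk
    outBuf = SPACE$(3 * n) ' each character becomes char + CR + LF
    j = 1
    FOR i = 1 TO n
        c = ASC(chunk, i)
        IF c <> 13 AND c <> 10 THEN ' skip line endings already in the file
            ASC(outBuf, j) = c
            ASC(outBuf, j + 1) = 13
            ASC(outBuf, j + 2) = 10
            j = j + 3
        END IF
    NEXT
    part = LEFT$(outBuf, j - 1) ' only the bytes actually filled in
    PUT #fOut, , part
    done = done + n
LOOP
CLOSE #fIn, #fOut
```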
#5
(04-22-2025, 04:34 AM)Circlotron Wrote: While we are at it, what is the maximum size file that can be opened for reading?

I don't think there is a practical maximum on NTFS or ext4 file systems. On FAT32 the largest file that can exist is 4 GB (4 GiB minus 1 byte).

I would just grab that file into a byte array:

Code:
DIM NUMBYTES AS _UNSIGNED _INTEGER64

OPEN "PIFILE" FOR BINARY AS #1 ' "PIFILE" stands in for your file name
NUMBYTES = LOF(1) ' file size in bytes
REDIM PIARRAY(0 TO NUMBYTES - 1) AS _UNSIGNED _BYTE ' one element per character
GET #1, , PIARRAY() ' read the whole file into the array in one go
CLOSE #1

That part gets the file. If Pi is stored as ASCII text, you get each digit's numeric value by subtracting 48 (the ASCII code for "0") from its byte value.

PIARRAY(1) should be the decimal point IF the file has no leading spaces.
#6
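Code:
DIM I AS _UNSIGNED _INTEGER64
PIARRAY(0) = 3 ' the leading "3"; PIARRAY(1) is the decimal point and is left alone

FOR I = 2 TO NUMBYTES - 1
    PIARRAY(I) = PIARRAY(I) - 48 ' ASCII "0".."9" (48..57) becomes 0..9
NEXT
' A trailing CR or LF at the end of the file will not turn into a digit, so check for one

That converts each byte from its ASCII value to its numeric value.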
#7
Okay, thanks for all that. I'm not really familiar with arrays, so I'll be learning as I go.
#8
(04-22-2025, 04:44 AM)SMcNeill Wrote: https://stuff.mit.edu/afs/sipb/contrib/pi/    <-- You can get a billion digits of Pi from here.

Yeah, I got it from the same place.
#9
It depends on what you want to do with it, and how. What is your use case?
In general, the format you already have is a lot more powerful than what you suggest (a CR+LF between every character).
QB64 can handle this easily. Depending on your scenario, you could put it in a _MEM (limited only by available memory), in a fixed-length string (up to 2 GB), or process it by reading blocks of 64 MB (which is what I do with 100 GB+ files).
45y and 2M lines of MBASIC>BASICA>QBASIC>QBX>QB64 experience
#10
That _READFILE$ function is fabulous! I'd never used it before. I'm reading in the single line, running through it with a FOR-NEXT loop around MID$, and looking for digit sequences that match their own position after the decimal point. For example, at position 16,470 there is the sequence 16470. There are several like that. I haven't quite got it working yet.

I'll have a look at the _MEM thing too.
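The search described above can be sketched roughly as follows. The sketch assumes that positions are counted from the first digit after the decimal point (so the first 1 in 3.14159 is at position 1) and that the file starts with `3.` and no leading spaces. It keeps `_READFILE$` but replaces the `MID$`-per-character loop with a cheap `ASC` check of the first digit, building a `MID$` substring only when that matches. The file name is a placeholder; if the file isn't found, a short stand-in string is used so the sketch still runs. Converting every position with `STR$` still makes this slow over a billion digits, so treat it as a starting point rather than a tuned solution.

```qb64
' Sketch: find positions n where the digits starting at n spell n itself.
' Position 1 = first digit after "3." (an assumption about the counting).
DIM a AS STRING, s AS STRING, fileName AS STRING
DIM n AS LONG, digitCount AS LONG, hits AS LONG

fileName = "pi-billion.txt" ' placeholder name; _READFILE$ needs QB64 Phoenix Edition
IF _FILEEXISTS(fileName) THEN
    a = _READFILE$(fileName)
ELSE
    a = "3.14159265358979323846" ' short stand-in so the sketch still runs
END IF
digitCount = LEN(a) - 2 ' everything after "3."

FOR n = 1 TO digitCount
    s = LTRIM$(STR$(n)) ' e.g. "16470"
    ' Compare the first digit with ASC (no copy) before building a MID$ substring
    IF ASC(a, n + 2) = ASC(s) THEN
        IF MID$(a, n + 2, LEN(s)) = s THEN
            hits = hits + 1
            PRINT "Self-locating:"; n
        END IF
    END IF
NEXT
PRINT hits; "found"
```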