Masakari - the abandoned text viewer
Huh, how time flies, just found some old stuff:

Topic: A skeleton code for Text Scroller via Drag-and-Drop
https://qb64forum.alephc.xyz/index.php?t...#msg130335

Decided to recompile it with QB64PE 3.9.1 and see how my main ThinkPad X270 (32GB RAM, 4TB NVMe) running Linux Fedora 40 fares with loading, for browsing, the latest English Wikipedia XML dump, 97.1GiB:

Since each line is indexed, it needs 22.8 GiB of RAM: ~1.5 billion lines with an 8-byte OFFSET + 8-byte LENGTH descriptor per line, i.e. 1.5 billion * 16 = 24,000,000,000 bytes, and 24,000,000,000/1024/1024/1024 is about 22.4 GiB with the rounded line count. Its successor TriMasakari, by contrast, needs ... 0 RAM.
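To make the arithmetic above concrete, here is a minimal Python sketch of a per-line (offset, length) index. It is not Masakari's actual QB64 code; the function names `build_line_index` and `index_size_bytes` and the sample data are made up for illustration.

```python
import io
import struct

def build_line_index(stream):
    """Return a list of (offset, length) pairs, one per line of a binary stream."""
    index = []
    offset = 0
    for raw in stream:  # iterating a binary stream yields chunks ending in b"\n"
        length = len(raw.rstrip(b"\r\n"))  # line length without its terminator
        index.append((offset, length))
        offset += len(raw)  # the next line starts right after the terminator
    return index

def index_size_bytes(line_count):
    """Index footprint assuming an 8-byte offset plus 8-byte length per line."""
    return line_count * struct.calcsize("<QQ")  # "<QQ" = two unsigned 64-bit ints

# Hypothetical sample data standing in for the real dump.
sample = io.BytesIO(b"alpha\nbe\r\ngamma delta\n")
index = build_line_index(sample)
print(index)

# Footprint for the rounded 1.5 billion lines, in GiB (about 22.35).
print(index_size_bytes(1_500_000_000) / 1024**3)
```

With 16 bytes per line the index grows linearly with the line count, which is why a dump with this many short lines needs tens of GiB just for the descriptors.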

Wanted to see how fast it "loads" and searches:

   

- All lines of enwiki indexed in: 420 seconds
- All hitlines with "NyoTengu" found in: 90 seconds
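The hitline search timed above can be pictured as a walk over the line index, reading each line by offset and length and testing it for the needle. This is only a sketch under that assumption, not how Masakari actually searches; `build_index`, `find_hitlines` and the sample text are hypothetical.

```python
import io

def build_index(data):
    """Build (offset, length) pairs for every line in a bytes object."""
    index, offset = [], 0
    for raw in data.splitlines(keepends=True):
        index.append((offset, len(raw.rstrip(b"\r\n"))))
        offset += len(raw)
    return index

def find_hitlines(stream, index, needle):
    """Yield (line_number, line_bytes) for each indexed line containing needle."""
    for line_no, (offset, length) in enumerate(index):
        stream.seek(offset)        # jump straight to the start of the line
        line = stream.read(length)
        if needle in line:
            yield line_no, line

# Hypothetical sample standing in for the dump.
data = b"first line\nNyoTengu appears here\nnothing\nanother NyoTengu\n"
hits = list(find_hitlines(io.BytesIO(data), build_index(data), b"NyoTengu"))
print(hits)
```

Line numbers here are zero-based; a real viewer would likely scan the file sequentially in large blocks rather than seek per line.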

One of its useful abilities is sorting all the lines by ... their length, done with the 'Shift+1' combo.
I have never seen an editor (even the legendary UltraEdit) with such a function; it is sometimes useful to view the longest line(s), clustered at the bottom. In this XML dump the longest line is 2,069,029 bytes long, and it took 64 seconds to sort the 1.5 billion lines. Pressing 'Enter' over the highlighted line wraps it for scrolling.

'Shift+2' sorts by Offset, i.e. it returns to the original order.
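Both sorts only need to reorder the (offset, length) descriptors, never the text itself. A minimal Python sketch of the idea, with a made-up four-line index (the real thing is QB64 and may well use a different sorting method):

```python
# Hypothetical index: (offset, length) per line, in file order.
index = [(0, 5), (6, 2), (9, 11), (21, 2)]

# 'Shift+1' analogue: sort by length, so the longest lines end up at the bottom.
# Python's sort is stable, so lines of equal length keep their file order.
by_length = sorted(index, key=lambda entry: entry[1])
print(by_length)

# 'Shift+2' analogue: sort by offset, which restores the original order,
# because offsets increase strictly with the line number.
by_offset = sorted(by_length, key=lambda entry: entry[0])
print(by_offset == index)
```

Since the offset is part of every descriptor, no separate copy of the original order has to be kept around.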

Looking back, it features some nifty ideas; I hate that it has to step aside and make way for better scrolling, faster searching, better ergonomics, more and more functionality... Anyway, for archival purposes here comes the revision 8.2+ package with all sources.

My focus again is on English texts and phrase ripping/searching; hopefully next year will show the unseen English Phrase-Checker which is to be part of TriMasakari...
As a gift to all English language learners-n-aficionados, I collected under one roof a printer-friendly corpus, 173 pages strong, of all the must-have basics (kind of cheatsheets) of English suffix/preposition craft: ONE UNSEEN RESOURCE, my word. These pages are the product of more than a decade of my diving into the English language. ENFUN!

   

   

Also, I showcased how the digrams/trigrams/tetragrams of a given interview (or a book, or a superhuge book collection, e.g. 'Library Genesis non-fiction', which I ripped recently; by the way, 450GB is the text alone and 1TB more are the x-grams, of which the tetragrams number 18+ billion) can be ripped, in order to give the reader a superb (exhaustive) index and to feed the upcoming TriMasakari with these phrase-checking files...
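For readers unfamiliar with x-grams, here is a small Python sketch of ripping digrams/trigrams/tetragrams from a text. It assumes word-level x-grams split on whitespace, which may differ from the actual ripper; the `ngrams` helper and the sample sentence are made up.

```python
from collections import Counter

def ngrams(words, n):
    """Return every run of n consecutive words, joined by single spaces."""
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

# Hypothetical sample standing in for an interview or a book.
text = "the quick brown fox jumps over the quick brown dog"
words = text.lower().split()

# One frequency table per x-gram order: 2 = digrams, 3 = trigrams, 4 = tetragrams.
counts = {n: Counter(ngrams(words, n)) for n in (2, 3, 4)}

for n, table in counts.items():
    print(n, table.most_common(2))
```

A text of N words yields N-3 tetragrams, one starting at nearly every word, which helps explain why the x-gram files can outgrow the text they come from.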


Attached Files
.pdf   DRAGON_SUFFIX_ROSTER_REVISION_9.pdf (Size: 9.26 MB / Downloads: 5)
.7z   MASAKARI_r8.2+.7z (Size: 33.83 MB / Downloads: 17)
"He learns not to learn and reverts to what the masses pass by."