Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Need custom sorting algorithm
#1
Hi all

I m looking for a sorting algorithm, see image.
Thanks in advance for your help.

[Image: IMG1.png] 

the sorting algorithm needs to sort the columns that are most similar like shown in the image above.  The image
was made by hand for illustration purposes.  The attached file is just a small sample but the number of columns 
will always be 112.   The idea is to find around 20 to 25 columns with the highest number of matching *'s 

The output only needs to show column numbers,  Example, if 15 columns are found to have 6 *'s in common then 
those 15 columns will be the output,  "5,12,22,35,43,50... n". 

The file I  am processing  often contains 500 or more lines each with 112 columns.  This seems like it would be easy 
enough but once I started the more complicated it got.  Any info would be greatly appreciated.

R1

Sample file, this is a .txt file which on has 25 lines of data.
https://app.box.com/s/38p52cfs6e6bbcbxsuhver7fxeb5dgcs

RSample 
 appreciate
Reply
#2
Would this little method basically be what you're looking for?

Code: (Select All)
CONST columns = 20, rows = 10

DIM a(1 TO rows) AS STRING
FOR i = 1 TO rows
    FOR j = 1 TO columns
        IF INT(RND * 2) THEN a(i) = a(i) + "*" ELSE a(i) = a(i) + "-"
    NEXT
    PRINT a(i)
NEXT

DIM matches(columns)

FOR i = 1 TO rows
    FOR j = 1 TO columns
        IF MID$(a(i), j, 1) = "*" THEN matches(j) = matches(j) + 1
    NEXT
NEXT

FOR i = 1 TO columns '
    PRINT _TRIM$(STR$(matches(i)));
NEXT

It counts the matches for each column and then shows the total at the bottom.   All you'd have to do at this point is just search for the highest value and then display those that have that number of matches.
Reply
#3
Code: (Select All)
CONST columns = 20, rows = 10

DIM a(1 TO rows) AS STRING
FOR i = 1 TO rows
FOR j = 1 TO columns
IF INT(RND * 2) THEN a(i) = a(i) + "*" ELSE a(i) = a(i) + "-"
NEXT
PRINT a(i)
NEXT

DIM matches(columns)

FOR i = 1 TO rows
FOR j = 1 TO columns
IF MID$(a(i), j, 1) = "*" THEN matches(j) = matches(j) + 1
NEXT
NEXT

FOR i = 1 TO columns '
PRINT _TRIM$(STR$(matches(i)));
NEXT

PRINT
PRINT "SORTED BY MOST MATCHES:"
FOR count = columns TO 1 STEP -1
print_label = 0
FOR j = 1 TO columns
IF matches(j) = count THEN
IF print_label = 0 THEN PRINT count; " MATCHES:";: print_label = -1
PRINT j;
END IF
NEXT
IF print_label THEN PRINT
NEXT

As so. (I decided to go in and finish the little routine here for you.) Wink
Reply
#4
In the Rnd function in the Wiki, the link to the "Marsaglia polar method" is incorrect. Correct: https://en.wikipedia.org/wiki/Marsaglia_polar_method
Reply
#5
(11-12-2023, 06:02 PM)Kernelpanic Wrote: In the Rnd function in the Wiki, the link to the "Marsaglia polar method" is incorrect. Correct: https://en.wikipedia.org/wiki/Marsaglia_polar_method

You should post that in the wiki area, so a nice wiki editor will be certain to see it.  Smile
Reply
#6
(11-12-2023, 06:02 PM)Kernelpanic Wrote: In the Rnd function in the Wiki, the link to the "Marsaglia polar method" is incorrect. Correct: https://en.wikipedia.org/wiki/Marsaglia_polar_method

fixed
Reply
#7
(11-12-2023, 06:05 PM)SMcNeill Wrote:
(11-12-2023, 06:02 PM)Kernelpanic Wrote: In the Rnd function in the Wiki, the link to the "Marsaglia polar method" is incorrect. Correct: https://en.wikipedia.org/wiki/Marsaglia_polar_method
You should post that in the wiki area, so a nice wiki editor will be certain to see it.  Smile
I looked there first, but there was nothing there. Ok, I just checked QB64 Wiki only.  Blush
Reply
#8
(11-12-2023, 06:33 AM)SMcNeill Wrote: Would this little method basically be what you're looking for?
Code: (Select All)
CONST columns = 20, rows = 10
DIM a(1 TO rows) AS STRING
FOR i = 1 TO rows
    FOR j = 1 TO columns
        IF INT(RND * 2) THEN a(i) = a(i) + "*" ELSE a(i) = a(i) + "-"
    NEXT
    PRINT a(i)
NEXT
DIM matches(columns)
FOR i = 1 TO rows
    FOR j = 1 TO columns
        IF MID$(a(i), j, 1) = "*" THEN matches(j) = matches(j) + 1
    NEXT
NEXT
FOR i = 1 TO columns '
    PRINT _TRIM$(STR$(matches(i)));
NEXT
It counts the matches for each column and then shows the total at the bottom.   All you'd have to do at this point is just search for the highest value and then display those that have that number of match


Thanks for the reply.

This is very similar to my initial attempt but I need something more exhaustive. It seems logical that the columns with the most
occurrences would also produce the most horizontal matches but I am not sure that's the case.  It's the horizontal matches that
interest me the most.  The file I linked is a small sample and while the columns will always equal 112 the number of lines can
be several thousand.  I thought about using a brute force method but the number of permutations put the breaks on that idea.
Something like 10^23.  I will say that you code makes mine look like crap so thanks, I will use it as my base and try to build on
it.

Thanks again
R1       
 
Reply
#9
(11-12-2023, 06:43 AM)SMcNeill Wrote:
Code: (Select All)
CONST columns = 20, rows = 10

DIM a(1 TO rows) AS STRING
FOR i = 1 TO rows
    FOR j = 1 TO columns
        IF INT(RND * 2) THEN a(i) = a(i) + "*" ELSE a(i) = a(i) + "-"
    NEXT
    PRINT a(i)
NEXT

DIM matches(columns)

FOR i = 1 TO rows
    FOR j = 1 TO columns
        IF MID$(a(i), j, 1) = "*" THEN matches(j) = matches(j) + 1
    NEXT
NEXT

FOR i = 1 TO columns '
    PRINT _TRIM$(STR$(matches(i)));
NEXT

PRINT
PRINT "SORTED BY MOST MATCHES:"
FOR count = columns TO 1 STEP -1
    print_label = 0
    FOR j = 1 TO columns
        IF matches(j) = count THEN
            IF print_label = 0 THEN PRINT count; " MATCHES:";: print_label = -1
            PRINT j;
        END IF
    NEXT
    IF print_label THEN PRINT
NEXT

As so.  (I decided to go in and finish the little routine here for you.)  Wink

Thanks, this fits the bill, missed it in my earlier post.
Reply
#10
OK, I am basically back to my starting point.  I have placed the top >= 60% columns into an array.
Now I need to find 20 or so of these columns that show together most often.  What I am left with is
to format a string using 20 or so randomly selected values from the array then use brute force 
comparing string to string.  Maybe my grandkids will still be here to see the results. 
 
Cry

My brain is laying down on the job, yesterday I could not remember which to use, roll, role, row.
Getting old is a Bitch.  Still accepting ideas from anyone with a working mind.

R1
Reply




Users browsing this thread: 7 Guest(s)