Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Seeking Matches in a Data Base
#1
I have a data base of integers which run between 1 and 100. These integers are organized in a Data Base of over 4500 lines with each line of data holding 10 different integers. What I have been trying to do is to load a line of data and compare that line with the following 7 lines of data to find where the integers in the primary line may match with any integer is the subsequent 7 lines. I'm struggling with code to do this, mainly with the concept or outline of a way to do this. For example, I was thinking I may have to copy the Data Base and use one data base to draw the primary line and the other data to draw 7 other lines for comparison. I would then march thru the primary data base one line at a time however the secondary data I would need to use Seek command to reposition the start of the data lines for comparison.

Another concept of doing this was to simply just use Seek command in a Do Loop to march thru the data base. Here is a rough outline of the code for this idea. Assume all variable and arrays have been properly dimensioned.

Code: (Select All)
Open "DataBase" For Input As #1
Do While Not EOF(1)
    For A = 1 To 10: Input #1, aDat(A): Next
    DatLineCount = DatLineCount + 1
    For NextDatLine = 1 To 7
        Select Case NextDatLine
            Case 1
                For C = 1 To 10: Input #1, cDat(C): Next
                For x = 1 To 10
                    For y = 1 To 10
                        If aDat(x) = cDat(y) Then MatchcDat = MatchcDat + 1
                Next y, x
                Exit Select
            Case 2
                For D = 1 To 10: Input #1, dDat(D): Next
                For x = 1 To 10
                    For y = 1 To 10
                        If aDat(x) = dDat(y) Then MatchdDat = MatchdDat + 1
                Next y, x
                Exit Select
            Case 3
                For E = 1 To 10: Input #1, eDat(E): Next
                For x = 1 To 10
                    For y = 1 To 10
                        If aDat(x) = eDat(y) Then MatcheDat = MatcheDat + 1
                Next y, x
                Exit Select
            Case 4
                For F = 1 To 10: Input #1, fDat(F): Next
                For x = 1 To 10
                    For y = 1 To 10
                        If aDat(x) = fDat(y) Then MatchfDat = MatchfDat + 1
                Next y, x
                Exit Select
            Case 5
                For G = 1 To 10: Input #1, gDat(G): Next
                For x = 1 To 10
                    For y = 1 To 10
                        If gDat(x) = gDat(y) Then MatchgDat = MatchgDat + 1
                Next y, x
                Exit Select
            Case 6
                For H = 1 To 10: Input #1, hDat(H): Next
                For x = 1 To 10
                    For y = 1 To 10
                        If aDat(x) = hDat(y) Then MatchhDat = MatchhDat + 1
                Next y, x
                Exit Select
            Case 3
                For i = 1 To 10: Input #1, iDat(i): Next
                For x = 1 To 10
                    For y = 1 To 10
                        If aDat(x) = iDat(y) Then MatchiDat = MatchiDat + 1
                Next y, x
                Exit Select
        End Select
    Next NextDatLine
    Seek #1, DatLineCount
Loop
 
Is there another approach that I should consider? Might there be a way to match items in a data base just using Logical Operator?? I'm going to assume any matching algorythm will be time consuming so accuracy would be more important than speed.
Reply
#2
Read the whole file once inside a two dimensional array.
Code: (Select All)
FOR i = 1 to 4500 'number of lines in the file... replace with a DO LOOP if necessary
   FOR j = 1 to 10
     INPUT #1, data_array(i,j)
NEXT j, i

That's now got all your data read and into your array.  You don't have to worry with the slowness of the file reading any more.  Now you just process it.

Code: (Select All)
FOR i = 1 to 4500 - 7 'going to compare the next 7 lines of data
   FOR j = 1 to 7 'the 7 lines we're going to compare
      FOR k = 1 to 10 'the 10 data elements that we need to compare in the first line of data
          FOR l = 1 to 10 'the 10 data elements that we need to compare
              IF data_array(i,k) = data_array(i+j,l) THEN   'we have a match
          NEXT
    NEXT
  NEXT
NEXT
Reply
#3
The magic of nesting, thanks Steve.
Reply
#4
An even easier way to do this...

These numbers run from 1 to 100.
Make you an array to hold the values and LINES which they appear in.   

Then read each line and if a number appears, add it to it's spot in that array:
Code: (Select All)
DIM data_array(1 to 100, 1 to total_lines)
FOR i = 1 to 4500 'as you said 4500 entries
   FOR j = 1 to 10
      INPUT #1, the_data
      data_array(the_data, i) = _TRUE
NEXT i, j

Now you know which lines the data falls on.  Just do a quick check to see if there's any extra _TRUE events above that line by however many spots you want.
Code: (Select All)
FOR i = 1 to 4500 'for the 4500 lines
    FOR k = 1 to 100 'to check the 100 numbers    
        IF data_array(i, k) = _FALSE THEN _CONTINUE 'no need to check the next 7 values as this number doesn't exist on this line
        FOR j = i +1 to i + 7 'to check the next 7 lines
            IF j > 4500 THEN EXIT FOR 'you're above the data limit.  Skip checking.
            IF data_array(j, k) = _TRUE THEN 'it's a match, do whatever you want with this match.   The value is k, it's on both lines i and j
        NEXT
    NEXT
NEXT

Less nesting in there, so probably more efficient overall.

Also seems a little simpler logic to me, so easier to maintain and debug in the future, if necessary.  See if it doesn't make more sense to you as well -- I tried to comment everything there to showlight my thinking, but without any actual data and whatnot, it's just pseudocode and not tested.  It *looks* good to me, but there's always the chance I've goofed something simple up here.  

I always reserve the right to fail.  Big Grin
Reply
#5
I'm still working on implementing your latest suggestion using _TRUE and _FALSE. I have never factored these constants in my past algorythms. Unfortunately they are giving me some angst. Seems I'm getting an Out of Range error for assigning a value of _TRUE to the data_array array. I think it is because we have dimensioned it with positive integers whereas _TRUE would be -1. So, it would appear I should be dimensioning data_array as (-1 to 100, 1 to total_lines) but this would bring in an index of zero which I suppose would then be _FALSE or would it simply be a "0" index. I will often use a "0" index to store other info like the sum total of 10 integers on one line of the data. Is there a way to keep the _TRUE constant without triggering the Out of Range error??
Reply
#6
So I was all out to lunch on that suspected cause of _TRUE being a -1 which was not within the range of (1 to 100). It took me awhile to realize the problem was in the name of the array ...data_array(?,?). For some reason that array would not accept _TRUE. As soon as I changed the name of the array to dataarray, removing the underscore, things when well ... MILLER TIME  again...
Reply
#7
Can you share an example of the glitch?  That sounds like something we should investigate on the dev side of things.
Reply
#8
Sorry Steve, I haven't been able to re-create the error solution by removing the underscore in the Array name. Here is the code that I have been playing with, well almost the code, the only difference is the Read Data approach as my actual code is opening a the file with the approx 4500 lines of data. 

Originally when I wrote my hallelujah comment above the first instance of dataarray() was accepted but had an out of range error on the total_lines, which I hadn't changed to "totallines" so it appeared the run was no longer erroring out on dataarray(). 

Anyway, here's the code snippet where I'm getting an Out of Range error on data_array()

Code: (Select All)
Dim Shared data_array(1 To 100, 1 To total_lines)
'Dim Shared dataarray(1 To 100, 1 To totallines)
Dim Shared Repeat(1 To 100, 1 To total_lines)
Dim Shared LinesTotal
Dim Shared Num

Do While Num <> 101
    Read Num
    numcount = numcount + 1
    If numcount = 10 Then
        LinesTotal = LinesTotal + 1
        numcount = 0
    End If
Loop
total_lines = LinesTotal

Restore

For i = 1 To total_lines
    For j = 1 To 10
        If Num(j) = 101 Then Exit For
        Read Num(j)
        data_array(Num(j), i) = _TRUE
        'dataarray(Num(j), i) = _TRUE
    Next j
Next i

Restore

For i = 1 To total_lines
    For k = 1 To 100 'to check the 100 numbers
        If dataarray(i, k) = _FALSE Then _Continue 'no need to check the next 7 values as this number doesn't exist on this line
        For j = i + 1 To i + 7 'to check the next 7 lines
            If j > total_lines Then Exit For 'you're above the data limit.  Skip checking.
            If data_array(j, k) = _TRUE Then 'it's a match, do whatever you want with this match.   The value is k, it's on both lines i and j

                Repeat(i, k) = Repeat(i, k) + 1
            End If
        Next j
    Next k
Next i



For TL = 1 To total_lines
    Print Repeat(TL, 14);
Next TL

Data 1,15,31,25,82,12,63,64,50,47
Data 66,22,18,93,84,88,98,10,6,27
Data 19,45,46,11,12,57,71,93,100,3
Data 24,33,96,56,17,5,87,4,5,99
Data 94,17,77,89,20,76,90,8,40,49
Data 6,43,53,14,11,80,92,100,9,3
Data 84,77,73,57,28,8,15,65,69,2
Data 70,29,21,100,71,36,44,16,97,69
Data 19,83,88,23,97,80,28,5,83,84
Data 100,4,39,35,86,12,13,68,64,74
Data 101
Reply
#9
Look at your remarked out DIM statement.  totallines =/= total_lines
Reply
#10
I agree..totallines is not the same variable as total_lines.

The commented out line of 
Code: (Select All)
'Dim Shared dataarray(1 To 100, 1 To totallines)
intended to remove the underscore from both the array "dataarray()" and the variable "totallines". The intent was to use the underscored array and variable so that the routine was as close to the original code provided which was triggering an Out of Range error.

The "For i = 1 to total_lines should have worked??? because it was the correct variable to use in the assignment of ...data_array(Num(j),i) = _TRUE... and "Num(j) should have fallen with the range of 1 to 100, while i should have fallen within the range of 1 to total_lines. Why that generated an Out of Range error, I don't know.
What I'm shooting for is a comparison and counting. So as an example using the DATA lines, I'm taking the 1st Data line of 1,15,31,25,82.12,63,64,50,47 and compare all those numbers with the next 7 data line (ie Data lines 2 to 8), and if there is a match then I want to count how many times that number appeared in the following 7 lines of data. The number 1 does not appear in any of the following 7 lines so Repeat(1,1 to 7) will all carry a zero value, but if you take the number 12 it does repeat in the 2nd comparison line (data line 3) 
Repeat (12, 2)= Repeat (12,2) +1

So once all of Data line 1 has been compared to the next 7 lines of data the routine needs to use Data Line 2 which is 66,22,18,93,84,88,98,10,6,27, and perform the same comparison and count with the next 7 lines (ie Data Lines 3 to 9). The routine should march one Data line at a time comparing and counting the Repeats in the following 7 lines until the 7th line is the end of the data base.

I haven't yet worked with _TRUE and _FALSE and so not sure why code ...data_array(Num(j),i) = _TRUE is Out of Range, where code line  ...data_array(Num(j),i) = data_array(Num(j),i) +1
May not trigger the same Error ... now I said that and I haven't actually tried it, but hopefully you know what I mean.
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Must an extra value be provided on DATA statement? dakra137 11 1,062 09-30-2025, 05:38 PM
Last Post: ahenry3068
  Why is 'base' treated like a reserved word in the IDE? Pete 4 807 04-11-2025, 09:49 PM
Last Post: SMcNeill
  Change file data krovit 5 1,213 07-11-2024, 07:45 AM
Last Post: krovit
  Arrays In User Data Types Consolemu 2 878 01-18-2024, 09:49 PM
Last Post: bplus
  Write data to EXE file Steffan-68 8 2,025 05-15-2023, 06:41 PM
Last Post: Steffan-68

Forum Jump:


Users browsing this thread: 1 Guest(s)