04-13-2025, 09:49 PM
Here is a program that downloads a web page and searches it for a string, but sometimes it doesn't work. Check it out:
I tried different things to figure out what I did wrong, because this was the result:
Number of lines 7
found at pos 9182 on line 7
All size 82640 found at 36715
K size 10
K$='Next page '
Extract follows -200 to +200
Len= 46126
/div></fieldset></form></div><div class="mw-prefixindex-nav"><a href="/w/index.php?title=Special:PrefixIndex&from=BSicon_geBHF-L.svg&prefix=BSicon&namespace=6" title="Special:PrefixIndex">Next page (BSicon geBHF-L.svg)</a></div><div class="mw-prefixindex-body"><ul class="mw-prefixindex-list"><li><a href="/wiki/File:BSicon_gPSLr.svg" title="File:BSicon gPSLr.svg">BSicon gPSLr.svg</a></li>
<li><a href="/wiki/File:BSicon_gS%2BBHF.svg" title="File:BSicon gS+BHF.svg">BSicon gS+BHF.svg</a></li>
<li><a href="/wiki/File:BSicon_gSBHF.svg" title="File:BSicon gSBHF.svg">BSicon gSBHF.svg</a></li>
<li><a href="/wiki/File:BSicon_gSHI1%2Br.svg" title="File:BSicon gSHI1+r.svg">BSicon gSHI1+r.svg</a></li>
<li><a href="/wiki/File:BSicon_gSHI1l.svg" title="File:BSicon gSHI1l.svg">BSicon gSHI1l.svg</a></li>
<li><a href="/wiki/File:BSicon_gSHI2%2Bl.svg" title="File:BSicon gSHI2+l.svg">BSicon gSHI2+l.svg</a></li>
<li><a href="/wiki/File:BSicon_gSHI2%2Br.svg" title="File:BSicon gSHI2+r.svg">BSicon gSHI2+r.svg</a></li>
<li><a href="/wiki/File:BSicon_gSHI2c1.svg" title="File:BSicon gSHI2c1.svg">BSicon gSHI2c1.svg</a></li>
<li><a href="/wiki/File:BSicon_gSHI2c2.svg" title="File:BSicon gSHI2c2.svg">BSicon gSHI2c2.svg</a></li>
<li><a href="/wiki/File:BSicon_gSHI2c3.svg" title="File:BSicon gSHI2c3.svg">BSicon gSHI2c3.svg</a></li>
<li><a href="/wiki/File:BSicon_gSHI2c4.svg" title="File:BSicon gSHI2c4.svg">BSicon gSHI2c4.svg</a></li>
<li><a href="/wiki/File:BSicon_gSHI2l.svg" title="File:BSicon gSHI2l.svg">BSicon gSHI2l.svg</a></li>
<li><a href="/wiki/File:BSicon_gSHI2r.svg" title="File:BSicon gSHI2r.svg">BSicon gSHI2r.svg</a></li>
I'll spare you the next 30,000 chars, but you get the idea. It correctly copies the 200 bytes preceding the match, but fails to stop after 400 bytes. When I check the start of the output in Notepad++, the search string beginning "Next page" is correctly at byte 201:
/div></fieldset></form></div><div class="mw-prefixindex-nav"><a href="/w/index.php?title=Special:PrefixIndex&from=BSicon_geBHF-L.svg&prefix=BSicon&namespace=6" title="Special:PrefixIndex">Next page (BSicon
So any assistance you can offer me is appreciated.
- - - -
Paul
Code:
$Unstable:Http
$Console:Only
Dim Url As String
ReDim Shared As String Res(0), All
Dim h As Long, s As String, F As Long
Const Site = "//commons.wikimedia.org"
Url = "/w/index.php?title=Special:PrefixIndex&from=BSicon_gPSLr.svg&prefix=BSicon&namespace=6"
Dim R As String
Dim As Long Target, Count, Location
10 '
GoSub DownLoad 'Site + Url, sc&, Count&
If SC <> 0 Then 99
Print "Number of lines"; Count
Print "found at pos"; Location; "on line"; Target
Print "All size"; Len(All); " found at";
F = InStr(All, "Next page (BSicon")
Print F
K$ = Mid$(All, F, 10)
Print "K size"; Len(K$)
Print "K$='"; K$; "'"
Print "Extract follows -200 to +200"
Extract$ = Mid$(All, F - 200.40)
Print "Len="; Len(Extract$)
Print
Print Extract$
End
99 '
Print
Print "Status code"; SC; " on: "; Url
Print
Print "Do you want to continue";
Input Ans$
If UCase$(Left$(Ans$, 1)) = "Y" Then GoTo 10
If UCase$(Left$(Ans$, 1)) <> "N" Then GoTo 99
Print
End
DownLoad:
Count = 0
ReDim Res(Count)
All = ""
h = _OpenClient("HTTP:" + Site + Url)
statusCode = _StatusCode(h)
While Not EOF(h)
_Limit 60
Get #h, , s
All = All + s
Count = Count + 1
ReDim _Preserve Res(Count)
F = InStr(s, "Next page (BSicon")
If F <> 0 Then
Target = Count
Location = F
End If
Res(Count) = s
Wend
Close #h
Return
' Content of the HTTP response is returned.
' The statusCode is also assigned.
Function DL$ (url As String, statusCode As Long)
Dim h As Long, content As String, s As String
h = _OpenClient("HTTP:" + url)
statusCode = _StatusCode(h)
While Not EOF(h)
_Limit 60
Get #h, , s
content = content + s
Wend
Close #h
DL$ = content
End Function
Sub xDownLoad (url As String, statusCode As Long, Count As Long)
Dim h As Long, content As String, s As String
h = _OpenClient("HTTP:" + url)
statusCode = _StatusCode(h)
While Not EOF(h)
_Limit 60
Get #h, , s
content = content + s
Wend
Close #h
End Sub
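A likely cause, offered as a sketch rather than a confirmed diagnosis: the line `Extract$ = Mid$(All, F - 200.40)` has a period where a comma was probably intended. As written, `200.40` is a single number, so `Mid$` receives only a start position and no length, and it returns everything from that position to the end of the string. The posted numbers seem to fit this: 36715 - 200.40 rounds to 36515, and 82640 - 36515 + 1 = 46126, which is the `Len=` value shown. The example below uses a made-up stand-in string instead of the downloaded page, and `StartPos` is a name introduced only for illustration.

```basic
' Sketch only: a stand-in string replaces the downloaded page (All).
Dim All As String, Extract As String
Dim F As Long, StartPos As Long

All = String$(300, "x") + "Next page (BSicon" + String$(1000, "y")
F = InStr(All, "Next page (BSicon")

' Two arguments: 200.40 is one number, so Mid$ gets no length and
' returns everything from the (rounded) start position to the end.
Extract = Mid$(All, F - 200.40)
Print "No length given: Len ="; Len(Extract)

' Three arguments: start 200 bytes before the match, copy 400 bytes.
StartPos = F - 200
If StartPos < 1 Then StartPos = 1 ' Mid$ needs a start position of at least 1
Extract = Mid$(All, StartPos, 400)
Print "Length given:    Len ="; Len(Extract)
Print "Match begins at byte"; InStr(Extract, "Next page")
```

With the comma in place, the extract should stop after 400 bytes and the match should still begin at byte 201. Separately, the `DownLoad` routine assigns `statusCode` while the main code tests `SC`, so `SC` is never set and that check always sees 0.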
While 1
Fix Bugs
report all bugs fixed
receive bug report
end while


