05-31-2025, 10:58 AM
I have interests in some files stored on the image repository for Wikipedia (and related projects), Wikimedia commons. So, saving a category listing HTML file, I wrote a program to extract the URLs of the files listed on the page. The first program, scan_for_wikimedia_links.bas, (copy attached to this message) collects all links on a page:
The second program, scan_for_wikimedia_svg_links.bas, (copy attached to this message) is a slight improvement, in that it only collects links to .svg image files:
Don't know if these programs are useful to others, but they are an effort to show Basic (and Quickbasic/QB64) is still a relevant language for writing programs to get actual work done.
Code: (Select All)
'
' This program scans a saved Wikimedia gallery listing (HTML file)
' for image links
' Paul Robinson 2025-05-31
$Console:Only
Option _Explicit
Dim As String TheRecord ' Line being read
Dim As String T
Dim As String FName
Dim As String HEXSTR
Dim As Long L ' Number of lines read
Dim As Long Fi ' Number of files found
Dim As Long InBody ' Has <body been seen
Dim As Long E ' End of name
Dim As Long S ' First item search
Dim As Long S2 ' Second item
Dim As Long I
Const Quote = Chr$(34) ' " character
Const ItemFlag = "<div class=" + Quote + "thumb" + Quote ' Look for: <div class="thumb"
Const NameFlag = "<a href=" + Quote + "https://"
Const FALSE = 0
Const TRUE = Not FALSE
ChDir "L:\Assets\Road Signs"
Open "Category_SVG road signs in the United States - Wikimedia Commons.htm" For Input As #1
Open "roadsigns.lst" For Output As #2
InBody = FALSE 'Not found yet
While Not EOF(1)
Line Input #1, TheRecord
L = L + 1
If Not InBody Then
I = InStr(LCase$(TheRecord), "<body")
If I Then ' Not found yet
TheRecord = Mid$(TheRecord, I + 10, Len(TheRecord))
Print "<body found at position"; I; "on line "; L
InBody = TRUE
Else
_Continue
End If
End If
S = InStr(TheRecord, ItemFlag)
While S
S2 = InStr(S, LCase$(TheRecord), NameFlag)
If S2 Then '
S2 = S2 + Len(NameFlag) ' Start of URL
E = InStr(S2, TheRecord, Quote)
Print #2, Mid$(TheRecord, S2, E - S2)
Fi = Fi + 1
End If
S = InStr(E, TheRecord, ItemFlag)
Wend
Wend
Print
Print L; "Lines"; Fi; "files."
Close
End
The second program, scan_for_wikimedia_svg_links.bas, (copy attached to this message) is a slight improvement, in that it only collects links to .svg image files:
Code: (Select All)
'
' This program scans a saved Wikimedia categoy listing (HTML file)
' for .SVG image links (value in FileType)
' Paul Robinson 2025-05-31
$Console:Only
Option _Explicit
Dim As String TheRecord ' Line being read
Dim As String Target
Dim As String Check
Dim As Long L ' Number of lines read
Dim As Long Fi ' Number of files found
Dim As Long InBody ' Has <body been seen
Dim As Long E ' End of name
Dim As Long S ' First item search
Dim As Long S2 ' Second item
Dim As Long I
Const Quote = Chr$(34) ' " character
Const ItemFlag = "<div class=" + Quote + "thumb" + Quote ' Look for: <div class="thumb"
Const NameFlag = "<a href=" + Quote + "https://"
Const FileType = ".svg" ' Type of image file to retrieve
Const FALSE = 0
Const TRUE = Not FALSE
ChDir "L:\Assets\Road Signs"
Open "Category_SVG road signs in the United States - Wikimedia Commons.htm" For Input As #1
Open "roadsigns.lst" For Output As #2
InBody = FALSE 'Not found yet
While Not EOF(1)
Line Input #1, TheRecord
L = L + 1
If Not InBody Then
I = InStr(LCase$(TheRecord), "<body")
If I Then ' Not found yet
TheRecord = Mid$(TheRecord, I + 10, Len(TheRecord))
Print "<body found at position"; I; "on line "; L
InBody = TRUE
Else
_Continue
End If
End If
S = InStr(TheRecord, ItemFlag)
While S
S2 = InStr(S, LCase$(TheRecord), NameFlag)
If S2 Then '
S2 = S2 + Len(NameFlag) ' Start of URL
E = InStr(S2, TheRecord, Quote)
Target = Mid$(TheRecord, S2, E - S2)
Check = Right$(LCase$(Target), Len(FileType))
If Check = FileType Then
Fi = Fi + 1
Print #2, Target
End If
End If
S = InStr(E, TheRecord, ItemFlag)
Wend
Wend
Print
Print L; "Lines"; Fi; "files."
Close
End
Don't know if these programs are useful to others, but they are an effort to show Basic (and Quickbasic/QB64) is still a relevant language for writing programs to get actual work done.
While 1
Fix Bugs
report all bugs fixed
receive bug report
end while
Fix Bugs
report all bugs fixed
receive bug report
end while