A program to solve a problem: filter Linux/Unix 'ls' output
#1
This is my first program contribution. It worked so well that I thought I'd pass it on.



Here is the backstory about why I wrote this program.



I had a problem. My Windows computer started BSODing during initial boot/startup. Since it was an actual BSOD with the frowny face, I took it to be a Windows problem, not a hardware one. After several attempts with the recovery methods that would leave my files intact, I came to the conclusion that I was in a pickle. Then I found that the recovery partition no longer worked. Now I came to the conclusion that I was, quite frankly, screwed. So I used the time-tested method of attacking a problem: I threw money at it.



I went on Amazon and purchased a refurbished machine. The new (to me) machine is actually better than the one I had: a "Dell OptiPlex 7020 Desktop Computer,Intel Quad Core i7 4790 3.6Ghz, 32GB Ram New 2TB SSD." After I received it, I discovered that while it has 4 cores (as I had expected), it has 8 threads (which I had not). So the computer is actually better than what I thought I was buying. Along with it I purchased something I really needed: a 4 TB ruggedized external hard drive, so I can back up my computers without worrying about someone dropping it. I already have a 6 TB external, but I don't feel good about it being moved around.



The price was terrific: with Windows 10 Professional, it was $265.00. Add the external drive and sales tax, $401. So I set up the new computer and had it build a Windows recovery drive on an SD card. I plugged the reader into the old computer, reset the BIOS to allow booting from an external drive, and tried again. Nothing worked. So, from my new computer, I downloaded a Linux distribution, Xubuntu. I repeated the process and it booted fine: the file manager could see the internal drive, and it could even see the 6 TB external, but not the 4 TB ruggedized one (even though the new machine does). So I copied my most recent files from my working directory to the 6 TB.




The Problem


I do have a backup of my huge collection of downloaded open-source software on the 6 TB, but it's old, from last year. It does not have the local changes I made while writing programs. On Windows, I just have FreeFileSync scan the work directory and mirror it to the backup, so that is out. I had, however, downloaded the Linux version of QB64PE, but the attempt to install it failed because the Wi-Fi adapter apparently is not recognized, so it can't download the required packages.

Well, I could just copy the new archive to replace the backup. At about 1.3 million files, 100,000+ directories, and 411 GB, that would take about 500 hours, so that's not an answer. So how can I solve this problem? Then it hit me: run an 'ls' directory scan with recursive subdirectory search, with the output sent to a file, then take that file over to my new computer and write a filtering program to run there. I had ls exclude owner and group, and list one file per line. Output from ls looks like this:



Output:


Code: (Select All)
Paul (From LENOVO)/:
total 39638
drwxrwxrwx 1  163840 Apr 30 17:45 gatekeeper
drwxrwxrwx 1  20480 Feb 21 17:06 MERGER-raw
drwxrwxrwx 1    4096 Feb 21 16:49 cvs2svn
...
-rwxrwxrwx 1  631462 Nov 23  2017 .cardpeek.log
-rwxrwxrwx 1  52475 Aug 27  2017 reasonable-argument.png

Paul (From LENOVO)/gatekeeper:
total 604715
-rwxrwxrwx 1    527259 Apr 30 17:45 Marnie.odt
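
The exact ls command is not given in the post. As a rough sketch, assuming GNU ls, a listing of this shape (no owner, no group, recursive, dotfiles included) can be produced as below. The scratch directory and file names are hypothetical stand-ins for the real drive.

```shell
# Scratch tree standing in for the real drive (hypothetical names).
work=$(mktemp -d)
mkdir -p "$work/src/gatekeeper"
echo "draft" > "$work/src/gatekeeper/Marnie.odt"
echo "log"   > "$work/src/.cardpeek.log"

# GNU ls flags: -g omits the owner, -o omits the group,
# -R recurses into subdirectories, -A includes dotfiles.
# LC_ALL=C keeps the date columns in the plain English layout.
(cd "$work" && LC_ALL=C ls -goRA src/ > "$work/files.list")

cat "$work/files.list"
```

The dates in the scratch listing will differ, but the layout (a "path:" header, a "total" line, then one entry per line) should match the real output above.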

What can be determined from this is:



  • The current directory is shown followed by a colon.
  • The first letter of a file entry is 'd' for a directory. Ignore these; we get specific directories from the prior item.
  • Size summary starts with "total ".
  • There is a blank line before a new directory.
  • Recent items have a colon in the time field (ls prints a time of day for files modified within roughly the last six months, which here means mostly 2023); older files have a year in that field.
  • Fields are separated by spaces (padded for alignment), with the file name last.
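
As a rough illustration of these rules (a sketch in awk, not the program actually used), the filter below keeps only entries whose sixth field looks like HH:MM and prefixes them with the current directory. The sample lines are copied from the listing above; the scratch directory and the assumption that the time sits in field 6 are mine.

```shell
work=$(mktemp -d)

# Small sample in the same shape as the real listing.
cat > "$work/files.list" <<'EOF'
Paul (From LENOVO)/:
total 39638
drwxrwxrwx 1  163840 Apr 30 17:45 gatekeeper
-rwxrwxrwx 1  631462 Nov 23  2017 .cardpeek.log

Paul (From LENOVO)/gatekeeper:
total 604715
-rwxrwxrwx 1    527259 Apr 30 17:45 Marnie.odt
EOF

awk '
  /^$/ || /^total / { next }                  # blank lines and size summaries
  /:$/ { dir = substr($0, 1, length($0) - 1)  # a "path:" header starts a new directory
         if (dir !~ /\/$/) dir = dir "/"
         next }
  /^d/ { next }                               # subdirectory entries
  $6 ~ /^[0-9]+:[0-9]+$/ {                    # field 6 is HH:MM only for recent files
         name = $0
         # strip the six leading fields, keeping spaces inside the file name
         sub(/^[^ ]+ +[^ ]+ +[^ ]+ +[^ ]+ +[^ ]+ +[^ ]+ /, "", name)
         print dir name }
' "$work/files.list" > "$work/keepfiles.list"

cat "$work/keepfiles.list"
```

Like the colon test it mirrors, this sketch is fooled by a directory header or file name that happens to end in a colon.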

I have one additional problem. Just the listing of files itself is an 88 megabyte text file!


The solution:


Code: (Select All)
' Process ls program output to exclude files before this year


FN$ = "k:\files.list"
outFile$ = "k:\keepfiles.list"

Print
Locate 5, 1
Print Time$
FF& = FreeFile
lc = 0
Total$ = "total "
Open FN$ For Input Access Read As #FF&
OutFile& = FreeFile
Open outFile$ For Output As #OutFile&

While Not EOF(FF&)
    Line Input #FF&, Line$
    Line$ = _Trim$(Line$)
    LineEnd = Len(Line$)

    If Line$ = "" Then GoTo SKIP
    If Left$(Line$, Len(Total$)) = Total$ Then GoTo SKIP ' avoid size summary

    Colon = InStr(Line$, ":")
    If Colon = Len(Line$) Then ' it's a directory being listed


        Curdir$ = _Trim$(Left$(Line$, Colon - 1))
        If Right$(Curdir$, 1) <> "/" Then Curdir$ = Curdir$ + "/"
        Locate 9, 1: Print Space$(240): Locate 9, 1
        Print "Current dir="; Curdir$

        ListDir = ListDir + 1
        GoTo SKIP
    End If

    If Left$(Line$, 1) = "d" Then
        DirCount = DirCount + 1

        GoTo SKIP
    End If


    FileCount = FileCount + 1

    '    First, skip attributes
    SpacePos = InStr(1, Line$, " ")

    'Skip over link count
    SpacePos = InStr(SpacePos + 1, Line$, " ")

    'Skip over file size
    SpacePos = InStr(SpacePos + 1, Line$, " ")

    ' determine if current year
    Colon = InStr(SpacePos, Line$, ":")
    If Colon = 0 Then

        skipFile = skipFile + 1
        '    Print "colon at "; Colon; " skipping "
        '    Print Line$

        GoTo SKIP
    End If
    SpacePos = InStr(Colon + 1, Line$, " ")



    Print #OutFile&, Curdir$ + Mid$(Line$, SpacePos + 1)
    Chosen = Chosen + 1

    SKIP: '
    If FileCount Mod 5000 = 0 Then
        Locate 2, 1
        Print FileCount; "      ";
        Locate 5, 20
        Print Time$
    End If



Wend

Close #FF&
Close #OutFile&
Locate 6, 1
Print Time$
Print "Search directories "; ListDir
Print "Subdirectories "; DirCount
Print FileCount; " Files Found"
Print skipFile; " Files skipped"
Print Chosen; " Files chosen for review"



End



Result? Of more than 900,000 files scanned, I need to copy 53. That's all. The program took 5 minutes, processing an average of about 2500 items per second. A really satisfying conclusion, and it should put paid to the claim that Basic, and specifically QuickBasic, is not relevant for solving real-world problems.
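
For the copy step itself, which is not described here, one possible sketch run from the Linux side reads the list line by line and recreates each directory chain on the backup drive. The source and destination paths below are hypothetical scratch locations, and the `tr` step assumes the list was written on Windows with CRLF line endings.

```shell
work=$(mktemp -d)
src="$work/old-drive"      # hypothetical mount point of the old internal drive
dest="$work/backup"        # hypothetical mount point of the backup drive
mkdir -p "$src/Paul (From LENOVO)/gatekeeper" "$dest"
echo "draft" > "$src/Paul (From LENOVO)/gatekeeper/Marnie.odt"
printf '%s\r\n' "Paul (From LENOVO)/gatekeeper/Marnie.odt" > "$work/keepfiles.list"

# Strip carriage returns, then read one relative path per line.
# IFS= and -r keep spaces and backslashes in the names intact.
tr -d '\r' < "$work/keepfiles.list" | while IFS= read -r rel; do
    [ -n "$rel" ] || continue
    mkdir -p "$dest/$(dirname "$rel")"   # recreate the directory chain
    cp -p "$src/$rel" "$dest/$rel"       # -p preserves timestamps and modes
done

find "$dest" -type f
```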



Paul
While 1
   Fix Bugs
   report all bugs fixed
   receive bug report
end while
#2
Hi @TDarcos

Clever program.

Did you know you can use the find command in Linux and either pass the results to xargs or use -exec to do something with them?

Find all files modified within the last year (run from the root directory of where you are scanning):

Code: (Select All)
find . -mtime -365 -print
You could then simply do: 
Code: (Select All)
find . -mtime -365 -exec cp {} destination/dir/{} \;
For every file created or modified within the last 365 days, this executes the command:
Code: (Select All)
cp (foundfile) destination/dir/(foundfile)
The funky syntax for the -exec arg simply replaces {} with the found file, and the \; lets you terminate the -exec command. The backslash is to escape the ; in the shell. Note the sign: -365 means less than 365 days ago, while +365 would match only files older than that. The matching subdirectories must already exist under destination/dir for the cp to succeed.
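
A small self-contained sketch of the -exec substitution, using a hypothetical scratch tree with one fresh file and one backdated file. It assumes GNU cp, whose --parents option recreates the source path under the destination so the subdirectories do not need to exist beforehand. With -mtime, -365 matches files modified less than 365 days ago and +365 matches older ones.

```shell
work=$(mktemp -d)
mkdir -p "$work/src/sub" "$work/dest"
echo new > "$work/src/sub/recent.txt"
echo old > "$work/src/sub/old.txt"
touch -t 201711231200 "$work/src/sub/old.txt"   # backdate to Nov 2017

# -type f skips directories; {} is replaced by each found path (./sub/recent.txt).
(cd "$work/src" && find . -type f -mtime -365 -exec cp --parents {} "$work/dest/" \;)

find "$work/dest" -type f
```

Only recent.txt should end up under dest/sub; the backdated file is left behind.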

Your program is cool too, not saying to not use it. Also it's lots of fun to solve our own problems!

I just wanted to share this one-liner since I've used it so much in my life. The Linux find command is a life saver.

Check out the docs:
Code: (Select All)
man find
grymmjack (gj!)
GitHub | YouTube | Soundcloud | 16colo.rs