

   comp.os.linux.advocacy      Torvalds farts & fans know what he ate      164,974 messages   


   Message 163,941 of 164,974   
   Lawrence D’Olivei to DFS
   Re: Get to know your files and folders!   
   27 Jan 26 21:02:55   
   
   XPost: comp.lang.python   
   From: ldo@nz.invalid   
      
   On Mon, 26 Jan 2026 13:28:24 -0500, DFS wrote:   
      
   > Here's some Python code I wrote to capture file metadata (name,   
   > location, date created, date modified, and size) in a SQLite   
   > database.   
      
   I would consider this a waste of time. There are already standard *nix
   commands (e.g. du(1)) for obtaining this information directly from the
   filesystem, without the extra steps of collecting the info in a database
   and having to keep that up to date.
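A minimal sketch of asking du(1) for a directory total directly from Python, with no database in between. The helper name `du_kib` is hypothetical. The sketch assumes only the POSIX `-s` (one total per operand) and `-k` (1024-byte units) options.

```python
import subprocess


def du_kib(path):
    """Disk usage of path in 1024-byte blocks, as reported by du(1).

    Uses only the POSIX options -s and -k; "--" stops a path that starts
    with "-" from being read as an option. du reports allocated disk
    blocks, not the sum of file lengths, so the figure can differ from
    adding up st_size values.
    """
    result = subprocess.run(
        ["du", "-sk", "--", path], capture_output=True, text=True, check=True
    )
    # Output is "<size><whitespace><path>"; the size is the first field.
    return int(result.stdout.split()[0])
```

Because the answer comes from the filesystem at call time, there is nothing to keep in sync.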
      
   > Tested on Windows and Linux/WSL.   
      
   But not on native Linux? Because WSL forces the Linux kernel to go   
   through the filesystem-handling bottleneck that is the Windows kernel.   
      
   Just some thoughts:   
      
       cSQL =  " CREATE TABLE Files "   
       cSQL += " ( "   
       cSQL += "   FileID       INTEGER NOT NULL PRIMARY KEY, "   
       cSQL += "   FolderID     INTEGER REFERENCES Folders (FolderID), "   
       cSQL += "   Folder       TEXT    NOT NULL, "   
       cSQL += "   FileName     TEXT    NOT NULL, "   
       cSQL += "   FileCreated  NUMBER  NOT NULL, "   
       cSQL += "   FileModified NUMBER  NOT NULL, "   
       cSQL += "   FileSizeKB   NUMBER  NOT NULL "   
       cSQL += " );"   
      
   Did you know Python does implicit string concatenation, like C and   
   C++?   
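For comparison, here is the same statement written with implicit concatenation of adjacent string literals. The column list is copied from the quoted code. The in-memory database and the `PRAGMA table_info` query are only there to show that SQLite accepts the result. A triple-quoted string would work just as well.

```python
import sqlite3

# Adjacent string literals inside the parentheses are joined at compile
# time into a single string; no "+=" is needed at run time.
CREATE_FILES = (
    "CREATE TABLE Files ("
    " FileID       INTEGER NOT NULL PRIMARY KEY,"
    " FolderID     INTEGER REFERENCES Folders (FolderID),"
    " Folder       TEXT    NOT NULL,"
    " FileName     TEXT    NOT NULL,"
    " FileCreated  NUMBER  NOT NULL,"
    " FileModified NUMBER  NOT NULL,"
    " FileSizeKB   NUMBER  NOT NULL"
    ")"
)

con = sqlite3.connect(":memory:")
con.execute(CREATE_FILES)
# Column names as SQLite recorded them (field 1 of each table_info row).
columns = [row[1] for row in con.execute("PRAGMA table_info(Files)")]
print(columns)
```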
      
   Also, I notice you are assuming each file has only one parent folder.
   You do know *nix systems are not restricted like this, right? (Hard
   links can give one file names in several different directories.)
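A minimal sketch of spotting this from Python. `os.lstat()` reports the link count in `st_nlink`, and `(st_dev, st_ino)` identifies the underlying file no matter which name it was reached by. The helper `group_hard_links` is hypothetical, and it only sees links that fall inside the tree being walked.

```python
import os
from collections import defaultdict


def group_hard_links(top):
    """Map (st_dev, st_ino) -> paths under top, for files with st_nlink > 1.

    Links to the same file that live outside top are counted in st_nlink
    but do not appear in the returned path lists.
    """
    groups = defaultdict(list)
    for root, dirs, files in os.walk(top):
        for name in files:
            path = os.path.join(root, name)
            st = os.lstat(path)  # do not follow symlinks
            if st.st_nlink > 1:
                groups[(st.st_dev, st.st_ino)].append(path)
    return dict(groups)
```

In a database, this would suggest a many-to-many table between files and folders rather than a single `FolderID` column per file.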
      
       filesize   = round(os.path.getsize(root + '/' + file)/1000,1)   
       filecreate = os.path.getctime(root + '/' + file)   
       filecreate = str(datetime.datetime.fromtimestamp(filecreate))[0:19]   
       filemod    = os.path.getmtime(root + '/' + file)   
      
   How many different file-info lookups do you need to do on each file?   
   How do you handle symlinks? (Yes, even Windows has those now.)   
      
   The usual way to get this info is with os.lstat(), which returns it
   all with a single OS call.
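A minimal sketch of collecting the quoted fields from one `os.lstat()` call per file, using `os.path.join()` instead of string concatenation. The function name `file_info` and its dict keys are assumptions for illustration. Because `lstat()` does not follow symlinks, a link is reported as itself, and `stat.S_ISLNK()` flags it. Note that on Linux `st_ctime` is the time of the last inode (status) change, not a creation time. On Windows it has been the creation time, but since Python 3.12 that use is deprecated in favour of `st_birthtime`.

```python
import datetime
import os
import stat


def file_info(root, name):
    """Return metadata for root/name using a single lstat() call."""
    path = os.path.join(root, name)
    st = os.lstat(path)  # one system call; symlinks are not followed
    return {
        "path": path,
        "is_symlink": stat.S_ISLNK(st.st_mode),
        # Same 1000-byte "KB" and rounding as the quoted code.
        "size_kb": round(st.st_size / 1000, 1),
        # On Linux this is the last inode change time, not creation time.
        "ctime": datetime.datetime.fromtimestamp(st.st_ctime).isoformat(
            sep=" ", timespec="seconds"
        ),
        "mtime": st.st_mtime,
    }
```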
      
   > The major slowdown is one cartesian/update query - used to summarize   
   > data in all subdirectories - for which I haven't been able to figure   
   > out a decent workaround.   
      
   As I said, your problem is using a DBMS in the first place. You are   
   doing a cross-join of *all* files against *all* folders. But in the   
   real filesystem, it would be unheard of for *all* files to be present   
   in *all* folders -- or indeed, for many files to be present in more   
   than one folder.   
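One way to get per-folder totals that include all subdirectories, without any cross join, is a single bottom-up pass. The pass walks the tree children-first and adds each folder's total into its parent. This is a minimal sketch that assumes sizes in bytes and does not follow symlinked directories. `subtree_totals` is a hypothetical name.

```python
import os
import stat


def subtree_totals(top):
    """Return {directory path: total bytes of regular files at or below it}.

    With topdown=False, os.walk() yields each directory only after all of
    its subdirectories, so a child's total is complete before it is added
    to its parent. Each file is stat'ed once. Symlinked directories are
    listed but not descended into, so they contribute 0. A file with
    several hard links in the tree is counted once per link.
    """
    top = os.path.abspath(top)
    totals = {}
    for root, dirs, files in os.walk(top, topdown=False):
        total = 0
        for name in files:
            st = os.lstat(os.path.join(root, name))
            if stat.S_ISREG(st.st_mode):  # ignore symlinks, FIFOs, etc.
                total += st.st_size
        for d in dirs:
            total += totals.get(os.path.join(root, d), 0)
        totals[root] = total
    return totals
```

The work is one pass over the tree, rather than one comparison per file/folder pair.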
      
   Also, I notice your database structure does not reflect the folder   
   hierarchy -- where do you record parent-child relationships between   
   folders?   
      
   In short, take more account of the actual filesystem hierarchy in your   
   database structure.   
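If the data does go into SQLite, one way to record the hierarchy is a `ParentID` column on each folder. A recursive common table expression can then roll sizes up over each folder's subtree by pairing every folder only with its own descendants, not with every file. This is a minimal sketch. The table and column names (`Folders`, `ParentID`, `Files`, `SizeBytes`) are hypothetical, and hard links are ignored (one `FolderID` per file).

```python
import sqlite3

SCHEMA = """
CREATE TABLE Folders (
    FolderID INTEGER PRIMARY KEY,
    ParentID INTEGER REFERENCES Folders (FolderID),  -- NULL for the root
    Name     TEXT NOT NULL
);
CREATE INDEX FoldersByParent ON Folders (ParentID);
CREATE TABLE Files (
    FileID    INTEGER PRIMARY KEY,
    FolderID  INTEGER NOT NULL REFERENCES Folders (FolderID),
    Name      TEXT NOT NULL,
    SizeBytes INTEGER NOT NULL
);
"""

# Subtree holds one (AncestorID, FolderID) row for every folder paired
# with itself and each of its descendants; sizes are then summed per
# ancestor. Folders with no files at all get 0 via COALESCE.
ROLLUP = """
WITH RECURSIVE Subtree(AncestorID, FolderID) AS (
    SELECT FolderID, FolderID FROM Folders
    UNION ALL
    SELECT s.AncestorID, f.FolderID
    FROM Subtree AS s JOIN Folders AS f ON f.ParentID = s.FolderID
)
SELECT s.AncestorID, COALESCE(SUM(fi.SizeBytes), 0)
FROM Subtree AS s
LEFT JOIN Files AS fi ON fi.FolderID = s.FolderID
GROUP BY s.AncestorID
ORDER BY s.AncestorID
"""


def subtree_sizes(con):
    """Return [(FolderID, bytes in that folder and everything below it)]."""
    return con.execute(ROLLUP).fetchall()


con = sqlite3.connect(":memory:")
con.executescript(SCHEMA)
con.executemany("INSERT INTO Folders VALUES (?, ?, ?)",
                [(1, None, "root"), (2, 1, "a"), (3, 2, "b"), (4, 1, "c")])
con.executemany("INSERT INTO Files VALUES (?, ?, ?, ?)",
                [(1, 1, "x", 100), (2, 2, "y", 20), (3, 3, "z", 3)])
print(subtree_sizes(con))  # [(1, 123), (2, 23), (3, 3), (4, 0)]
```

The index on `Folders (ParentID)` lets each recursive step look up children directly instead of scanning the whole table.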
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   



(c) 1994,  bbs@darkrealms.ca