|    alt.os.linux.mint    |    Looks pretty on the outside, thats it!    |    30,566 messages    |
|    Message 29,860 of 30,566    |
|    Paul to Mike Scott    |
|    Re: protect against bit-rot?    |
|    04 Dec 25 16:38:35    |
   
   From: nospam@needed.invalid   
      
   On Thu, 12/4/2025 11:43 AM, Mike Scott wrote:   
   > On 04/12/2025 14:09, Paul wrote:   
   >> On Thu, 12/4/2025 6:42 AM, Mike Scott wrote:   
   >>> Hmmm.   
   >>>   
   >>> Checking over things, I found some old files with dates in the future.
   >>> One directory lists as:
   >>>   
   >>> CD> ls -li   
   >>> total 36   
   >>> 3671727 -rw-rw-r-- 1 mike mike 1230 Oct 16 2018 cd1.k3b   
   >>> 3671728 -rw-rw-r-- 1 mike mike 1160 Oct 16 2018 cd1.k3b.files   
   >>> 3671729 -rw-r--r-- 1 mike mike 137 Oct 16 2018 k3b2list.sh   
   >>> 3671730 -rw-r--r-- 1 mike mike 17873 Feb 7 2106 maindata.xml   
   >>> 3671731 -rw-r--r-- 1 mike mike 17 Feb 7 2106 mimetype   
   >>>   
   >>>   
   >>> Note the future dates for the last two. This stuff has been left around
   >>> since 2018, unmodified (by me, at least). The contents look reasonable, so
   >>> it's just the metadata messed up - they seem to be the only two files
   >>> affected.
   >>>
   >>> The data was on a freebsd machine until a few weeks ago, when the whole
   >>> lot was rsync'd from spinning rust to the present SSD on a mint server.
   >>>
   >>> A bit worrying: freebsd failure? rsync failure? SSD failure? linux
   >>> failure? Gremlins?
   >>>
   >>> But how can anyone possibly realistically detect this sort of thing? With
   >>> an unknown cause?
   >>>   
   >>>   
   >>   
   >> https://github.com/antrea-io/antrea/issues/1417   
   >>   
   >> "https://tools.ietf.org/html/rfc7011#section-6.1.7 does state:   
   >>   
   >> The dateTimeSeconds data type is an unsigned 32-bit integer in   
   >> network byte order containing the number of seconds since the UNIX   
   >> epoch, 1 January 1970 at 00:00 UTC, as defined in [POSIX.1].   
   >> dateTimeSeconds is encoded identically to the IPFIX Message Header   
   >> Export Time field. It can represent dates between 1 January 1970 and
   >> 7 February 2106 without wraparound; see Section 5.2 for wraparound
   >> considerations."
   >> ^^^^^^^^^^^^^^^   
   >>   
   >> That appears to be a magic number and not a random corruption.   
   >> That would be a software problem of some sort.   
   >   
   > Thanks for pointing that out; I'd missed the significance.   
   >   
   > Nevertheless, /something/ changed the dates somewhen - they should all be
   > similar.
   > I've checked old dump files around from freebsd, but restore (on mint) sets
   > the extracted file dates to the current time, which isn't helpful (and wrong
   > behaviour?!)
   >
   > My random guess would be rsync. But that's just because the wind's in the
   > south-west :-}
      
   OK, here is my theory. Rather than name and shame a tool, I look at it
   this way.
      
   A file system such as NTFS has 64-bit timestamps, and with its choice of
   resolution it can represent a huge range of dates, far more than a 32-bit
   UNIX time can. On Linux, the NTFS metadata is used to populate what stat()
   returns, just so there is "something to munch on". There is an opportunity
   for an epoch mismatch right at that "stuffing of stat()" level.
      
   Some other part of the chain you used has 32-bit timestamps (you didn't
   make that choice, someone else did). OK, we can drop the 100ns crap, no
   problem. A timestamp to the nearest second is plenty and won't annoy anyone.
      
   But the year range of the 32-bit representation is strictly limited:
   counted as unsigned seconds from 1970, it runs out on 7 February 2106.
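
   The limit is easy to check with a little arithmetic. This is a minimal
   sketch, assuming the field is a plain count of seconds since 1 January 1970
   UTC; the helper name last_representable is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def last_representable(bits: int, signed: bool) -> datetime:
    """Latest UTC time a seconds-since-1970 counter of this width can hold."""
    max_value = 2 ** (bits - 1) - 1 if signed else 2 ** bits - 1
    # Add the offset to the epoch instead of calling fromtimestamp(), so the
    # result does not depend on the platform's time_t range.
    return UNIX_EPOCH + timedelta(seconds=max_value)
```

   last_representable(32, signed=False) comes out as 2106-02-07 06:28:15 UTC,
   the date in the listing. The signed variant gives the better-known
   2038-01-19 03:14:07 UTC.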
      
   Let us say the year 9999 appears on NTFS and the "plumbing" supports at
   most 2106. What value do you send? Spock-like logic says we jam it to
   2106 :-) Personally, I like using 1970, because people recognize that as
   the flag of a Time Lord snafu, whereas I had to do a Google to figure out
   that 2106 meant I was hitting a limit flag on a smaller section of plumbing.
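
   To make the "jam it to 2106" idea concrete, here is a sketch of what such
   plumbing might do. It is an illustration of the theory only, not code from
   any real tool. The assumptions are that the wide timestamp is an NTFS-style
   count of 100 ns ticks since 1 January 1601 UTC, and that the narrow field
   is unsigned 32-bit seconds since 1970. All names are hypothetical.

```python
from datetime import datetime, timezone

# Seconds between the NTFS epoch (1601-01-01) and the UNIX epoch (1970-01-01).
EPOCH_DELTA_SECONDS = 11_644_473_600
TICKS_PER_SECOND = 10_000_000  # one tick is 100 ns
U32_MAX = 2 ** 32 - 1


def datetime_to_filetime(dt: datetime) -> int:
    """Convert an aware datetime to 100 ns ticks since 1601-01-01 UTC."""
    delta = dt - datetime(1601, 1, 1, tzinfo=timezone.utc)
    return (delta.days * 86_400 + delta.seconds) * TICKS_PER_SECOND


def filetime_to_u32_seconds(filetime: int) -> int:
    """Drop the sub-second part, then saturate into an unsigned 32-bit field."""
    unix_seconds = filetime // TICKS_PER_SECOND - EPOCH_DELTA_SECONDS
    # Hypothetical policy: clamp out-of-range values to the nearest limit
    # instead of wrapping around or raising an error.
    return max(0, min(unix_seconds, U32_MAX))
```

   With this policy a date in 2018 passes through unchanged, while a year-9999
   timestamp comes out as 4294967295, which displays as 7 February 2106.
   Clamping low values to 0 gives the 1970 flag instead.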
      
   Sometimes it's just a certain subset of file storage methods that foul up,
   like a file with a "short file name" whose date somehow gets trashed while
   its metadata is re-inflated.
      
   I've run into interesting limit cases before. At one time, I used two OSes
   that supported different maximum file name lengths. I would take a ZIP over
   to the one with the shorter limit, unzip the folder and... there would be
   one file which had a name one character longer than the system could handle.
   The extra letter was silently discarded. There was no limit-flag behavior
   on that one, and every time I un-archived something, I had to keep an
   eye peeled for damaged goods.
      
   One choice for correction is to "detect all weird situations", which is
   overly ambitious. A second option is to transfer the date information
   from source to destination yourself, and use GNU touch to set the date on
   each file from that manually transported metadata.
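
   The second option can be sketched in a few lines. This version uses
   Python's os.utime in place of touch. It assumes both trees are mounted and
   have the same relative layout; record_mtimes and apply_mtimes are made-up
   names.

```python
import os


def record_mtimes(root: str) -> dict[str, int]:
    """Map each file's path (relative to root) to its mtime in nanoseconds."""
    recorded = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            recorded[os.path.relpath(full, root)] = os.stat(full).st_mtime_ns
    return recorded


def apply_mtimes(root: str, recorded: dict[str, int]) -> list[str]:
    """Set the recorded mtimes on matching files; return the paths changed."""
    changed = []
    for rel, mtime_ns in recorded.items():
        full = os.path.join(root, rel)
        if not os.path.isfile(full):
            continue  # missing at the destination; leave that for a human
        st = os.stat(full)
        if st.st_mtime_ns != mtime_ns:
            # Keep the access time as it is and replace only the mtime.
            os.utime(full, ns=(st.st_atime_ns, mtime_ns))
            changed.append(rel)
    return sorted(changed)
```

   Run record_mtimes on the source tree, then pass the result to apply_mtimes
   on the destination. The returned list doubles as a report of which files
   had drifted.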
      
   I would like to think that turning on a Verify function in the plumbing   
   would catch it. But that would only work if the Verify function was end   
   to end, which is unlikely to be the case.   
      
   *******   
      
   To start, take the tree from the source end, and list it and sort by date.   
   This may cause the damaged goods to appear at one end of the listing file.   
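
   A rough sketch of that listing step, which also flags anything dated later
   than "now". The function names are invented for the example, and it only
   looks at the modification time.

```python
import os
import time


def files_by_mtime(root: str) -> list[tuple[float, str]]:
    """Return (mtime, path) for every file under root, oldest first."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # lstat: report a symlink's own timestamp, not its target's.
            entries.append((os.lstat(full).st_mtime, full))
    return sorted(entries)


def future_dated(entries, now=None):
    """Return the paths whose mtime is later than now (default: current time)."""
    now = time.time() if now is None else now
    return [path for mtime, path in entries if mtime > now]
```

   Files stamped 2106 sort to the very end of files_by_mtime(), and
   future_dated() picks them out without reading the whole list.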
      
   Using my Notes file, I have examples of "old" attempts to catalog some goods.   
      
    find /media/WIN2KAS -type d -exec ls -al -1 -d {} + > directories.txt   
    find /media/WIN2KAS -type f -exec ls -al -1 {} + > filelist.txt   
      
    sudo find /media/WIN2KAS -type f -exec stat --printf='%010Y %y %n\n' {} + > statlist.txt
      
   On Windows, the GnuWin32 packages contain ports of the usual suspects,
   including find and sort.
      
    find.exe C:\Downloads -type f -exec ls -al -1 {} ; | sort.exe -t: -k3 > SortedList.txt
      
   I don't usually write extensive notes for the one-liners, leaving   
   that for analysis later. In any case, you can craft some one-liners   
   to make metadata verification at the two ends a bit easier.   
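
   As one possible way to do that comparison, the sketch below reads two lists
   in the '%010Y %y %n' layout of the stat one-liner above and reports files
   whose modification times differ. It assumes both lists came from GNU stat
   and that no file name contains a newline. The function names are
   hypothetical.

```python
import os


def parse_statlist(text: str, root: str) -> dict[str, int]:
    """Map path-relative-to-root to mtime seconds for each statlist line."""
    mtimes = {}
    for line in text.splitlines():
        # Fields: epoch seconds, date, time, UTC offset, then the file name
        # (which may itself contain spaces, hence the split limit).
        parts = line.split(" ", 4)
        if len(parts) != 5 or not parts[0].isdigit():
            continue  # not a line in the expected layout
        mtimes[os.path.relpath(parts[4], root)] = int(parts[0])
    return mtimes


def mtime_mismatches(source: dict[str, int], dest: dict[str, int]):
    """Return sorted (path, source_mtime, dest_mtime) where the two differ."""
    common = source.keys() & dest.keys()
    return sorted((p, source[p], dest[p]) for p in common if source[p] != dest[p])
```

   Feed it the text of the list from each end, along with the root each list
   was taken from. A file that picked up a 2106 date in transit then shows up
   with 4294967295 in the destination column.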
      
    Paul   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   
(c) 1994, bbs@darkrealms.ca