
Just a sample of the Echomail archive


 Message 264 
 mark lewis to Tommi Koivula 
 ALLFIX dupes? 
 14 Jun 15 10:42:20 
 
14 Jun 15 06:41, you wrote to Nicholas Boel:

 NB>> Does HPT default to dupechecking by MsgID and some kind of hash check?
 NB>> I don't have anything specific specified, so it's using the default
 NB>> (which I thought was MsgIDwithHashCheck or something in those lines -
 NB>> I didn't look it up so I'm probably on the right track but it's
 NB>> probably the wrong config option). Then in my area definitions, I use:

 NB>> -dupecheck move -dupehistory 365 -tooold 365 -sbkeepall

 TK> It is almost the same here, only "-TooOld 365" missing from my conf.

i don't use -tooold but i do have dupehistory set to 1100 to cover a full
three years of dupes as per the FTS specs...

                        "[...] The serial number may be any eight
     character hexadecimal number,  as long as it is unique - no two
     messages from a given system may have the same serial number
     within a three years. [...]"

365.25 * 3 = 1,095.75 so i pick up an extra 4.25 days :shrug: ;)
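
The arithmetic above, spelled out as a tiny Python sketch. The constant names are only illustrative; the one value taken from the message is the dupehistory setting of 1100 days.

```python
# Days needed to cover the three-year serial number uniqueness window.
DAYS_PER_YEAR = 365.25   # average year length, leap days included
WINDOW_YEARS = 3         # "within a three years" per the quoted spec text
DUPE_HISTORY = 1100      # the dupehistory value mentioned above, in days

window_days = DAYS_PER_YEAR * WINDOW_YEARS   # 1095.75
margin_days = DUPE_HISTORY - window_days     # 4.25 days of slack

print(window_days, margin_days)
```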

 TK> "EchoAreaDefaults -SBkeepAll -dupeCheck move -dupeHistory 31 -b JAM"

i also keep three years of messages just because i can... here's the defaults
line for one of my feeds...

areafixAutoCreateDefaults -d "Automatically added area" -b jam -a 1:3634/12.73
-g Z -p 1100 -dupeCheck move -dupeHistory 1100 -sbkeepall

 NB>> Is there something I may be missing?

 TK> I'm not that familiar with hpt, so I don't know about its dupe detection
 TK> mechanism.

i found this in some old docs...

[quote]
DupeBaseType

Syntax:
    dupeBaseType <TextDupes|HashDupes|HashDupesWMsgId|CommonDupeBase>
Example:
    dupeBaseType HashDupesWMsgId

TextDupes
    stores from, to, subj & msgid as text lines.
HashDupes
    stores crc32 of from + to + subj + msgid.
HashDupesWMsgId
    same as HashDupes, but stores also msgid as text.
CommonDupeBase
    stores hashes of from + to + subj + areatag + msgid in one file
    (hpt_base.dpa)

Default is HashDupesWMsgId.

This statement cannot be repeated.
[/quote]

which seems accurate when looking at this from huskylib/fidoconf/fidoconf.h

[quote]
typedef enum typeDupeCheck {
                    hashDupes, /*Base bild from crc32*/
              hashDupesWmsgid, /*Base bild from crc32+MSGID*/
                    textDupes, /*Base bild from FromName+ToName+Subj+MSGID*/
               commonDupeBase  /*Common base for all areas bild from crc32*/
} e_typeDupeCheck;
[/quote]
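
A minimal Python sketch of what a HashDupesWMsgId-style check could look like, going only by the descriptions quoted above. This is not hpt's code: the function and class names are hypothetical, and the plain concatenation, the latin-1 encoding, the lack of case folding, and the use of zlib's crc32 are all assumptions.

```python
import zlib


def hash_dupe_key(from_name, to_name, subject, msgid):
    """crc32 over from + to + subj + msgid, as the quoted docs describe.

    Assumptions: plain concatenation, latin-1 bytes, no case folding.
    """
    data = (from_name + to_name + subject + msgid).encode("latin-1", "replace")
    return zlib.crc32(data) & 0xFFFFFFFF


class HashDupesWMsgId:
    """Hypothetical in-memory dupe base: the hash plus the msgid kept as text."""

    def __init__(self):
        self.seen = set()

    def is_dupe(self, from_name, to_name, subject, msgid):
        entry = (hash_dupe_key(from_name, to_name, subject, msgid), msgid)
        if entry in self.seen:
            return True
        self.seen.add(entry)
        return False


base = HashDupesWMsgId()
first = base.is_dupe("Joe Sysop", "All", "test", "1:2/3 0000abcd")
again = base.is_dupe("Joe Sysop", "All", "test", "1:2/3 0000abcd")
print(first, again)
```

Keeping the msgid as text next to the hash means a crc32 collision between two different messages is not enough on its own to flag a dupe.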

one improvement i can see would possibly be to also use the timestamp in the
calculations... especially for those systems that don't put MSGID but do use
complete time stamps including the seconds... but even that may not be good
enough since some can post numerous messages in one second...
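
A rough Python sketch of the timestamp idea, assuming the timestamp is simply added as one more field in the hashed data. The function name, the NUL separator between fields, and the sample values are all made up for illustration; none of this comes from hpt.

```python
import zlib


def dupe_key(from_name, to_name, subject, msgid="", timestamp=""):
    """crc32 over the header fields plus the timestamp (hypothetical layout)."""
    fields = [from_name, to_name, subject, msgid, timestamp]
    data = "\x00".join(fields).encode("latin-1", "replace")
    return zlib.crc32(data) & 0xFFFFFFFF


# Two messages without MSGID, same header, one second apart: distinct keys.
a = dupe_key("Joe Sysop", "All", "test", timestamp="14 Jun 15  10:42:20")
b = dupe_key("Joe Sysop", "All", "test", timestamp="14 Jun 15  10:42:21")

# Same header posted twice within the same second: the keys collide, so the
# second message would be flagged as a dupe even if its body differs.
c = dupe_key("Joe Sysop", "All", "test", timestamp="14 Jun 15  10:42:20")

print(a != b, a == c)
```

The last case is the limitation mentioned above: a timestamp with seconds helps, but it cannot separate several messages posted in the same second with an otherwise identical header.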

i'm trying to remember the other ways that dupe detection is done... this one
is "good enough" but there are others... one takes the entire message header
plus the following 40 bytes, IIRC... that way it gets most all of the control
lines as well as possibly some of the message body text... another strips the
message body of CR and LF and i think white space and does a crc on that to be
used along with the crc on the header... the message body stops at the tear
line (not inclusive) if one exists or the origin line if there is no tear
line... for netmails, the message body stops at the tear line (not inclusive),
or the origin line if one exists, or the first path control line... it has been
a while and i can't find all of my notes... i've dug this out of some old
source code, though...
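
The body crc described above could be sketched in Python roughly like this, for the echomail case only. Everything here is an assumption layered on that description: the function names are hypothetical, the first line that looks like a tear line is taken as the cut-off, the origin line is treated as not inclusive just like the tear line, and all whitespace is dropped, not only CR and LF.

```python
import zlib


def body_for_crc(text):
    """Return the body text that would be fed to the crc (echomail case)."""
    # Normalise line endings to CR, then split into lines.
    lines = text.replace("\r\n", "\r").replace("\n", "\r").split("\r")
    tear = next((i for i, line in enumerate(lines)
                 if line == "---" or line.startswith("--- ")), None)
    origin = next((i for i, line in enumerate(lines)
                   if line.startswith(" * Origin: ")), None)
    # Stop at the tear line if there is one, else at the origin line.
    end = tear if tear is not None else origin
    if end is None:
        end = len(lines)
    # Drop CR, LF and all other whitespace from what is left.
    return "".join("".join(line.split()) for line in lines[:end])


def body_crc(text):
    return zlib.crc32(body_for_crc(text).encode("latin-1", "replace")) & 0xFFFFFFFF


msg1 = "Hello  world\rsecond line\r--- SomeEditor\r * Origin: x (1:2/3)\r"
msg2 = "Hello world\r\nsecond  line\r\n--- OtherEditor\r\n * Origin: y (4:5/6)\r\n"
print(body_crc(msg1) == body_crc(msg2))
```

Because the tear and origin lines are left out and whitespace is ignored, the same text re-wrapped or re-packed by a different system still yields the same body crc, which can then be used along with a crc on the header.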

)\/(ark

... That darned Tom, this is ALL his fault for inventing this beast anyhow.
---
 * Origin:  (1:3634/12.73)


(c) 1994,  bbs@darkrealms.ca