How it Works

Summary
-------------

In an ideal world, where transmission speeds and storage space
were infinite, each computing device would maintain a copy of all
of the data on the planet, and each time two computers connected,
they would ensure that each was up to date. This system brings
that concept into the real world.


This system maintains a running library, like an archive, that is to
be preserved, in part or in total, by as few or as many sites as are
willing. Client and server become one in this system. When two
computers connect, they exchange data such that each ends up with a
complete, updated copy of the library. Nodes that cannot maintain
full libraries will focus on newer, more local data, such that old
data, or data from far-off locations, will be the first to be purged
as time goes by. Yet old data arrives first, and is output first upon
a data file index request (with a sort optimized for what a node
considers to be the "ideal" message size), so a FIFO should hold
for all data prior to purging, regardless of purging frequency
(within reason). This means that the newest messages will be prone to
transferring at some time other than real time. A client can choose
to request the contents of a data file in any order it chooses, and it
can expect that the default sort was as described above, so it can
traverse the dataset in any manner it decides. Clients and servers can
interact to determine how each sorts and operates upon data.
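As a rough illustration of the purge order described above (oldest and
farthest-away data dropped first), a node's purge pass might look like the
following sketch. The entry fields and the numeric "distance" score are
hypothetical; the text does not mandate any particular scheme.

```python
# Hypothetical sketch: purge oldest, farthest-away entries first.
# "distance" is an arbitrary score a node assigns to a location;
# nothing in this text mandates this particular scheme.

def purge(entries, keep):
    """Keep the `keep` newest/most-local entries; drop the rest.

    Each entry is a dict with a sortable UTC "timestamp" string
    (e.g. "2019-01-05T12:00:00") and a numeric "distance" score.
    """
    # Sort so the entries we most want to keep come first:
    # newest timestamps, then nearest locations.
    ranked = sorted(entries,
                    key=lambda e: (e["timestamp"], -e["distance"]),
                    reverse=True)
    return ranked[:keep]

entries = [
    {"timestamp": "2018-01-01T00:00:00", "distance": 9},  # old, far
    {"timestamp": "2019-01-01T00:00:00", "distance": 1},  # new, near
    {"timestamp": "2018-06-01T00:00:00", "distance": 2},
]
kept = purge(entries, keep=2)
```

Under this scheme the old, far-off entry is the first to go when space
runs short.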

Although each node is simultaneously client and server as far as data
storage is concerned, this text shall use the term "server" to refer
to the computer that takes the incoming connection and the term
"client" to refer to the computer that makes the outgoing connection.

Reliable, centralized repositories can be determined by public
opinion, though centralized repositories are unnecessary and only
particularly useful for long-distance connections such as through the
internet. This is intended as an open system, though privacy is
available via encryption. A one-time pad is recommended for that
purpose, though reasonably strong encryption such as PGP should be
suitable for most users.


1. This system is to be text-oriented, in the manner of a mail server.
People can choose to send binary data, if they wish, by using
UUENCODE-type methods. Still, as the system's primary intent is
failsafe communications, and as the data is to live on a great
many devices, it is likely that many servers will scrub binary data
or prune transmissions that are too long.
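For example, a sender could wrap binary data in UUENCODE-style ASCII lines.
The sketch below uses Python's `binascii` uuencoding routines with the
standard 45-byte uuencode line size; it is one possible wrapping, not a
format this text prescribes.

```python
import binascii

def uuencode_lines(data: bytes):
    """Encode binary data as UUENCODE-style ASCII lines (45 bytes/line)."""
    return [binascii.b2a_uu(data[i:i + 45]).decode("ascii")
            for i in range(0, len(data), 45)]

def uudecode_lines(lines):
    """Reverse of uuencode_lines."""
    return b"".join(binascii.a2b_uu(line) for line in lines)

payload = bytes(range(100))          # arbitrary binary payload
lines = uuencode_lines(payload)      # safe to embed in a text entry
assert uudecode_lines(lines) == payload
```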

2. This system is based simply upon an index file and text data files.
It can be implemented in a number of ways, though the core methods and
core software are expressly intended for roaming wifi distribution of
data without webservers. It is to be understood that the text data
files are simply part of the external interface, and that standard
data structures and indexing will be used internally to keep
performance high.

3. Webservers are to be able to distribute this data with ease such that
a website can serve the data under a directory, and the client system
can use a script with lynx or another custom lightweight browser to access
the directory.

(The author's core software will focus on the raw connection methods.)
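As a sketch of the webserver case, a client script might fetch a data file
from under such a directory. The base URL and file name below are purely
illustrative; the real layout is up to each site.

```python
from urllib.request import urlopen  # used by the commented fetch below

def datafile_url(base: str, name: str) -> str:
    """Build the URL of a data file served under a website directory."""
    return base.rstrip("/") + "/" + name

# Hypothetical site and file name -- the real layout is up to each site.
url = datafile_url("http://example.org/library", "OHIO-2019-01-05.txt")

# A client (or a lynx-driven script) would then fetch it, e.g.:
#     text = urlopen(url).read().decode("utf-8")
# (left commented so the sketch has no network dependency)
```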

4. Each data file is named by the "abbreviated location" of its data
entries along with the creation date that each of its entries bears. The
date that an entry is received is not necessarily the date that the entry
was originally created. The original date of the entry's creation is stored
along with the rest of the entry inside of its data file. It is up to a
node to decide how many, and which, abbreviated locations it will store
data files for, and it may choose to combine a large number of minor
locations into a smaller number of files.
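A minimal naming sketch, assuming a `LOCATION-YYYY-MM-DD.txt` pattern; the
text only says the name carries the abbreviated location and the creation
date, so the exact pattern here is an assumption.

```python
def datafile_name(location: str, creation_date: str) -> str:
    """Name a data file by abbreviated location plus entry creation date.

    The LOCATION-YYYY-MM-DD.txt pattern is an assumption; this text only
    says the name carries the abbreviated location and creation date.
    """
    return f"{location}-{creation_date}.txt"

name = datafile_name("OHIO", "2019-01-05")
```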

5. The index file has two shasums for each data file, so that a node
can know whether it is up to date with what it finds on the other
computer that it connects to. One shasum is for everything, including
all of the headers that may be attached by various versions and
implementations. The other is just for the CORE data.
The most common file comparison will find identical files, and these
hashes complete that comparison quickly.
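A sketch of the two-hash idea, assuming (purely for illustration) that
implementation-attached header lines start with `#` and everything else is
CORE data; the real header convention is implementation-defined.

```python
import hashlib

def index_hashes(datafile_text: str):
    """Return (full_hash, core_hash) for a data file.

    Assumption for illustration: lines starting with '#' are
    implementation-attached headers; all other lines are CORE data.
    """
    full = hashlib.sha256(datafile_text.encode("utf-8")).hexdigest()
    core_lines = [ln for ln in datafile_text.splitlines()
                  if not ln.startswith("#")]
    core = hashlib.sha256("\n".join(core_lines).encode("utf-8")).hexdigest()
    return full, core

a = "# header v1\nentry one\nentry two\n"
b = "# header v2\nentry one\nentry two\n"
full_a, core_a = index_hashes(a)
full_b, core_b = index_hashes(b)
```

Two files that differ only in their attached headers compare equal on the
CORE hash, so a node can tell "same data, different headers" apart from
"different data" at a glance.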

6. Generally accepted options for the "abbreviated location" will
be decided by the users who reside in various regions, and the
author recommends whatever the local users consider to be their
state or region, such as OHIO or SCANDINAVIA. The abbreviated
location is always forced to all caps and does not contain any
spaces. Spaces that appear should be pruned to ensure
compatibility with more sensitive systems.
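Normalizing an abbreviated location as described, all caps with spaces
pruned, might look like:

```python
def normalize_location(raw: str) -> str:
    """Force an abbreviated location to all caps with no spaces."""
    return "".join(raw.split()).upper()

loc = normalize_location("  New  Scandinavia ")
```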

7. Regular data entries contain their hash, abbreviated location, UTC
date/time stamp of entry creation, sender, recipient, keywords,
detailed location, and text. The recommended detailed location is:
city/town,county,state,country
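One way to picture such an entry, with the hash computed over the other
fields; the field names and the "|"-joined serialization below are
illustrative assumptions, not a mandated wire format.

```python
import hashlib

# Field order from item 7; the "|"-joined serialization is only an
# illustration -- the real on-disk format is up to the implementation.
FIELDS = ("location", "created_utc", "sender", "recipient",
          "keywords", "detailed_location", "text")

def make_entry(**values):
    """Build a regular data entry and compute its hash over the fields."""
    entry = {f: values[f] for f in FIELDS}
    blob = "|".join(entry[f] for f in FIELDS).encode("utf-8")
    entry["hash"] = hashlib.sha256(blob).hexdigest()
    return entry

entry = make_entry(
    location="OHIO",
    created_utc="2019-01-05T12:00:00Z",
    sender="alice",
    recipient="bob",
    keywords="greetings",
    detailed_location="Columbus,Franklin,Ohio,USA",
    text="Hello from Columbus.",
)
```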

8. Header entries contain their hash, abbreviated location, description,
and custom fields.

9. Data and headers that appear without these descriptors should be
hashed, and possibly given descriptors, by a receiving node.

10. Simple ZIP compression is to be offered (and utilized internally)
by nodes.
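The compression offered by nodes can be sketched with Python's `zlib`,
which implements DEFLATE, the same algorithm ZIP archives use:

```python
import zlib

def compress_datafile(text: str) -> bytes:
    """Compress a text data file for transfer or internal storage."""
    return zlib.compress(text.encode("utf-8"))

def decompress_datafile(blob: bytes) -> str:
    """Reverse of compress_datafile."""
    return zlib.decompress(blob).decode("utf-8")

original = "entry one\n" * 1000
blob = compress_datafile(original)
```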

11. Nodes will generally use a SEMAPHORE/MUTEX to do any work
upon the data files.
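Within a single process, guarding data file work with a mutex might look
like the sketch below; locking across separate processes would instead
need something like `fcntl` file locks.

```python
import threading

datafile_lock = threading.Lock()   # one mutex guarding the data files
entries = []                       # stands in for a data file here

def append_entry(entry: str) -> None:
    """Append an entry while holding the data file mutex."""
    with datafile_lock:
        entries.append(entry)

threads = [threading.Thread(target=append_entry, args=(f"entry {i}",))
           for i in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```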

12. New entries that arrive do not overwrite what is already in the
data file. Entries that are already present, as determined by shasum,
are not added again. Any entries that are received are to be rehashed
to guard against corruption.
Nodes should output and reckon their data sorted from old to new by
date/time stamp, for the sake of file hash conformity as well as for
ease of querying. The actual storage will likely be FIFO (append-only)
for the sake of disk access rates.
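These merge rules (rehash on receipt, skip duplicates by hash, keep
output sorted old to new) can be sketched as follows; the entry shape is
again an illustrative assumption.

```python
import hashlib

def entry_hash(text: str) -> str:
    """Rehash an entry's text on receipt, guarding against corruption."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def merge(existing, received):
    """Merge received entries into existing ones.

    Each entry is a dict with "hash", "timestamp", and "text" keys
    (an illustrative shape, not a mandated format).
    """
    known = {e["hash"] for e in existing}
    merged = list(existing)
    for e in received:
        if entry_hash(e["text"]) != e["hash"]:
            continue                  # corrupted in transit: drop it
        if e["hash"] in known:
            continue                  # already present: do not re-add
        merged.append(e)
        known.add(e["hash"])
    # Output sorted old to new for file hash conformity.
    return sorted(merged, key=lambda e: e["timestamp"])

old = {"text": "a", "timestamp": "2018-01-01", "hash": entry_hash("a")}
new = {"text": "b", "timestamp": "2019-01-01", "hash": entry_hash("b")}
dup = dict(old)                       # duplicate of an existing entry
bad = {"text": "c", "timestamp": "2019-02-01", "hash": "wrong"}
result = merge([old], [new, dup, bad])
```

The duplicate and the corrupted entry are both dropped, and the surviving
entries come out oldest first.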



Hermann L. Johnson. January 2019. Free for unmodified redistribution and non-commercial use.