| How does the Microsoft Exchange
server work compared to the mail server implemented in Grapevine? What are the advantages of using many distributed servers compared to a centralized server? Is it more cost-effective to have many "low-end" servers than a much smaller number of "high-end" servers? |
| The authors recalled an
anecdote about a severe
case of disk paralysis due to a lack of sufficient physical memory for
the redundancy mechanisms that provide reliability, and about their subsequent
changes to the system configuration as a result. Yet more and more,
we've
been seeing the idea of 'just buy more memory' as the one-size-fits-all
solution to this sort of thing instead of actually digging for the
source. What do you think of this trend? Can it last? The paper recalls many problems that GRAPEVINE ran into, such as users using inboxes as a semipersistent storage mechanism instead of the buffer they are supposed to be, that seem somewhat antiquated given the sheer increase in hardware power available to computers. Do you foresee environments, however, that could still benefit from their lessons? How so? (e.g., mobile devices with limited capacity/resources) |
|
The paper presents lessons from the operation of the distributed and replicated system Grapevine, such as the effects of scale, configuration decisions, adjusting to load, etc. It is worth considering these when constructing a distributed system. But do you think there are any other issues in developing distributed systems that we should consider in the current context?
If we had to
rank these issues by importance, what order would you
give them?
|
|
Who can send messages to
distribution lists, and how is that done? (I'm imagining a scenario where
any mail sent to a distribution alias is sent to the whole list, and
two distribution lists sign up for each other's service, resulting in
emails being ping-ponged back and forth due to the lack of duplicate
elimination.)
How would we characterize the amount of system administration required for this service? The authors state that they are reluctant to make improvements to the system because of the risk of creating bugs in the installed user-base and the difficulty in understanding the code. How could we mitigate these problems? |
| On page 4, the paper says
"Any server that contains a replica of registry can accept a change to
that registry from a client. That server takes the responsibility for
propagating the change to the other relevant servers. The updates are
propagated by Grapevine msgs". I wonder what happens if one of the
servers holding a replica suddenly goes down while a Grapevine message is
in transit. To speed up the authentication process, the paper proposes using a cache to store the results of authentication and access control checks. I have two questions about this solution: 1. Why did they use 12 hours as the time-out period? Did they just pick a number at random? 2. Does this idea produce very heavy overhead for the cache? When they decided to use 12 hours as the cache time-out period, did they take the cache overhead into account? |
| Not replicating message
bodies would seem to imply that if a server was down / unavailable that
messages couldn't be read. That seems counter to the "high
availability" desired out of a distributed system. This system was built in the same time frame as LOCUS, yet seemed to "cheap out" on storage space ... or were the demands higher on Grapevine compared to LOCUS? How do very reliable disk storage systems (Network Appliances for example) change distributed systems (or do they)? Is the distribution merely for reliability and availability in the face of flaky networks? |
|
Would you say that the authors have
achieved Grapevine's design objective of high reliability? One
could define reliability in terms of users' expectations as well as
in terms of the correctness of the services provided. Could you comment on
Grapevine's reliability under both of these definitions?
How do you feel about Grapevine not storing the message bodies? Do you think that the authors are justified in being confident in Grapevine's ability to restore any message? Would you be willing to use Grapevine? Why and/or when? How did you like the paper? |
| The need for centralized
access control was probably a deterrent to making system-wide
changes. Are there scenarios where having a
"replicated" system is desirable, despite the potential drawbacks? Is it possible for corrupted servers to propagate "bad data" to the rest of the system? |
| Many of the issues
discussed in this paper, such as expansion strategy, location of
servers, registry replication policy... are all empirical matters that
are resolved by heuristic methods with human intervention. Over the
years, have distributed systems become more automated in terms of
administration? I have found some striking similarities between Grapevine and the SMTP email service we use nowadays. Is a similar technique, namely redundant mailbox servers distributed across geographical regions, used in web-based mail services (such as Gmail)? |
| It appears Grapevine uses
some form of global inbox, rather than one index per user (since
mailing list messages are shared, rather than
copied). Would it be faster (with a cost of more disk space) to copy
the message instead? How complex is it to determine when the shared
copy
can be deleted? "We now believe that the decision to use only geographical registries was mistaken." If it were possible to replicate registries at a granularity smaller than the entire registry (e.g. replicating only certain subtrees, if there were a hierarchical naming system), would the problem of having to duplicate huge registries be solved? "...the number of messages arriving for a user would start to overwhelm both the user and the system. We do not know if this phenomenon has a natural sociological limit... An analogous filtering mechanism will be required in the world of electronic message systems..." Gee, sounds like they predicted today's spam problem... Apparently it hasn't hit the "sociological limit" since we're still using email. |
|
In the transparency section of the
paper, the authors talk about possible surprises that a user might experience
because he treats the system as a
single system and not a distributed system, for example short-term
consistency problems. So sometimes it's necessary for the user to be
aware of the distributed nature of the system he is working with.
Do we have similar problems with current implementations of distributed systems? What are the most important transparency problems that we have in current implementations? In the reliability section they say that one cause of the failures had been the gradual, undetected consumption of the spare capacity required for redundancy to produce reliability. I think that we only need redundancy (resources that we don't use in normal operation) if we want to maintain performance in the presence of failures. If we can tolerate some performance loss in the presence of failures and just distribute the load over the healthy nodes (graceful degradation), we don't need unused spare resources. In other words, with a well-designed graceful degradation policy there is no need for unused spares. Is this true? |
Sharing messages among
recipient inboxes can be enhanced if the assignment of inboxes
corresponds to patterns of communication, as the paper suggests. I'm curious
whether such an idea is used in mail servers, since disk space is usually not
the primary concern compared to network speed, and hence proximity. What
do you think?
File servers apply a 12-hour time-out mechanism when individuals access the server. This implies that any password change would also take a long time to become effective. The authors say this does not interfere with the security of the system, but I can't come up with a reasonable argument for how it could avoid jeopardizing the system's security. Can you clarify this? |
|
How will Grapevine scale to reduce the effects of server-end congestion and bandwidth limitations over the WWW? If several servers crash simultaneously, how will Grapevine's performance be affected? How is run-time reconfiguration of servers performed? |
|
How relevant is the experience of
dealing with a relatively small number of nodes (around 30) to the
current scenario, wherein around 1,000 nodes would be commonplace?
Why did current systems not adopt this idea of redundant mailboxes? Is it due to the scalability problems with this idea? |
I remember that paper 11
(Implementing Remote Procedure Calls) used Grapevine as a distributed
database for RPC binding. Is that Grapevine the same as the
Grapevine of paper 25?
Would RPC improve if it used the current Grapevine? If Grapevine is used as a database system, then data security should be a concern; however, the paper doesn't say much about security. Do you have any suggestions related to data security in Grapevine? |
| Grapevine is just a set of
authenticated SMTP/POP servers, isn't it?
The paper title initially had me thinking distributed operating system,
but it states distributed system. Isn't viticulturist access a crude precursor to remote assistance or maybe SSH? What communication protocol do you think they used for message transmission? Registrars do take care of adding users to distribution lists but the paper also states that users could add themselves too. How is that handled securely? "To permit access from home or when traveling..." - How do you think a user accessed the Grapevine when he/she was traveling? :) This paper is actually older than the LOCUS paper. But the LOCUS paper seems to be much more advanced. Guess that is because of an under-developed WAN but mature LANs? (mature OS technology - immature networking technology?) |
|
Section 8 describes the management of the distributed system with the viticulturist’s entrance and text log, but the management job still requires an expert. How can we make management easier, given that the servers are distributed geographically? The paper describes issues in Grapevine such as data distribution, configuration, replication, load management, etc., and suggests ways to address each issue. Among the suggestions, which have been adopted and which have not been adopted by current distributed systems? |