Paper 18: Disco: Running Commodity Operating Systems on Scalable Multiprocessors
E. Bugnion, S. Devine, K. Govil, M. Rosenblum 1997
Paper (PDF)
Presentation Slides (PDF)
Presentation Slides (OpenDocument)
Discussion Summary
-
The discussion for this presentation occurred five days after the presentation itself, so not very much was discussed...
- NUMA - a somewhat off-topic discussion on whether NUMA is a good thing, given that it moves complexity from hardware to the operating system.
- Fault tolerance handled by VM (cellular Disco): For large clusters of servers, it would be better to handle fault tolerance at the OS level rather than take a VM performance hit for every working server.
- A brief discussion on other uses for virtual machines (other than for utilizing large NUMA machines). Other uses of VMs are covered by the following few papers (#19-#21).
List of Questions
- This is sort of a useless question, but I'm curious as to how
difficult it really is to get proprietary OSes to 'play nice' with
this scheme, given that the authors often had to modify the HAL of
their virtualized OSes in the evaluation? (This is really a
virtualization question, so it's sort of irrelevant.)
- I didn't find the measures used in the evaluation to be all that
useful, because they didn't compare against other 'special'
multiprocessor operating systems (such as the ones the authors
discuss in the introduction.) Were these systems widely available at
the time of the writing of this paper? What reasons (other than
convenience or availability) might the authors have had for avoiding
this comparison?
-
The authors mention the difficulty in resource management at the
hypervisor level, and something similar to scheduler activations would
seem like it might apply here. How hard would it be to pass scheduling
information down to the hypervisor?
-
A very broad question - What are the limits on the abilities of a system
such as this to scale to other hardware innovations?
-
The performance data seems to indicate that the system scales, with
diminishing returns. Is this scaling due to the nature of the problem
domain, the nature of the hardware, or the nature of the software (i.e.,
Disco)?
-
The idea of adding a level of indirection (virtual machine monitors)
between commodity operating systems and the raw hardware is a good one, but
I think it comes with tradeoffs. The authors say that the
overheads are modest, but is that right?
-
Imagine the combination of this approach with the extensibility of the
microkernel operating systems. Is it possible to apply both in one system,
and in that case how could the performance of the system be improved?
-
How can you justify the use of Disco? The authors needed to modify the IRIX operating system to run on top of the Disco framework, and special device drivers are needed. For systems with hundreds of processors, is it OK to give up performance for ease of deployment?
- The sharing of persistent storage seems quite inefficient (logged on a separate disk partition). How can this be improved via a change in the file system, or in the way Disco handles persistent storage?
-
Is the use of virtual machines a simple trade-off between performance and
scalability? Notice that the paper mentions that sharing of information
between virtual machines can help with managing unnecessarily replicated
data structures. So if the VMs are running relatively homogeneous workloads
and programs, then Disco can work well. The question is how bad would it
be if the VMs are very heterogeneous (actually we can see this in the PMake
benchmark).
- Is it also possible that PMake has issues running on Disco because it
does a lot of disk I/O and stresses the bottleneck of handling I/O for multiple
VMs?
-
In the physical memory abstraction provided by Disco, the paper says that dynamic page migration and replication are done to make the NUMA architecture of FLASH compatible with non-NUMA OSes. Is this to provide uniform memory access times? How much does an OS depend on memory access times (other than for performance)? Or are the page techniques meant to make up for NUMA's combination of shared and local memory?
- Is the HAL for Windows open source? If not, how do they get it to run on DISCO?
- The Disco architecture is basically a tightly coupled distributed computing environment? If performance is the issue, why run different OSes on virtual machines rather than a single NUMA compliant multiprocessing OS? (Application base? - modify Windows/Linux?)
-
Would the virtualization technique used in Disco be useful in
virtualizing/increasing programmability of the GPU, which is effectively
a NUMA machine (multi-core with simultaneous access to memory modules)
within the chip?
-
Was a FLASH system eventually produced? If so, what was Disco's
performance on the actual machine?
-
What happens if we allow virtual machines to overcommit the
actual physical resources present in the system and dynamically
adjust the resources allocated to each VM? Can this approach lead to
better utilization of the system resources, or does it just increase the
overhead?
- The paper says that to provide hardware fault-containment, the
monitor itself must be structured into cells. What would be the cost
of doing this? What would be the performance impact? Is it necessary,
given that the VM monitor already provides automatic software fault isolation
between different VMs?
-
I think one of the goals of running multiple operating systems is so that your application can run in an OS that offers services to suit its purpose. Maybe we should instead focus our research on making better OS services? Why do I want this Disco virtual machine anyway?
-
So they have to make small changes to the OS kernel to get it to run on this system. Isn't it funny that the HAL is never as complete as you would like it to be? Why is that?
-
One thing that's very interesting is that the system actually performs at its best with multiple copies of the VM running. Maybe NUMA is a step in the right direction for a system running a large number of processes?
-
In section 4.2.2, the paper presents a very complex technique for translating virtual physical memory. What is the performance overhead of translating a physical address into the corresponding machine address?
-
To avoid the complexity of virtualizing the ASIDs, Disco flushes the machine's TLB when scheduling a different virtual CPU on a physical processor. The authors mention that this approach speeds up the translation of TLB entries, but I wonder how it would work if Disco did not flush the TLB. What kind of problems would that cause?
-
Why does the virtual device not limit the maximum transfer unit (MTU) of packets? Does this mean there is no limit on packet size when transferring packets?
-
Virtualizing all of the actions seems to have given Disco performance and flexibility (e.g. virtual CPU, virtual network, virtual disk). Is this in some ways a predecessor to Xen and the rest of the VM bandwagon?
- How big and expensive is all of this virtualization? Most "stuff" gets executed directly, but all of this monitoring seems to have to be in effect to make the whole thing work. Are there other ways to make shared memory multi-processor machines work?
-
Regarding the general trend of the 're-emergence' of VMMs and VMs in general (a la Xen, VMware, etc.), is there any reason to believe that they are really here to stay this time, or does it seem like history will repeat itself and they will again slowly fade away to lie dormant for another few decades before resurfacing?
- To virtualize physical memory while reducing the performance degradation due to virtualization, Disco has a second-level software TLB in addition to the hardware one, which it uses to store recent translations for each virtual machine. Is the size of this second-level TLB static, or does it change dynamically based on the needs of the particular VM?
-
What is the most effective mechanism for communication among virtual machines in a system? NFS, or are there others?
-
I suspect that Figure 8 results from a special case in which Disco outperforms IRIX, or from a specific problem that IRIX has. In the general case, a well-designed system running directly on a multiprocessor is faster than several virtual machines on the same multiprocessor, because of the overhead of virtual machines. Isn't it?
-
Would you say that the increase in the amount of RAM in
today's machines reduces the cost of memory allocation for virtual
machines?
-
Do you know of any system today that uses Disco, considering that
the overheads of virtualization are small?
-
Is Disco small because it runs on the simple SimOS? It seems that making complex hardware discovery and device drivers part of Disco would make the code base explode in size, causing the loss of some of the performance and complexity benefits?
-
What is the exact overhead of mapping from virtual->physical->real machine addresses? How does the TLB flushing on CPU scheduling affect performance?
-
Are exception vectors/interrupt vectors the only way to get into privileged mode on all processor architectures? I.e., is this the common path through which all processors transition from user mode to privileged mode?
-
Isn't it better to design and build a more scalable kernel than to try to improve its scalability by running it on top of some virtualization software?
-
How does remapping the privileged load/store instructions to go to a special page instead of the privileged registers help? (Writes to certain privileged registers initiate some action, such as turning on the floating-point coprocessor. How would it help to redirect these writes to a special page in memory?)
-
What is the exact difference between virtualizing a resource and multiplexing a resource? (To me they both seem to be the same thing!)
-
I do not see how Disco saves work on scalable multiprocessors, because porting it first requires familiarity with the machine's architecture, which also takes effort. Could you give your own opinion?
-
What is your view on Disco's performance compared with that of ported operating systems?
-
Is this virtualization technique able to convince the makers of commodity operating systems that the effort to port would be worth it?
-
Disco introduces virtualization for the FLASH shared memory multiprocessor. What techniques would have applied if the underlying architecture were x86?
-
As an operating system structuring technique, what is the challenge for Disco described as a microkernel with an unimaginative interface?
-
Disco targeted the FLASH machine, which was unfortunately not yet available. As a result, the authors used SimOS to develop and evaluate Disco. How big would the difference be if the FLASH machine had been available (i.e., using the FLASH machine rather than SimOS)?
-
With the coming end of Moore's Law and consequent greater emphasis on multiprocessors, is the idea of running multiple operating systems in virtual machines gaining ground, or is designing systems specifically for multiprocessors seen as key to getting the best possible performance?
-
How much does the virtual machine approach affect the timing and performance characteristics of applications? Does it erase the benefits of careful tuning or is the difference no greater than that of moving the application from one (non-virtual) architecture to another?
-
What is it in the hardware that allows Disco to trap supervisor mode
instructions executed in the virtual machine? Is it because there can
be two different levels of supervisor mode?
-
Could you run Disco on top of Disco?
-
What would happen if two virtual machines running atop Disco were to
bind a socket to the same port?
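
Several of the questions above ask about the cost of the virtual -> physical -> machine translation, the TLB flush on virtual CPU switches, and the second-level software TLB. The toy model below is only a rough sketch of that lookup path, not the paper's implementation: every class and field name (`VirtualMachine`, `Monitor`, `pmap`, `l2tlb`, `hw_tlb`) is made up for illustration, the page size and L2 capacity are arbitrary, LRU eviction is an assumption, and the guest OS's own TLB-miss handler is collapsed into a plain dictionary lookup.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size for this sketch


class VirtualMachine:
    """Toy translation state for one VM (all names are hypothetical)."""

    def __init__(self, guest_page_table, pmap, l2tlb_capacity=4):
        self.guest_page_table = guest_page_table  # virtual page -> guest "physical" page
        self.pmap = pmap                          # guest "physical" page -> machine page
        self.l2tlb = OrderedDict()                # software cache: virtual page -> machine page
        self.l2tlb_capacity = l2tlb_capacity      # assumed fixed size, LRU eviction


class Monitor:
    """Toy monitor for one physical CPU with one untagged hardware TLB."""

    def __init__(self):
        self.hw_tlb = {}        # virtual page -> machine page
        self.current_vm = None
        self.stats = {"hw_hit": 0, "l2_hit": 0, "slow_path": 0, "flushes": 0}

    def schedule(self, vm):
        # ASIDs are not virtualized in this model, so entries left by the
        # previous VM would be wrong for the next one: flush on every switch.
        if vm is not self.current_vm:
            self.hw_tlb.clear()
            self.stats["flushes"] += 1
            self.current_vm = vm

    def translate(self, vaddr):
        vm = self.current_vm
        vpage, offset = divmod(vaddr, PAGE_SIZE)
        if vpage in self.hw_tlb:
            self.stats["hw_hit"] += 1
        elif vpage in vm.l2tlb:
            # Refill the hardware TLB from the per-VM software TLB.
            self.stats["l2_hit"] += 1
            vm.l2tlb.move_to_end(vpage)
            self.hw_tlb[vpage] = vm.l2tlb[vpage]
        else:
            # Slow path: guest mapping first, then the monitor's pmap.
            self.stats["slow_path"] += 1
            ppage = vm.guest_page_table[vpage]
            mpage = vm.pmap[ppage]
            vm.l2tlb[vpage] = mpage
            if len(vm.l2tlb) > vm.l2tlb_capacity:
                vm.l2tlb.popitem(last=False)  # evict least recently used entry
            self.hw_tlb[vpage] = mpage
        return self.hw_tlb[vpage] * PAGE_SIZE + offset
```

In this model, skipping the flush in `schedule` would let one VM hit on a stale entry that points at another VM's machine page, which is one way to think about the ASID question. The `stats` counters show how a software second-level TLB can absorb the misses that the flushes create when a VM is rescheduled.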