Efficient Software-Based Fault Isolation

R. Wahbe, S. Lucco, T. Anderson and S. Graham

Presented by Ali Bakhoda


Presentation Slides

Here is the link to the PowerPoint file.


Discussion Recap

Question: Most of the extensible software that I use these days runs an embedded interpreter that interprets some domain-specific scripting language, and this interpreter is what provides the sandboxing. So, how valid is this approach today, given the tendency in modern software to allow customization via interpreted scripting languages?

Answer: There are many places where people use languages like C or C++ to write plug-ins or extensions, for example for web applications or OS modules. One example is ActiveX.

Some other points discussed in the class:

There has traditionally been tension between proponents of hardware-based and software-based protection. For example, there have been proposals to implement the whole OS in a type-safe language like Java. The reasoning behind such proposals is that there might then be no need for protection or RPC at all.
But we cannot throw away everything that has been written in C or C++ over the past decades and rewrite it. So we should take the lessons we have learned and figure out how to apply them directly to existing code. That is why a lot of research is now being conducted on how to “convert existing unsafe code into safe code” and then use it. Different approaches have been proposed to achieve this (some use compilers, some use linkers, and so on).

It was also mentioned that a lot of malicious programs take advantage of lazy programming. Buffer overflows are one of the most frequent results of lazy programming in C programs, and attackers exploit them.

The problems that would arise in the presence of self-modifying code were also discussed. Self-modifying code cannot be used under sandboxing because the code segment is read-only.

There are many areas where software-based fault isolation could make things easier. For example, in multimedia streaming there are many different encoding formats, and the problem is that not everybody has the right codec to play a given stream. As a solution we could ship the codec with every multimedia file, but this may pose security threats. Software-based fault isolation might make this practical by isolating the codec's code.

We can look at this method from two different perspectives. One works up from the hardware level and the other works down from the compiler level. For example, in a high-level language like Java, when you perform an array lookup the generated code includes instructions that check the array bounds.
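The array lookup check mentioned above can be sketched as follows. This is only a minimal illustration, written in Python rather than Java; the names `BoundsError` and `checked_load` are hypothetical. A real Java virtual machine emits the equivalent comparison as machine instructions and throws ArrayIndexOutOfBoundsException when the check fails.

```python
class BoundsError(Exception):
    """Raised when an index falls outside the array (hypothetical name)."""


def checked_load(array, index):
    """Return array[index] after an explicit bounds check."""
    # The check runs before every access, the way a safe language inserts it.
    if not 0 <= index < len(array):
        raise BoundsError(f"index {index} out of range for length {len(array)}")
    return array[index]
```

With this sketch, checked_load([10, 20, 30], 1) returns 20, while an index of 3 or -1 raises BoundsError instead of reading outside the array.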


List of Submitted Questions

1- Would we at some point want to take a previously trusted module out of the sandbox, once it has proven itself reliable?

2- Since the stack pointer is always in the section, isn't it an option to reuse some of its bits in place of the dedicated segment register?

3- In Table 1, column "Fault Isolation Overhead (predicted)", their model predicts a negative value for the cholesky benchmark. How could this ever happen? The value can only be negative if s-instructions < interlocks.

4- The example that the authors use to justify the need for their 'style' of fault isolation is the software _extension_, where users write some extension for an application and mess it up in a variety of ways. These extensions are assumed to be compiled down to machine code and then run directly, which is why this sandboxing is so critical in these cases. However, most of the extensible software that I use these days runs an embedded interpreter that interprets some domain-specific scripting language, and this interpreter is what provides the sandboxing. So, how valid is this approach today, given the tendency in modern software to allow customization via interpreted scripting languages?

5- According to the process described in the paper, it is necessary to re-compile the source-code with a modified compiler to use the software fault isolation technique. Is this always a feasible solution? What if we are using a 3rd party library?

6- Can software-based fault isolation be implemented more easily for languages like Java, where execution occurs within a virtual machine?

7- How does hardware fault isolation work?

8- The paper states that there is a trade off between distrusted code execution time and domain switching time when using software based fault isolation. It assumes that distrusted code is a small part of the application and so the increase in execution time will be well compensated by the domain switching time saved. But what if the distrusted code is for example the file system at user level in a multikernel? Isn't any 'distrusted' software insertion into the kernel with software fault isolation a performance issue? Probably a micro-kernel vs monolithic kernel question - but I am not sure in the case of kernels if faster domain switches will compensate for slower execution.

9- How do you decide on the number of upper bits to use to identify a segment? And what if the allocated virtual memory is not contiguous?

10- Given that the overhead of software isolation and micro-kernel RPC are within the same order of magnitude (which seems to be the case here), which one is preferred?

11- Is there any implementation of software isolation using binary patching?

12- The authors chose to modify gcc to support instrumenting their code with sandbox/fault isolation. This seems (to me) like a lot more work (and more limited) than modifying object code directly (either at load time or post-compile). Wouldn't it be easier to go the object code modification route?

13- This seems like a great idea. Why hasn't it caught on? Is it because compilers would have to be modified? Then why doesn't Microsoft do it ...to enable their "trustworthy" computing initiative?

14- Could you give some introduction on hardware fault isolation and compare it with software fault isolation not just on performance?

15- This scheme is language independent. Does this mean that any language could be used to realize this scheme? And is there some difference in performance between C and other languages?

16- What is the difference between Figure 5 and Figure 6?

17- What is the best method for a secure and extensible OS nowadays?

18- When speaking about cross fault domain communication (Section 4) the authors say that a return stub of a customized call of a fault domain can be trusted. Why is that?

19- When talking about binary patching, the authors say that they are working on a binary patching prototype that uses extensions to current object file format. What do they mean by "extensions" in this context?

20- As pointed out in the paper, communication between fault domains is faster than with hardware fault isolation. How about communication between modules in fault domains and modules in the application’s address space?

21- How difficult is it to modify the object code of a distrusted module to prevent it from writing or jumping to an address outside its fault domain?

22- What is the advantage of using lazy pointer swizzling?

23- The authors claim that their approach is programming language independent; however, they use the gcc compiler to generate their checks?

24- Regarding per-address-space resources such as files, how does the arbitration code know which fault domains are using which resources? Wouldn't it be necessary for even the trusted portion of an application to access these resources through the arbitration code so that it can keep track of which resources are used by the trusted code?

25- It seems as though they are doing things in software that should be done in hardware and by the OS, essentially implementing fast context-switching at the user level. Wouldn't it have been better just to make context-switching in the OS/hardware faster, and didn't these results suggest that it was possible?

26- In the paper, we enforce protection by modifying the compiler. This seems like a band-aid solution for the real problem: poorly written software. The performance may be good, but are we really solving the right problem?

27- How do we decide which modules are trusted? It's probably something we decide when building the software, right? Aren't we then putting too much control into the hands of our users? Wouldn't a user likely just specify all modules as trusted?

28- In the abstract, the paper says "we present a software approach to implementing fault isolation within a single address space". Is the single address space shared by a specific process or by many processes?

29- About performance, the paper states that "our approach poses a trade off relative to hardware fault isolation: substantially faster communication between fault domains, at a cost of slightly increased execution time for distrusted modules". Taking this trade-off into account, does overall performance show a greater improvement?

30- While determining the overhead of software encapsulation, the authors "conjectured" that some rather drastic variations they saw were caused by instruction cache mapping conflicts. Was this conjecture ever actually proven to be fact (or theorem, given the terminology)?

31- While sandboxing an application and basically re-routing any of its calls that the OS doesn't agree with is a useful security measure, doesn't that sound a lot like what Vista tried to get away with, forcing 'untrusted' drivers to jump through hoops/their developers to pay to get kernel access?

32- The paper says "An unsafe instruction is any instruction that jumps to or stores to an address that can not be statically verified ..." Could you explain what "statically verified" means?

33- In section 3.5, the paper says "For each unsafe instruction checked, shared segment matching requires one more instruction than segment matching." Why does it need just one more instruction?

34- Compared to software fault isolation, when is it better to use hardware fault isolation?

35- In the case of Apache, there are interpreter modules that can be plugged into the web server in the form of shared libraries (e.g. mod_python, mod_perl, or mod_php). They are much faster than their CGI counterparts (and faster than FastCGI), but are not isolated from the core of the web server. Is performance nowadays more important than prevention against unlikely failures?

36- Suppose there is a STORE operation at address 0xAAAABCDE, and address sandboxing changes that address to 0xCCCCBCDE to make sure that the store operation does not cross the fault domain (it's obviously not writing at the same place). How could that escape the segment matching precautions? Can sandboxing the addresses introduce some very nasty bugs into a program at runtime?

37- What happens if you want your fault domain to be bigger than a segment?

38- Compiler sandboxing: Would it be vulnerable if the sandboxed module is malicious? (e.g. using system calls, or other library functions, say... a non-inlined call to the C library's memcpy?)

39- Performance improvements seem good, but the approach feels too fragile for use with malicious+distrusted modules.

40- Do you think that predicting the fault isolation overhead should also be based on the architecture of the CPU, because of the different pipelines in different CPU architectures?

41- What kind of security problems might load instructions produce when they are not sandboxed?

42- How does the sandboxing/segment matching technique work when the extension code needs to call into some kernel code or when it needs to access some variable in the kernel code?

43- How does this non-deterministic instrumentation of code affect the timing behavior/guarantees of the code?
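Several of the questions above (for example 9 and 36) turn on the difference between segment matching and address sandboxing. The following is a minimal sketch of the two checks in Python, not the authors' implementation, which inserts machine instructions using dedicated registers. It assumes 32-bit addresses whose upper 16 bits identify the segment, matching the addresses used in question 36; the names `SEGMENT_ID`, `SegmentFault`, `segment_match` and `sandbox` are hypothetical.

```python
SEGMENT_SHIFT = 16                       # assumed: low 16 bits are the offset
OFFSET_MASK = (1 << SEGMENT_SHIFT) - 1   # 0x0000FFFF
SEGMENT_ID = 0xCCCC                      # assumed segment of the fault domain


class SegmentFault(Exception):
    """Raised when segment matching detects an out-of-segment address."""


def segment_match(addr):
    """Segment matching: compare the upper bits and trap on a mismatch."""
    if addr >> SEGMENT_SHIFT != SEGMENT_ID:
        raise SegmentFault(f"address {addr:#010x} is outside the segment")
    return addr


def sandbox(addr):
    """Address sandboxing: overwrite the upper bits with the segment id.

    No comparison is made, so an illegal address is silently redirected
    into the fault domain instead of being reported.
    """
    return (addr & OFFSET_MASK) | (SEGMENT_ID << SEGMENT_SHIFT)
```

Under these assumptions, sandbox(0xAAAABCDE) yields 0xCCCCBCDE, the redirected store described in question 36, whereas segment_match(0xAAAABCDE) raises SegmentFault. Sandboxing keeps a stray store from damaging other fault domains, but unlike segment matching it does not identify the offending instruction.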