Efficient Software-Based Fault Isolation
R. Wahbe, S. Lucco, T. Anderson and S. Graham
Presented by Ali Bakhoda
Presentation Slides
Here is the link for the PowerPoint file.
Discussion Recap
Question: Most of the extensible software that I use these days runs an embedded interpreter that interprets some domain-specific scripting language, and this interpreter is what provides the sandboxing. So, how valid is this approach today, given the tendency in modern software to allow customization via interpreted scripting languages?
Answer: There are still many places where people use languages like C or C++ to write plug-ins or extensions for web applications or OS modules. One example is ActiveX.
Some other points discussed in the class:
There has traditionally been tension between proponents of hardware-based and software-based protection. For example, there have been proposals to implement the whole OS in a type-safe language like Java. The reasoning behind such proposals is that there might then be no need for hardware protection or RPC at all.
But we cannot throw away everything that has been written in C or C++ over the past decades and rewrite it. So we should take some of the lessons we have learned and figure out how to apply them directly to existing code. That is why a lot of research is now being conducted on how to “convert the existing unsafe code into safe code” and then use it. Different approaches have been proposed to achieve this (some use compilers, some use linkers, and so on).
It was also mentioned that a lot of malicious programs take advantage of lazy programming. Buffer overflows are one of the most frequent results of lazy programming in C programs, and malicious programmers exploit them.
The problems that would arise in the presence of self-modifying code were also discussed. Self-modifying code cannot be used under sandboxing because the code segment is read-only.
There are many areas where software-based fault isolation can make things easier. For example, in multimedia streaming there are many different encoding formats, and the problem is that not everybody has the right codec to play a given stream. One solution is to ship the codec with every multimedia file, but this may pose security threats. Software-based fault isolation might facilitate this by isolating the codec's code.
We can look at this method from two different perspectives. One is going up from the hardware level and the other is going down from the compiler level. For example, in a high-level language like Java, an array lookup is compiled into extra instructions that check the index against the array bounds.
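The compiler-level perspective can be illustrated with a small C sketch of the kind of check Java performs on an array lookup. The function name `checked_get` and its signature are hypothetical, chosen only for illustration; this is not code from the paper or from any Java implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: read a[index] only if index is within bounds.
   Returns true and stores the element in *out on success.
   Returns false and leaves *out untouched when index is out of range
   (where Java would throw an exception instead). */
bool checked_get(const int *a, size_t length, size_t index, int *out)
{
    assert(a != NULL && out != NULL);  /* precondition of this sketch */

    if (index >= length) {
        return false;                  /* the inserted bounds check */
    }
    *out = a[index];
    return true;
}
```

Software fault isolation applies a similar idea one level down: instead of checking an array index, the inserted instructions check or mask the target address of each store or jump.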
List of Submitted Questions
1- Would we at some point want to take a previously trusted module out of the sandbox, once it has proven itself reliable?
2- Since the stack pointer is always in the section, isn't it an option to reuse some of its bits in place of the dedicated segment register?
3- In Table 1, column "Fault Isolation Overhead (predicted)", their model predicts a negative value for the cholesky benchmark. How could this ever happen? The value can only be negative if s-instructions < interlocks.
4- The example that the authors use to justify the need for their 'style' of fault isolation is the software _extension_, where users write some extension for an application and mess it up in a variety of ways. These extensions are assumed to be compiled down to machine code and then run directly, which is why this sandboxing is so critical in these cases. However, most of the extensible software that I use these days runs an embedded interpreter that interprets some domain-specific scripting language, and this interpreter is what provides the sandboxing. So, how valid is this approach today, given the tendency in modern software to allow customization via interpreted scripting languages?
5- According to the process described in the paper, it is necessary to re-compile the source code with a modified compiler to use the software fault isolation technique. Is this always a feasible solution? What if we are using a 3rd party library?
6- Can software-based fault isolation be implemented more easily for languages like Java, where execution occurs within a virtual machine?
7- How does hardware fault isolation work?
8- The paper states that there is a trade-off between distrusted code execution time and domain switching time when using software-based fault isolation. It assumes that distrusted code is a small part of the application and so the increase in execution time will be well compensated by the domain switching time saved. But what if the distrusted code is, for example, the file system at user level in a multikernel? Isn't any 'distrusted' software insertion into the kernel with software fault isolation a performance issue? Probably a micro-kernel vs monolithic kernel question, but I am not sure in the case of kernels if faster domain switches will compensate for slower execution.
9- How do you decide on the number of upper bits to use to identify a segment? And what if the virtual memory allocated is not contiguous?
10- Given that the overhead of software isolation and micro-kernel RPC are within the same order of magnitude (which seems to be the case here), which one is preferred?
11- Is there any implementation of software isolation using binary patching?
12- The authors chose to modify gcc to support instrumenting their code with sandbox/fault isolation. This seems (to me) like a lot more work (and more limited) than modifying object code directly (either at load time or post-compile). Wouldn't it be easier to go the object code modification route?
13- This seems like a great idea. Why hasn't it caught on? Is it because compilers would have to be modified? Then why doesn't Microsoft do it ...to enable their "trustworthy" computing initiative?
14- Could you give some introduction on hardware fault isolation and compare it with software fault isolation not just on performance?
15- This scheme is language independent. Does this mean that any language could be used to realize this scheme? And is there some difference in performance between C and other languages?
16- What is the difference between Figure 5 and Figure 6?
17- What is the best method for a secure and extensible OS nowadays?
18- When speaking about cross fault domain communication (Section 4) the authors say that a return stub of a customized call of a fault domain can be trusted. Why is that?
19- When talking about binary patching, the authors say that they are working on a binary patching prototype that uses extensions to current object file format. What do they mean by "extensions" in this context?
20- As pointed out in the paper, communication between fault domains is faster than in hardware fault isolation. How about communication between the modules in fault domains and those in the application’s address space?
21- How difficult is it to modify the object code of a distrusted module to prevent it from writing or jumping to an address outside its fault domain?
22- What is the advantage of using lazy pointer swizzling?
23- The authors claim that their approach is programming language independent; however, they use the gcc compiler to generate their checks?
24- Regarding per-address-space resources such as files, how does the arbitration code know which fault domains are using which resources? Wouldn't it be necessary for even the trusted portion of an application to access these resources through the arbitration code so that it can keep track of which resources are used by the trusted code?
25- It seems as though they are doing things in software that should be done in hardware and by the OS, essentially implementing fast context-switching at the user level. Wouldn't it have been better just to make context-switching in the OS/hardware faster, and didn't these results suggest that it was possible?
26- In the paper, we enforce protection by modifying the compiler. This seems like a band-aid solution for the real problem: poorly written software. The performance may be good but are we really solving the right problem?
27- How do we decide what module is trusted or not? It's probably something we decide when building the software, right? Aren't we then putting too much control into the hands of our users? Wouldn't a user likely just specify all modules as trusted?
28- In the abstract, the paper says "we present a software approach to implementing fault isolation within a single address space". Is the single address space shared by a specific process or by many processes?
29- About the performance, the paper states that "our approach poses a trade off relative to hardware fault isolation: substantially faster communication between fault domains, at a cost of slightly increased execution time for distrusted modules". Considering this trade-off, does the performance show a greater improvement overall?
30- While determining the overhead of software encapsulation, the authors conjectured that some rather drastic variations they saw were caused by instruction cache mapping conflicts. Was this conjecture ever actually proven to be fact (or theorem, given the terminology)?
31- While sandboxing an application and basically re-routing any of its calls that the OS doesn't agree with is a useful security measure, doesn't that sound a lot like what Vista tried to get away with, forcing 'untrusted' drivers to jump through hoops/their developers to pay to get kernel access?
32- The paper says "An unsafe instruction is any instruction that jumps to or stores to an address that can not be statically verified ..." Could you explain what "statically verified" means?
33- In section 3.5, the paper says "For each unsafe instruction checked, shared segment matching requires one more instruction than segment matching." Why does it need just one more instruction?
34- Compared to software fault isolation, when is it better to use hardware fault isolation?
35- In the case of Apache, there are interpreter modules that can be plugged into the web server in the form of shared libraries (e.g., mod_python, mod_perl, or mod_php). They are much faster than their CGI counterparts (and faster than FastCGI), but are not isolated from the core of the web server. Is performance nowadays more important than prevention against unlikely failures?
36- Suppose there is a STORE operation at address 0xAAAABCDE, and the address sandboxing changes that address to 0xCCCCBCDE to make sure that the store operation does not cross the fault domain (it's obviously not writing at the same place). How could that escape the segment matching precautions? Can sandboxing the addresses insert some very nasty bugs in a program at runtime?
37- What happens if you want your fault domain bigger than a segment?
38- Compiler sandboxing: Would it be vulnerable if the sandboxed module is malicious? (e.g. using system calls, or other library functions, say... a non-inlined call to the C library's memcpy?)
39- Performance improvements seem good, but the approach feels too fragile for use with malicious+distrusted modules.
40- Do you think that predicting the fault isolation overhead should also be based on the architecture of the CPU, because of the different pipelines in the different CPU architectures?
41- What kind of security problems might load instructions produce when they are not sandboxed?
42- How does the sandboxing/segment matching technique work when the extension code needs to call into some kernel code or when it needs to access some variable in the kernel code?
43- How does this nondeterministic instrumentation of code affect the timing behavior/guarantees for the code?
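The address arithmetic behind question 36 can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the authors' implementation: it assumes 32-bit addresses whose upper 16 bits name the segment (matching the 0xAAAA.... and 0xCCCC.... addresses in the question), and the names `segment_matches`, `sandbox_address`, `SEGMENT_SHIFT` and `OFFSET_MASK` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption for this sketch: the upper 16 bits of a 32-bit address
   identify the segment, and the lower 16 bits are the offset in it. */
#define SEGMENT_SHIFT 16u
#define OFFSET_MASK   0x0000FFFFu

/* Segment matching: test whether addr already lies in the segment.
   A real implementation would trap when this test fails. */
bool segment_matches(uint32_t addr, uint32_t segment_id)
{
    return (addr >> SEGMENT_SHIFT) == segment_id;
}

/* Address sandboxing: overwrite the segment bits so the result always
   lies in the segment, without reporting whether addr was legal. */
uint32_t sandbox_address(uint32_t addr, uint32_t segment_id)
{
    assert(segment_id <= 0xFFFFu);  /* must fit in the upper 16 bits */
    return (addr & OFFSET_MASK) | (segment_id << SEGMENT_SHIFT);
}
```

With these definitions, `sandbox_address(0xAAAABCDE, 0xCCCC)` yields `0xCCCCBCDE`, while `segment_matches(0xAAAABCDE, 0xCCCC)` is false. Under this reading, sandboxing contains the stray store inside the fault domain but does not signal that anything went wrong, whereas segment matching detects the offending address.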