diff options
Diffstat (limited to 'docs/dexopt.html')
-rw-r--r-- | docs/dexopt.html | 326 |
1 files changed, 326 insertions, 0 deletions
diff --git a/docs/dexopt.html b/docs/dexopt.html new file mode 100644 index 0000000..7f0b4bc --- /dev/null +++ b/docs/dexopt.html @@ -0,0 +1,326 @@ +<html> +<head> + <title>Dalvik Optimization and Verification</title> +</head> + +<body> +<h1>Dalvik Optimization and Verification With <i>dexopt</i></h1> + +<p> +The Dalvik virtual machine was designed specifically for the Android +mobile platform. The target systems have little RAM, store data on slow +internal flash memory, and generally have the performance characteristics +of decade-old desktop systems. They also run Linux, which provides +virtual memory, processes and threads, and UID-based security mechanisms. +<p> +The features and limitations caused us to focus on certain goals: + +<ul> + <li>Class data, notably bytecode, must be shared between multiple + processes to minimize total system memory usage. + <li>The overhead in launching a new app must be minimized to keep + the device responsive. + <li>Storing class data in individual files results in a lot of + redundancy, especially with respect to strings. To conserve disk + space we need to factor this out. + <li>Parsing class data fields adds unnecessary overhead during + class loading. Accessing data values (e.g. integers and strings) + directly as C types is better. + <li>Bytecode verification is necessary, but slow, so we want to verify + as much as possible outside app execution. + <li>Bytecode optimization (quickened instructions, method pruning) is + important for speed and battery life. + <li>For security reasons, processes may not edit shared code. +</ul> + +<p> +The typical VM implementation uncompresses individual classes from a +compressed archive and stores them on the heap. This implies a separate +copy of each class in every process, and slows application startup because +the code must be uncompressed (or at least read off disk in many small +pieces). On the other hand, having the bytecode on the local heap makes +it easy to rewrite instructions on first use, facilitating a number of +different optimizations. +<p> +The goals led us to make some fundamental decisions: + +<ul> + <li>Multiple classes are aggregated into a single "DEX" file. + <li>DEX files are mapped read-only and shared between processes. + <li>Byte ordering and word alignment are adjusted to suit the local + system. + <li>Bytecode verification is mandatory for all classes, but we want + to "pre-verify" whatever we can. + <li>Optimizations that require rewriting bytecode must be done ahead + of time. +</ul> + +<p> +The consequences of these decisions are explained in the following sections. + + +<h2>VM Operation</h2> + +<p> +Application code is delivered to the system in a <code>.jar</code> +or <code>.apk</code> file. These are really just <code>.zip</code> +archives with some meta-data files added. The Dalvik DEX data file +is always called <code>classes.dex</code>. +<p> +The bytecode cannot be memory-mapped and executed directly from the zip +file, because the data is compressed and the start of the file is not +guaranteed to be word-aligned. These problems could be addressed by +storing <code>classes.dex</code> without compression and padding out the zip +file, but that would increase the size of the package sent across the +data network. +<p> +We need to extract <code>classes.dex</code> from the zip archive before +we can use it. While we have the file available, we might as well perform +some of the other actions (realignment, optimization, verification) described +earlier. This raises a new question however: who is responsible for doing +this, and where do we keep the output? + +<h3>Preparation</h3> + +<p> +There are at least three different ways to create a "prepared" DEX file, +sometimes known as "ODEX" (for Optimized DEX): +<ol> + <li>The VM does it "just in time". The output goes into a special + <code>dalvik-cache</code> directory. This works on the desktop and + engineering-only device builds where the permissions on the + <code>dalvik-cache</code> directory are not restricted. On production + devices, this is not allowed. + <li>The system installer does it when an application is first added. + It has the privileges required to write to <code>dalvik-cache</code>. + <li>The build system does it ahead of time. The relevant <code>jar</code> + / <code>apk</code> files are present, but the <code>classes.dex</code> + is stripped out. The optimized DEX is stored next to the original + zip archive, not in <code>dalvik-cache</code>, and is part of the + system image. +</ol> +<p> +The <code>dalvik-cache</code> directory is more accurately +<code>$ANDROID_DATA/data/dalvik-cache</code>. The files inside it have +names derived from the full path of the source DEX. On the device the +directory is owned by <code>system</code> / <code>system</code> +and has 0771 permissions, and the optimized DEX files stored there are +owned by <code>system</code> and the +application's group, with 0644 permissions. DRM-locked applications will +use 640 permissions to prevent other user applications from examining them. +The bottom line is that you can read your own DEX file and those of most +other applications, but you cannot create, modify, or remove them. +<p> +Preparation of the DEX file for the "just in time" and "system installer" +approaches proceeds in three steps: +<p> +First, the dalvik-cache file is created. This must be done in a process +with appropriate privileges, so for the "system installer" case this is +done within <code>installd</code>, which runs as root. +<p> +Second, the <code>classes.dex</code> entry is extracted from the the zip +archive. A small amount of space is left at the start of the file for +the ODEX header. +<p> +Third, the file is memory-mapped for easy access and tweaked for use on +the current system. This includes byte-swapping and structure realigning, +but no meaningful changes to the DEX file. We also do some basic +structure checks, such as ensuring that file offsets and data indices +fall within valid ranges. +<p> +The build system uses a hairy process that involves starting the +emulator, forcing just-in-time optimization of all relevant DEX files, +and then extracting the results from <code>dalvik-cache</code>. The +reasons for doing this, rather than using a tool that runs on the desktop, +will become more apparent when the optimizations are explained. +<p> +Once the code is byte-swapped and aligned, we're ready to go. We append +some pre-computed data, fill in the ODEX header at the start of the file, +and start executing. (The header is filled in last, so that we don't +try to use a partial file.) If we're interested in verification and +optimization, however, we need to insert a step after the initial prep. + +<h3>dexopt</h3> + +<p> +We want to verify and optimize all of the classes in the DEX file. The +easiest and safest way to do this is to load all of the classes into +the VM and run through them. Anything that fails to load is simply not +verified or optimized. Unfortunately, this can cause allocation of some +resources that are difficult to release (e.g. loading of native shared +libraries), so we don't want to do it in the same virtual machine that +we're running applications in. +<p> +The solution is to invoke a program called <code>dexopt</code>, which +is really just a back door into the VM. It performs an abbreviated VM +initialization, loads zero or more DEX files from the bootstrap class +path, and then sets about verifying and optimizing whatever it can from +the target DEX. On completion, the process exits, freeing all resources. +<p> +It is possible for multiple VMs to want the same DEX file at the same +time. File locking is used to ensure that dexopt is only run once. + + +<h2>Verification</h2> + +<p> +The bytecode verification process involves scanning through the instructions +in every method in every class in a DEX file. The goal is to identify +illegal instruction sequences so that we don't have to check for them at +run time. Many of the computations involved are also necessary for "exact" +garbage collection. See +<a href="verifier.html">Dalvik Bytecode Verifier Notes</a> for more +information. +<p> +For performance reasons, the optimizer (described in the next section) +assumes that the verifier has run successfully, and makes some potentially +unsafe assumptions. By default, Dalvik insists upon verifying all classes, +and only optimizes classes that have been verified. If you want to +disable the verifier, you can use command-line flags to do so. See also +<a href="embedded-vm-control.html"> Controlling the Embedded VM</a> +for instructions on controlling these +features within the Android application framework. +<p> +Reporting of verification failures is a tricky issue. For example, +calling a package-scope method on a class in a different package is +illegal and will be caught by the verifier. We don't necessarily want +to report it during verification though -- we actually want to throw +an exception when the method call is attempted. Checking the access +flags on every method call is expensive though. The +<a href="verifier.html">Dalvik Bytecode Verifier Notes</a> document +addresses this issue. +<p> +Classes that have been verified successfully have a flag set in the ODEX. +They will not be re-verified when loaded. The Linux access permissions +are expected to prevent tampering; if you can get around those, installing +faulty bytecode is far from the easiest line of attack. The ODEX file has +a 32-bit checksum, but that's chiefly present as a quick check for +corrupted data. + + +<h2>Optimization</h2> + +<p> +Virtual machine interpreters typically perform certain optimizations the +first time a piece of code is used. Constant pool references are replaced +with pointers to internal data structures, operations that always succeed +or always work a certain way are replaced with simpler forms. Some of +these require information only available at runtime, others can be inferred +statically when certain assumptions are made. +<p> +The Dalvik optimizer does the following: +<ul> + <li>For virtual method calls, replace the method index with a + vtable index. + <li>For instance field get/put, replace the field index with + a byte offset. Also, merge the boolean / byte / char / short + variants into a single 32-bit form (less code in the interpreter + means more room in the CPU I-cache). + <li>Replace a handful of high-volume calls, like String.length(), + with "inline" replacements. This skips the usual method call + overhead, directly switching from the interpreter to a native + implementation. + <li>Prune empty methods. The simplest example is + <code>Object.<init></code>, which does nothing, but must be + called whenever any object is allocated. The instruction is + replaced with a new version that acts as a no-op unless a debugger + is attached. + <li>Append pre-computed data. For example, the VM wants to have a + hash table for lookups on class name. Instead of computing this + when the DEX file is loaded, we can compute it now, saving heap + space and computation time in every VM where the DEX is loaded. +</ul> + +<p> +All of the instruction modifications involve replacing the opcode with +one not defined by the Dalvik specification. This allows us to freely +mix optimized and unoptimized instructions. The set of optimized +instructions, and their exact representation, is tied closely to the VM +version. +<p> +Most of the optimizations are obvious "wins". The use of raw indices +and offsets not only allows us to execute more quickly, we can also +skip the initial symbolic resolution. Pre-computation eats up +disk space, and so must be done in moderation. +<p> +There are a couple of potential sources of trouble with these +optimizations. First, vtable indices and byte offsets are subject to +change if the VM is updated. Second, if a superclass is in a different +DEX, and that other DEX is updated, we need to ensure that our optimized +indices and offsets are updated as well. A similar but more subtle +problem emerges when user-defined class loaders are employed: the class +we actually call may not be the one we expected to call. +<p>These problems are addressed with dependency lists and some limitations +on what can be optimized. + + +<h2>Dependencies and Limitations</h2> + +<p> +The optimized DEX file includes a list of dependencies on other DEX files, +plus the CRC-32 and modification date from the originating +<code>classes.dex</code> zip file entry. The dependency list includes the +full path to the <code>dalvik-cache</code> file, and the file's SHA-1 +signature. The timestamps of files on the device are unreliable and +not used. The dependency area also includes the VM version number. +<p> +An optimized DEX is dependent upon all of the DEX files in the bootstrap +class path. DEX files that are part of the bootstrap class path depend +upon the DEX files that appeared earlier. To ensure that nothing outside +the dependent DEX files is available, <code>dexopt</code> only loads the +bootstrap classes. References to classes in other DEX files fail, which +causes class loading and/or verification to fail, and classes with +external dependencies are simply not optimized. +<p> +This means that splitting code out into many separate DEX files has a +disadvantage: virtual method calls and instance field lookups between +non-boot DEX files can't be optimized. Because verification is pass/fail +with class granularity, no method in a class that has any reliance on +classes in external DEX files can be optimized. This may be a bit +heavy-handed, but it's the only way to guarantee that nothing breaks +when individual pieces are updated. +<p> +Another negative consequence: any change to a bootstrap DEX will result +in rejection of all optimized DEX files. This makes it hard to keep +system updates small. +<p> +Despite our caution, there is still a possibility that a class in a DEX +file loaded by a user-defined class loader could ask for a bootstrap class +(say, String) and be given a different class with the same name. If a +class in the DEX file being processed has the same name as a class in the +bootstrap DEX files, the class will be flagged as ambiguous and references +to it will not be resolved during verification / optimization. The class +linking code in the VM does additional checks to plug another hole; +see the verbose description in the VM sources for details (vm/oo/Class.c). +<p> +If one of the dependencies is updated, we need to re-verify and +re-optimize the DEX file. If we can do a just-in-time <code>dexopt</code> +invocation, this is easy. If we have to rely on the installer daemon, or +the DEX was shipped only in ODEX, then the VM has to reject the DEX. +<p> +The output of <code>dexopt</code> is byte-swapped and struct-aligned +for the host, and contains indices and offsets that are highly VM-specific +(both version-wise and platform-wise). For this reason it's tricky to +write a version of <code>dexopt</code> that runs on the desktop but +generates output suitable for a particular device. The safest way to +invoke it is on the target device, or on an emulator for that device. + + +<h2>Generated DEX</h2> + +<p> +Some languages and frameworks rely on the ability to generate bytecode +and execute it. The rather heavy <code>dexopt</code> verification and +optimization model doesn't work well with that. +<p> +We intend to support this in a future release, but the exact method is +to be determined. We may allow individual classes to be added or whole +DEX files; may allow Java bytecode or Dalvik bytecode in instructions; +may perform the usual set of optimizations, or use a separate interpreter +that performs on-first-use optimizations directly on the bytecode (which +won't be mapped read-only, since it's locally defined). + +<address>Copyright © 2008 The Android Open Source Project</address> + +</body> +</html> |