Common Intermediate Language Explained

Common Intermediate Language (CIL), formerly called Microsoft Intermediate Language (MSIL) or Intermediate Language (IL),[1] is the intermediate language binary instruction set defined within the Common Language Infrastructure (CLI) specification.[2] CIL instructions are executed by a CIL-compatible runtime environment such as the Common Language Runtime. Languages which target the CLI compile to CIL. CIL is object-oriented, stack-based bytecode. Runtimes typically just-in-time compile CIL instructions into native code.

CIL was originally known as Microsoft Intermediate Language (MSIL) during the beta releases of the .NET languages. Due to standardization of C# and the CLI, the bytecode is now officially known as CIL.[3] Windows Defender virus definitions continue to refer to binaries compiled with it as MSIL.[4]

General information

During compilation of CLI programming languages, the source code is translated into CIL code rather than into platform- or processor-specific object code. CIL is a CPU- and platform-independent instruction set that can be executed in any environment supporting the Common Language Infrastructure, such as the .NET runtime on Windows, or the cross-platform Mono runtime. In theory, this eliminates the need to distribute different executable files for different platforms and CPU types. CIL code is verified for safety during runtime, providing better security and reliability than natively compiled executable files.[5] [6]

The execution process looks like this:

  1. Source code is converted to CIL bytecode and a CLI assembly is created.
  2. When the CLI assembly is executed, its code is passed through the runtime's JIT compiler to generate native code. Ahead-of-time compilation may also be used, which eliminates this step at the cost of executable-file portability.
  3. The computer's processor executes the native code.

Instructions

See also: List of CIL instructions.

CIL bytecode has instructions for the following groups of tasks:

Computational model

The Common Intermediate Language is object-oriented and stack-based: instruction parameters and results are kept on a single stack rather than in several registers or other memory locations, as they are in most processor instruction sets.

Code that adds two numbers in x86 assembly language, where eax and edx specify two different general-purpose registers:

    add eax, edx

Code in an intermediate language (IL), where 0 is eax and 1 is edx:

    ldloc.0    // push local variable 0 onto stack
    ldloc.1    // push local variable 1 onto stack
    add        // pop and add the top two stack items then push the result onto the stack
    stloc.0    // pop and store the top stack item to local variable 0

In the latter example, the values of local variables 0 and 1, which stand in for the registers eax and edx, are first pushed onto the stack. When the add instruction executes, the operands are "popped", or retrieved, and the result is "pushed", or stored, on the stack. The resulting value is then popped from the stack and stored in local variable 0, the stand-in for eax.
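The stack discipline described above can be sketched with a very small interpreter. This is an illustration only, not how a real CLI runtime is implemented: the `run` function and its list-based locals are assumptions made for the example, and it handles just the three instructions used in the snippet.

```python
def run(instructions, local_vars):
    """Execute a tiny subset of CIL-like instructions on an evaluation stack."""
    stack = []
    for op in instructions:
        name, _, index = op.partition(".")
        if name == "ldloc":      # push a local variable onto the stack
            stack.append(local_vars[int(index)])
        elif name == "stloc":    # pop the top of the stack into a local variable
            local_vars[int(index)] = stack.pop()
        elif name == "add":      # pop two operands, push their sum
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return local_vars


# Local 0 plays the role of eax, local 1 the role of edx.
result_locals = run(["ldloc.0", "ldloc.1", "add", "stloc.0"], [2, 3])
print(result_locals)  # [5, 3]
```

Only local variable 0 changes, mirroring how `add eax, edx` leaves edx untouched.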

Object-oriented concepts

CIL is designed to be object-oriented. You may create objects, call methods, and use other types of members, such as fields.

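Every method needs (with some exceptions) to reside in a class. So does this static method:

    .class public Foo {
        .method public static int32 Add(int32, int32) cil managed {
            .maxstack 2
            ldarg.0    // load the first argument
            ldarg.1    // load the second argument
            add        // add them
            ret        // return the result
        }
    }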

The method Add does not require any instance of Foo to be declared because it is declared as static, and it may then be used like this in C#:

    int r = Foo.Add(2, 3); // 5

In CIL it would look like this:

    ldc.i4.2
    ldc.i4.3
    call int32 Foo::Add(int32, int32)
    stloc.0
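A rough way to picture the call sequence is that the arguments are pushed left to right, the call consumes them, and the return value takes their place on the stack. The sketch below models that in Python. The `METHODS` table, the `execute` function and the tuple encoding of instructions are hypothetical, chosen only for this illustration.

```python
def foo_add(a, b):
    # Stand-in for the static method Foo::Add(int32, int32).
    return a + b


# Hypothetical method table: signature text -> (callable, argument count).
METHODS = {"int32 Foo::Add(int32, int32)": (foo_add, 2)}


def execute(program, local_count=1):
    """Run (opcode, operand...) tuples and return the local variables."""
    stack = []
    local_vars = [0] * local_count
    for op, *operand in program:
        if op.startswith("ldc.i4."):      # push a small integer constant
            stack.append(int(op.rsplit(".", 1)[1]))
        elif op == "call":                # pop the arguments, push the return value
            func, argc = METHODS[operand[0]]
            args = [stack.pop() for _ in range(argc)][::-1]
            stack.append(func(*args))
        elif op.startswith("stloc."):     # pop the result into a local variable
            local_vars[int(op.rsplit(".", 1)[1])] = stack.pop()
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return local_vars


program = [
    ("ldc.i4.2",),
    ("ldc.i4.3",),
    ("call", "int32 Foo::Add(int32, int32)"),
    ("stloc.0",),
]
locals_after = execute(program)
print(locals_after[0])  # 5
```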

Instance classes

An instance class contains at least one constructor and some instance members. The following class has a set of methods representing actions of a Car object:

    .class public Car

Creating objects

In C# class instances are created like this:

    Car myCar = new Car(1, 4);
    Car yourCar = new Car(1, 3);

And those statements are roughly the same as these instructions in CIL:

    ldc.i4.1
    ldc.i4.4
    newobj instance void Car::.ctor(int32, int32)
    stloc.0    // myCar = new Car(1, 4);
    ldc.i4.1
    ldc.i4.3
    newobj instance void Car::.ctor(int32, int32)
    stloc.1    // yourCar = new Car(1, 3);
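The effect of newobj on the evaluation stack can be sketched as follows: the constructor arguments are popped, an object is constructed, and a reference to it is pushed so that a following stloc can store it. The Python `Car` class here is a hypothetical stand-in (the source does not show the real class's fields), and `newobj` is an illustrative helper, not a runtime API.

```python
class Car:
    """Hypothetical stand-in for the Car class; the field names are assumptions."""

    def __init__(self, first, second):
        self.first = first
        self.second = second


def newobj(stack, cls, argc):
    """Pop argc constructor arguments, construct the object, push a reference to it."""
    args = [stack.pop() for _ in range(argc)][::-1]  # restore left-to-right order
    stack.append(cls(*args))


stack = []
local_vars = [None, None]

stack.append(1)               # ldc.i4.1
stack.append(4)               # ldc.i4.4
newobj(stack, Car, 2)         # newobj instance void Car::.ctor(...)
local_vars[0] = stack.pop()   # stloc.0 (myCar)

stack.append(1)               # ldc.i4.1
stack.append(3)               # ldc.i4.3
newobj(stack, Car, 2)         # newobj instance void Car::.ctor(...)
local_vars[1] = stack.pop()   # stloc.1 (yourCar)

print(local_vars[0].second, local_vars[1].second)  # 4 3
```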

Invoking instance methods

In C#, instance methods are invoked like this:

    myCar.Move(3);

As invoked in CIL:

    ldloc.0    // Load the object "myCar" on the stack
    ldc.i4.3
    call instance void Car::Move(int32)
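The difference from a static call is that the object reference is pushed before the explicit arguments and becomes the implicit first argument of the method. A minimal Python sketch of that convention follows. The `position` field, the `move` method body and the `call_instance` helper are assumptions made for illustration.

```python
class Car:
    """Hypothetical stand-in; the real Car class body is not shown in the source."""

    def __init__(self):
        self.position = 0

    def move(self, distance):
        self.position += distance


def call_instance(stack, method, argc):
    """Pop argc explicit arguments, then the object reference beneath them, and invoke."""
    args = [stack.pop() for _ in range(argc)][::-1]
    this = stack.pop()
    result = method(this, *args)
    if result is not None:    # a void method pushes nothing back
        stack.append(result)


my_car = Car()
stack = []
stack.append(my_car)                # ldloc.0
stack.append(3)                     # ldc.i4.3
call_instance(stack, Car.move, 1)   # call instance void Car::Move(int32)
print(my_car.position)  # 3
```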

Metadata

See main article: Metadata (CLI).

The Common Language Infrastructure (CLI) records information about compiled classes as metadata. Like the type library in the Component Object Model, this enables applications to support and discover the interfaces, classes, types, methods, and fields in the assembly. The process of reading such metadata is called "reflection".

Metadata can be data in the form of "attributes". Attributes can be customized by extending the Attribute class. This lets the creator of a class adorn it with extra information that consumers of the class can use in various meaningful ways, depending on the application domain.

Example

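Below is a basic "Hello, world!" program written in CIL assembler. It will display the string "Hello, world!".

    .assembly Hello {}
    .assembly extern mscorlib {}
    .method static void Main()
    {
        .entrypoint
        .maxstack 1
        ldstr "Hello, world!"
        call void [mscorlib]System.Console::WriteLine(string)
        ret
    }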

The following code is more complex in number of opcodes.

This code can also be compared with the corresponding code in the article about Java bytecode.

    static void Main(string[] args)

In CIL assembler syntax it looks like this:

    .method private hidebysig static void Main(string[] args) cil managed

This is just a representation of how CIL looks near the virtual machine (VM) level. When compiled the methods are stored in tables and the instructions are stored as bytes inside the assembly, which is a Portable Executable (PE).
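To make the "stored as bytes" point concrete, the sketch below maps a few mnemonics to single-byte opcodes and concatenates them. The opcode values are the ones recalled from the ECMA-335 instruction set and are worth checking against the specification. The `assemble` helper is hypothetical, and a real method body also carries a header and metadata that this example leaves out.

```python
# Single-byte opcodes for a handful of instructions (values as recalled from ECMA-335).
OPCODES = {
    "ldloc.0": 0x06,
    "ldloc.1": 0x07,
    "stloc.0": 0x0A,
    "add": 0x58,
    "ret": 0x2A,
}


def assemble(mnemonics):
    """Translate operand-free mnemonics into their raw byte encoding."""
    return bytes(OPCODES[m] for m in mnemonics)


body = assemble(["ldloc.0", "ldloc.1", "add", "stloc.0", "ret"])
print(body.hex(" "))  # 06 07 58 0a 2a
```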

Generation

A CIL assembly and instructions are generated by either a compiler or a utility called the IL Assembler (ILAsm) that is shipped with the execution environment.

Assembled CIL can also be disassembled into code again using the IL Disassembler (ILDASM). Other tools, such as .NET Reflector, can decompile CIL into a high-level language (e.g. C# or Visual Basic). This makes CIL a very easy target for reverse engineering, a trait it shares with Java bytecode. However, there are tools that can obfuscate the code so that it is hard to read but still runs.
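Part of why disassembly is straightforward is that each opcode byte identifies the instruction and the size of any operand that follows it. The sketch below walks a byte string and prints mnemonics with offsets. It is a toy, not ILDASM: the `TABLE` covers only six instructions, the opcode values are as recalled from ECMA-335, and the `IL_xxxx` label format merely imitates typical disassembler output.

```python
import struct

# opcode byte -> (mnemonic, operand size in bytes); a tiny illustrative subset.
TABLE = {
    0x0A: ("stloc.0", 0),
    0x18: ("ldc.i4.2", 0),
    0x19: ("ldc.i4.3", 0),
    0x1F: ("ldc.i4.s", 1),   # followed by one signed byte
    0x2A: ("ret", 0),
    0x58: ("add", 0),
}


def disassemble(code):
    """Return one text line per instruction found in the byte string."""
    lines = []
    offset = 0
    while offset < len(code):
        name, operand_size = TABLE[code[offset]]
        text = f"IL_{offset:04x}: {name}"
        if operand_size == 1:
            (value,) = struct.unpack_from("<b", code, offset + 1)
            text += f" {value}"
        lines.append(text)
        offset += 1 + operand_size
    return lines


listing = disassemble(bytes([0x18, 0x1F, 0x64, 0x58, 0x0A, 0x2A]))
print("\n".join(listing))
```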

Execution

Just-in-time compilation

Just-in-time compilation (JIT) involves turning the bytecode into code immediately executable by the CPU. The conversion is performed gradually during the program's execution. JIT compilation provides environment-specific optimization, runtime type safety, and assembly verification. To accomplish this, the JIT compiler examines the assembly metadata for any illegal accesses and handles violations appropriately.
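The "performed gradually" aspect can be pictured as compile-on-first-call with a cache: a method is translated the first time it is invoked and the translated form is reused afterwards. The sketch below imitates only that caching behaviour. The `LazyJit` class is hypothetical, Python's `eval` stands in for native code generation, and verification and optimization are not modelled.

```python
class LazyJit:
    """Toy model of lazy compilation: translate a method on first use, then reuse it."""

    def __init__(self, method_bodies):
        self._bodies = method_bodies   # name -> source text standing in for bytecode
        self._native = {}              # name -> compiled callable
        self.compile_count = 0

    def _compile(self, source):
        # eval() is only a stand-in for real native code generation.
        self.compile_count += 1
        return eval(source)

    def invoke(self, name, *args):
        if name not in self._native:   # first call: compile and cache
            self._native[name] = self._compile(self._bodies[name])
        return self._native[name](*args)


jit = LazyJit({"Add": "lambda a, b: a + b", "Neg": "lambda a: -a"})
first = jit.invoke("Add", 2, 3)
second = jit.invoke("Add", 10, 20)
print(first, second, jit.compile_count)  # 5 30 1
```

In this model "Neg" is never compiled because it is never called, which is the trade-off ahead-of-time compilation reverses.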

Ahead-of-time compilation

CLI-compatible execution environments also offer ahead-of-time (AOT) compilation of an assembly, which makes it execute faster by removing the JIT process at runtime.

In the .NET Framework, a special tool called the Native Image Generator (NGEN) performs the AOT compilation. A different approach to AOT is CoreRT, which allows .NET Core code to be compiled to a single executable with no dependency on a runtime. Mono also has an option for AOT compilation.

Pointer instructions - C++/CLI

A notable difference from Java's bytecode is that CIL comes with ldind, stind, ldloca, and many call instructions, which are enough for the data and function pointer manipulation needed to compile C/C++ code into CIL.

    class A ;
    void test_pointer_operations(int param)

The corresponding code in CIL can be rendered as this:

    .method assembly static void modopt([mscorlib]System.Runtime.CompilerServices.CallConvCdecl)
            test_pointer_operations(int32 param) cil managed
    // end of method 'Global Functions'::test_pointer_operations

Notes and References

  1. Web site: Intermediate Language & execution.
  2. Web site: . 32 --> .
  3. Web site: What is Intermediate Language(IL)/MSIL/CIL in .NET. 2011-02-17. CIL: ... When we compile [a] .NET project, it [is] not directly converted to binary code but to the intermediate language. When a project is run, every language of .NET programming is converted into binary code into CIL. Only some part of CIL that is required at run time is converted into binary code. DLL and EXE of .NET are also in CIL form.
  4. Web site: HackTool:MSIL/SkypeCracker . Microsoft . 26 November 2019.
  5. Book: Benefits of CIL. 2011-02-17. Troelsen. Andrew. 2009-05-02. 9781590598849.
  6. Web site: Unmanaged, Managed Extensions for C++, Managed and .Net Framework. www.visualcplusdotnet.com. 2020-07-07.