Pages

Friday, May 11, 2012

How Computers Work - Software

Source Code

The "source code" is the high level code authored by the programmer and fed into the compiler. Generally just the program.exe file is distributed to users. The programmer retains the source code. Changing the program in the future generally requires access to the source code. For example to add a feature, the programmer would make changes in the source code, and then run the compiler to produce a new version of the program.

Open Source

"Open Source" refers to software where the program includes access to its source code, and a license where the user can make their own modifications. Typically open source software is distributed for free. Critically, beyond the free price, open source software also includes freedom/independence since the user is not dependent on the original vendor to make changes or fixes or whatever to the source code. Since the source code is available, if a user feels strongly enough about some feature, they can add that feature themselves. Typically open source licenses include a requirement that such improvements to the source code be made available back to the community at large. We'll talk about open source more later on, but I wanted to touch on it here since it is a good example of the difference between a program and its source code.

2. New School: Dynamic Languages / Interpreter

There is a broad category of more modern languages such as Java (the world's most popular language, used in CS106A), Javascript, and Python, which do not use the compiler/machine-code strategy. Instead, these languages can be implemented by an "interpreter", and I will lump them into the category of "dynamic" languages.
An interpreter is a program which reads in source code as its input, and "runs" the input code. The interpreter proceeds through the code given to it, line by line. For each line, the interpreter deconstructs what the line says and performs those actions, piece by piece. For example, Javascript which we have been using, is implemented by a Javascript interpreter which is built into Firefox.
So in Javascript when we have code lines like:
  // Javascript code
  a = 1;
  b = a + 2;
The interpreter runs this code, by taking the lines one at a time, and for each, interpreting its actions. For "a = 1;" the interpreter reserves a few bytes to store the value of a, then stores the value 1 into those bytes. Then for "b = a + 2;" the interpreter evaluates (a + 2) getting the value 3, reserves some bytes for the b variable, then stores the 3 into the b bytes.
A compiler translates all the source code into equivalent machine code program.exe to be run later -- it is a bulk translation. An interpreter looks at each line of code, and translates and runs it in the moment, and then proceeds to the next line of source code. The interpreter does not produce a program.exe, instead it performs the actions specified in the source code directly.

Compiler Evaluation

Compiled code generally runs faster than interpreted code. This is because many questions -- how to append to this string, how many bytes do I need here -- are resolved by the compiler at compile time, long before the program runs. The compiler has, in effect, pre-processed the source code, stripping out many questions and complications, leaving the program.exe as lean and direct as it can be to just run.
In contrast, the interpreter deals with each line in the moment, so all the deciphering and overhead costs of interpreting each line are paid as it runs. These overhead costs in effect make the interpreted program run more slowly than the equivalent compiled program.

Dynamic Language Evaluation

Dynamic languages have two main features. (a) they run more slowly, say 10x as very rough rule of thumb, compared to compiled machine code. (b) languages implemented by interpreters can have significant programmer-friendly features that are very difficult to support with compiled code.
"Memory management" is the problem in a program of knowing, over time, when bytes of RAM are needed and when they can be reclaimed and use for something else. Every program must solve this problem. Memory management is an excellent example of a feature different between compiled and dynamic languages -- all modern dynamic languages manage memory automatically. The programmer can focus on the problem to be solved, and the dynamic language will take care of managing the memory.
In contrast, in C and C++, the programmer must think about memory management at times, and author lines of code to help solve it. (Aside: many crashes in C and C++ programs are due to errors in the programmer's memory management scheme. It is a difficult problem to solve manually.)
The memory management is not free. Dynamic languages, in effect, spend CPU cycles to manage the memory. This fits the general pattern that dynamic languages run with more overhead (i.e. more slowly) than compiled languages.
Because dynamic languages like Java and Python have more features, a programmer can often write the code to solve a problem more quickly in a dynamic language than they can in C++. The time and attention of programmers is generally quite scarce (translation: programmers are scarce and expensive, which is why you want to be a CS major, or at least a minor!). Therefore, dynamic languages which allow the programmer to produce a correct program more quickly and reliably are pretty attractive, even if the resulting program uses more CPU and more RAM. Aside: Moore's law in effect, keeps making the programmer relatively more expensive compared to the CPU.
Overall, different computer languages have different strengths and weaknesses, and best language for a particular problem depends on the situation. As above, dynamic languages like Java and Python can run slower and generally operate with higher overhead than C++ code, so for some problems, writing in C or C++ is the best strategy. Also, Java and Python lack certain "low level access" features which are needed in rare cases.

JIT Just In Time Compiler

The most modern form of dynamic language is implemented with an interpreter paired with a Just In Time compiler (JIT) trying to get the best of both worlds. The JIT looks a sections of dynamic code that are being run very frequently, and for those, does a compile to native code for that section on the fly. So the interpreter is used for simple cases, but for important sections of dynamic code (like the inside of a loop), the JIT creates a block of machine code in RAM for that section. The machine code is run for that section of dynamic code, giving similar performance to C++, and is discarded when the program exits. Java and Javascript both use JIT technology extensively. The great speedup of browsers in the last few years has been largely due to the implemented of JIT technology for Javascript. The JIT erases most but not all of the "10x" penalty. Even with JITs, dynamic languages still have higher overhead use of resources compared to C and C++.

Common Scenarios To Understand

  • Computer boots up -- when first powered up, the hardware runs a tiny "boot loader" program which examines the available hardware and allows the user to select what operating system to start. The operating system starts up, sets up housekeeping, and typically launches a file viewer program to show the user the file systems available.
  • User double clicks on Firefox -- the Firefox icon basically points to a file (actually a group of files) that contain the machine code for Firefox (Firefox is written in C++). The OS sets up a new program environment -- some RAM, a window to draw to -- copies the first section of machine code from the .exe file up to RAM, and starts the CPU running that code.
  • User double clicks on a flowers.jpg file the OS launches a viewer program to display the image stored in that file. The ".jpg" "extension" on the filename suggests what type of file it is. Note however, the extension can definitely be wrong; changing it is as easy as renaming the file. Typically, the operating system keeps associations of what file type to open with what program -- open .jpg files with one program, .pdf files with another, and so on. In this case, the OS would launch an image viewing program, pointing it to the flowers.jpg file as the one to display (that is, display the image whose bytes are stored in that file).

No comments:

Post a Comment