The Unix Kiss: A Case Study: Franco Milicchio
Franco Milicchio
Dept. Computer Science and Engineering, University Roma Tre Via della Vasca Navale, 79 00146 Roma Italy milicchio@mac.com
ABSTRACT
In this paper we show that the philosophy that originally guided the design and development of UNIX has been forgotten in favor of hasty practices. We question the leitmotif that microkernels, though adherent by design to the KISS principle, incur more context switches than their monolithic counterparts: we run a test suite and verify the results with standard statistical validation tests. We advocate a wiser distribution of shared libraries by statistically analyzing the weight of each shared object in a typical UNIX system, showing that the majority of shared libraries are installed in a common space with no real evidence of need. Finally, we examine the UNIX heritage from a historical point of view, noting how habits swiftly replaced the intents of the original authors and moved the focus away from the earliest purpose: avoiding complications and keeping a system simple to use and maintain.
KEYWORDS
UNIX; Statistics; Case Studies; Operating Systems.
1. INTRODUCTION
UNIX is among the oldest operating systems still in use, having its roots in the 1960s Multics system. Not by chance, its original name was Unics, later changed to the spelling we know today. It was designed and developed at Bell Labs by Thompson, Ritchie and McIlroy, who tried to avoid some of the complications its ancestor had introduced and to keep the system small and simple. This philosophy, which originated in complex systems engineering, gained fame under the acronym KISS ("keep it simple, stupid"). Its roots date back to the 14th century and the lex parsimoniae of the philosopher William of Ockham, who stated entia non sunt multiplicanda praeter necessitatem ("entities should not be multiplied beyond necessity"), best known as Ockham's razor.
We can easily recognize that the whole project followed this rule of thumb from its very first version. Small programs were preferred to big ones: each program performs a single task, but performs it very efficiently. To run complex jobs, these small applications could be, and still are, connected by I/O redirection. After many years, and after many prophecies of its death, UNIX is still one of the most used operating systems. A question thus arises: has UNIX observed the KISS principle throughout its history, or has it forgotten this basic rule and followed other habits?
The operating system core is one of the major concerns. Microkernels have been developed over years of research, but the common understanding of them has been poor. They have always been regarded as neat and simple academic design projects that follow the KISS principle but perform badly because of the number of context switches necessary to run an application. Even though there is commercial evidence that microkernels are not just academic proofs of concept, with mission-critical real-time operating systems like QNX and even end-user UNIX systems like MacOS X, the leitmotif that monolithic kernels are better has never faded.
Dynamic linking was one of the features UNIX inherited from its ancestor Multics. Reusing code through shared libraries is a common practice not only in the UNIX world, but in all modern operating systems. On one hand this practice simplifies the developer's job; on the other hand it can add a high degree of complexity to a system if all software actually shares its code. Another common opinion is that sharing prevents a system from being overwhelmed by an ungovernable duplication of resources. However, this belief has never been proved right or wrong.
In this paper we address these issues from a historical and statistical point of view. We evaluate the number of context switches with a set of tests, validating the results with state-of-the-art statistical analyses. On the libraries side, we survey all shared resources present on our systems, inspecting the weight and impact of such libraries to corroborate or contradict the habit of sharing them. Finally, we point out the historical heritage of UNIX, showing how the KISS philosophy has become less important than common habits.
For our survey we chose Linux and MacOS X. We avoided niche operating systems because choosing one could bias the statistical analysis. By selecting two UNIX (or UNIX-like) operating systems available to the general public, we aim for an impartial and fair comparison.
2. KERNEL WARS
A kernel is the core of an operating system, giving the minimum abstraction layer for hardware handling, inter-process communication and memory management. Along with these, a kernel may provide other services such as device management, sound and network control. On monolithic systems all services are implemented in kernel space, while on microkernels these aspects are delegated to userland servers. A monolithic kernel is thus less adherent to the KISS principle: it involves a high degree of complexity, evident even from the number of lines of code it needs. By construction, monolithic kernels have tight and often non-trivial dependencies between their components, so a bug can affect the whole system. Microkernels, on the other hand, tend to keep all aspects simple and neat, delegating all services to the servers. This quality has a cost on the design side, since the features of a microkernel must be planned with great care.
In the history of operating systems, the debate between the supporters of the microkernel design and those of the monolithic approach has been one of the major topics of discussion. The most famous heated exchange on the topic took place between Andrew Tanenbaum and Linus Torvalds, the creators of Minix and Linux respectively. Much of the discussion between these two schools is about performance. A monolithic kernel is supposed to be more efficient than a microkernel because it requires fewer context switches for all tasks. This claim has often been made but never sufficiently investigated or statistically tested. We analyzed both MacOS X and Linux with a suite of tests to confirm or refute this efficiency issue. All the tests were then evaluated for statistical significance to validate the results.
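The paper does not state how the context switch counts were collected. As an illustration only, the following minimal sketch counts the context switches a workload incurs in the current process on a UNIX-like system, using the standard `resource.getrusage` call. The helper names (`context_switches`, `count_switches`, `write_file`) and the 1 MB workload are hypothetical and are not taken from the original test suite.

```python
import os
import resource
import tempfile


def context_switches():
    """Return (voluntary, involuntary) context switches of this process so far."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_nvcsw, usage.ru_nivcsw


def count_switches(workload):
    """Run workload() and return the total context switches it incurred."""
    vol_before, invol_before = context_switches()
    workload()
    vol_after, invol_after = context_switches()
    return (vol_after - vol_before) + (invol_after - invol_before)


def write_file(size=1024 * 1024):
    """Hypothetical workload: write `size` random bytes to a temporary file."""
    with tempfile.NamedTemporaryFile() as handle:
        handle.write(os.urandom(size))
        handle.flush()
        os.fsync(handle.fileno())


if __name__ == "__main__":
    # Repeat the workload to obtain a sample suitable for statistical tests.
    samples = [count_switches(write_file) for _ in range(5)]
    print("context switches per run:", samples)
```

Repeating the same workload on each system yields two samples whose means and variances can then be compared with standard tests.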
For all the tests, the means and variances measured on the two systems differ with statistical significance, with p-values not exceeding the standard threshold of 0.05. Only one test, the creation of a 10 MB file from /dev/random, has a higher p-value of 0.09, signifying that the two series show no significant difference: an analysis of the data revealed an outlier on the Linux system, with a total context switch count of 1755. After outlier filtering, the test showed a p-value < 10^-6. Additionally, a sign test (Abdi, 2006) was performed in order to compare the overall performance across all tests. Here a positive match was given to MacOS X whenever the one-tailed p-value of the t-test was less than the standard threshold of 0.05. The sign test gave a probability that the two series are not statistically different of p < 2.94 × 10^-3.
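The sign test mentioned above can be sketched as a one-sided binomial tail probability. This is a minimal illustration using only the Python standard library; the function name `sign_test_p_value` and the example outcome (9 positive matches out of 10 tests) are assumptions chosen for the example and do not reproduce the paper's data.

```python
from math import comb


def sign_test_p_value(wins, trials):
    """One-sided sign test: P(X >= wins) for X ~ Binomial(trials, 0.5).

    Under the null hypothesis neither system is better, so each test is
    a fair coin flip and the number of positive matches is binomial.
    """
    if not 0 <= wins <= trials:
        raise ValueError("wins must lie between 0 and trials")
    tail = sum(comb(trials, k) for k in range(wins, trials + 1))
    return tail / 2 ** trials


if __name__ == "__main__":
    # Hypothetical outcome: one system gets a positive match in 9 of 10 tests.
    print(sign_test_p_value(9, 10))
```

A small p-value indicates that so many positive matches for one system would be unlikely if the two systems behaved alike.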
3. SHARED LIBRARIES
A common practice in the UNIX world is to subdivide a complex program into small pieces, usually libraries. This approach clearly conforms to the KISS principle, and of course it is not limited to the UNIX world. Over the years, the use of shared libraries has become so widespread that there is an ongoing belief that software using a shared library should install it system-wide, so that other software can use it as well. Another opinion is that using system-wide shared libraries decreases the amount of disk space wasted by an uncontrollable duplication of resources. The consequence is that on a typical UNIX system we cannot immediately recognize which programs use a library and, worse, whether any program uses it at all. In what follows we analyze the distribution and relative weight of shared libraries on a UNIX system, to determine whether the opinions about sharing libraries system-wide have a real foundation.
MacOS X Library      Count
libSystem            3294
libiconv             3291
libgcc_s             1879
CoreFoundation       1871
libncurses           1561
libcups               989
libsasl2              804
libssl                781
DirectoryService      759
Kerberos              749
We stress that we chose one of the cleanest and most coherent Linux distributions, one that has a single desktop environment and therefore does not add shared libraries for the sake of a single application. Other UNIX systems that include more environments, such as Solaris, which includes both CDE and JDE, or AIX, which includes KDE as well as Gnome, have worse results. On MacOS X the number of shared libraries is far smaller than on other systems: since all applications bundle all their resources, applications install no library system-wide, with a few obvious exceptions (e.g. device drivers, kernel extensions).
Figure 1. Number of libraries per number of linkers
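The quantities plotted in the figures can be derived from a map of binaries to the libraries they link against. The sketch below is only an illustration of that bookkeeping: the function names, the input format and the sample data are hypothetical, and the paper does not describe the tooling actually used to gather the dependencies.

```python
from collections import Counter


def linkers_per_library(dependencies):
    """Map each library to the number of binaries that link against it.

    `dependencies` maps a binary name to the libraries it links against.
    """
    counts = Counter()
    for libraries in dependencies.values():
        counts.update(set(libraries))  # count each binary once per library
    return counts


def libraries_per_linker_count(linker_counts):
    """Histogram: number of linkers -> number of libraries with that many."""
    return Counter(linker_counts.values())


def static_footprint(sizes, linker_counts):
    """Bytes each library would occupy if copied into every binary using it."""
    return {lib: sizes[lib] * n for lib, n in linker_counts.items()}


if __name__ == "__main__":
    # Hypothetical sample data, not measurements from the paper.
    deps = {"ls": ["libc"], "vim": ["libc", "libncurses"], "top": ["libc", "libncurses"]}
    sizes = {"libc": 2_000_000, "libncurses": 400_000}
    counts = linkers_per_library(deps)
    print(counts)
    print(libraries_per_linker_count(counts))
    print(static_footprint(sizes, counts))
```

A library with a single linker costs nothing extra when bundled with its only user, which is the case the survey is most interested in.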
4. UNIX HERITAGE
As UNIX was developed, it followed the KISS principle in almost every aspect, although this was not a requirement. The "everything is a file" philosophy, which is characteristic of every UNIX system, was present from the very beginning. The first UNIX system already contained the dev directory with the special device files. This abstract approach to devices, files and directories is clearly KISS-compliant, as it pursues extreme simplicity and coherence in handling files, directories, devices and even IPC-related files with a simple API. At its birth the directory structure of UNIX was also very simple, as we can see in (Thompson and Ritchie, 1971). In the first UNIX there were just a few directories: bin, etc, and usr. The first two were an integral part of the operating system, being respectively the place where the system binaries were stored and the place where other things regarding the system were to be found (e.g. system libraries, configuration files). The usr directory was the place where users had their own personal space, thus properly the users directory. Reading the manual gives a clear view of the authors' intention, a clear distinction of roles between system and users. As the authors themselves say, "user-maintained programs are not considered part of the UNIX system", referring to Section 6 of the UNIX manual.
As UNIX grew, there were more additions to the system. A library directory was added, moving the system shared objects from etc to a more meaningful location. The system administration services were also moved from their original location, etc, to another directory called sbin. We may notice that this choice has nothing to do with privileged access to system binaries: they were placed in etc "to lessen the probability of its being invoked by accident or from curiosity", as we can read in the description of the boot command in (Thompson and Ritchie, 1971). This still holds today, as sbin is by default not present in the path environment variable on many UNIX systems, but is still accessible by users. We may recall that in the days when UNIX was conceived, many directories were mount points for storage media, making the swap between tapes easier to handle. The ongoing growth, and of course the lack of standardization in the early UNIX systems, led to the creation of a plethora of directories and mount points. While using many locations as mount points for different storage media was a necessary procedure in the old times, nowadays it has become just a habit: the sbin directory, for example, is there because we expect it to be there.
Figure 2. Library occupancy if statically linked (per number of linkers)
The usr directory is one of the notable examples of how a system can grow in complexity. From a user data storage location, this directory has over the years been given many expansions of its name, like User Shared Resources or UNIX System Resources, and has become full of locations with an unclear meaning. Originally a user directory was a shared resource, as we can see for example in the description of the cal program in the original UNIX manual. Over the years this location has been used more and more to store system binaries, like applications, graphic servers and UNIX commands. Apart from libraries, which have been analyzed in the previous section, this shared directory contains application resources such as translations, documentation, configurations, and even administrative commands. In fact, we may ask what the difference is now between an administrative command stored in sbin and another one stored in usr/sbin. We can trace the reason back through history, but none of the causes that originally drove this practice hold today. Focusing on the applications, we can ask ourselves why a translation file for a particular piece of software should reside in a directory different from that of the application itself. Moreover, the question is whether these files can be classified as truly shared: by their very definition, those resources are certainly not shared by any other software.
Moreover, we find that almost any UNIX system contains a directory dedicated to the system header files. Although it might at first seem convincing, this habit is comparable to the practice of separating application resources from the application itself. A header by itself is of almost no use without the library it describes, so by their purpose the two should not be stored in different places. As for applications, storing all the available header files in a single location does not help in keeping a resource simple and immediately recognizable. Bundling the headers and the respective binary library in a single location is again a possible solution, avoiding the spread of files across many directories and increasing the simplicity of the system. Again, a NeXT approach to software deployment gives a solution to this unreasonable complexity, which keeps a system far from simple and stupid to understand and maintain. We address NeXT in particular because it was a UNIX operating system, but bundles were actually not limited to the NeXT OS: for example, BeOS applications were bundles even though BeOS was not a UNIX. Bundling all the application-related files in a single location makes it simple to distinguish an application resource from a system one. In the modern NeXT descendant, MacOS X, we can clearly see an effort to simplify the system by using application and library bundles. Moreover, it introduced locations with meaningful names like Applications, System and Library, and re-established the Users directory. Despite these efforts, the system still has the common UNIX directories, which of course could easily have been avoided while retaining compatibility with the past.
5. CONCLUSION
We have analyzed some of the main concerns about the adherence to the KISS principle of two of the most used UNIX operating systems available to the general public, MacOS X and Linux. We have shown, with statistical evidence and validation, that a microkernel, complying with the KISS principle by design, incurs no more context switches than a classic monolithic kernel, thus countering the main objection to the first family of operating system cores. This also shows that if there is a performance difference between the two, it is not due to the number of context switches. In addition, we examined the current shared library situation, showing that a simplification process is needed to achieve the simplicity of maintenance that modern systems require. Moreover, strictly limiting the number of shared libraries not only requires additional space that is insignificant on modern storage media, but also reduces the number of shared resources by at least 50%. Born following the KISS principle, UNIX has become a huge and habit-prone system. Vestigial heritages are still present, as in the MacOS X system, but have no reason to exist anymore. The KISS principle was of course present even in the first version of Bell Labs UNIX, but evidently it was swiftly replaced by habits that still today taint the simplicity and logic of the original intents.
REFERENCES
Abdi, H., 2006. Binomial Distribution: Binomial and Sign Tests. Encyclopedia of Measurement and Statistics. Neil J. Salkind (Editor), Sage Publications, Inc.
Casella, G. and Berger, R. L., 2001. Statistical Inference. Duxbury Press, Duxbury Advanced Series, USA.
Freedman, D., 2005. Statistical Models: Theory and Practice. Cambridge University Press, New York, NY, USA.
Hastie, T. et al., 2003. The Elements of Statistical Learning. Springer, New York, NY, USA.
IEEE, 2003. Standard for Information Technology-Standardization Application Environment Profile-POSIX Realtime and Embedded Application Support (AEP). Institute of Electrical and Electronics Engineers, STD 1003.13-2003.
Ritchie, D. M. and Thompson, K., 1983. The UNIX time-sharing system. Communications of the ACM, Vol. 26, No. 1, pp 84-89.
Salus, P. H., 1994. A Quarter Century of UNIX. Addison-Wesley Professional, Boston, MA, USA.
Thompson, K. and Ritchie, D. M., 1971. UNIX Programmer's Manual. Bell Laboratories.