
Research Tools

People across a diverse range of science, engineering, and business-related disciplines
spend their workdays writing, executing, debugging, and interpreting the outputs of
computer programs.
Research Programming
Preparation Phase
The core of research programming is the analysis phase: writing, executing, and
refining computer programs to analyze and obtain insights from data
Research programmers often prefer interpreted "scripting" languages such
as Python, Perl, R, and MATLAB. However, they also use compiled languages
such as C, C++, and Fortran when appropriate.
These programming tools help solve problems quickly, but several
programming issues can slow down the process.
File and metadata management is another challenge in the analysis phase.
Repeatedly editing and executing scripts while iterating on experiments
produces numerous output files, such as intermediate data, textual reports,
tables, and graphical visualizations.
Research programmers alternate between the analysis and reflection
phases while they work.
Whereas the analysis phase involves programming, the reflection
phase involves thinking and communicating about the outputs of
analyses.
After inspecting a set of output files, a researcher might perform the
following types of reflection:
Take notes: People take notes throughout their experiments in
both physical and digital formats.
Hold meetings: People meet with colleagues to discuss results and
to plan next steps in their analyses.
Make comparisons and explore alternatives: The reflection
activities that tie most closely with the analysis phase are making
comparisons between output variants and then exploring
alternatives by adjusting script code and/or execution parameters.
The final phase of research programming is disseminating results.
Results are most commonly disseminated in the form of written reports such
as internal memos, slideshow presentations, business/policy white papers, or
academic research publications.
The main challenge here is how to consolidate all of the various notes,
freehand sketches, emails, scripts, and output data files created throughout
an experiment to aid in writing.
Beyond presenting results in written form, some researchers also want to
distribute their software so that colleagues can reproduce their experiments
or play with their prototype systems.
Addressing Research Programming Challenges
Decades of research and production of tools such as optimizing compilers,
high-level programming languages, integrated development environments,
debuggers, automated software bug finders, and testing frameworks have
made the process of writing code far more pleasant and productive than it
was in the early days of computing.

Since research programming is a form of programming, modern-day tools
that help all programmers also help research programmers.

However, research programmers often do not take full advantage of these
tools, since their target audience is people engaged in software
engineering, an activity with vastly different characteristics from research
programming. Here are the most salient differences:
Purpose: The main goal of research programming is to obtain insights about a
topic; code is merely a means to that end.
Environment: Research programmers often work in a heterogeneous
environment where they cobble together a patchwork of improvised ad-hoc
scripts and "sloppy" prototype code written in multiple languages, interfacing
with a mix of third-party libraries and executables from disparate sources within a
UNIX-like command-line environment.
Heterogeneity is a fundamental property of research programming.
Specifications: The process of writing research code is exploratory and highly
iterative, where specifications are ill-defined and constantly-changing in
response to new findings.
Priorities: Tools to facilitate research programming must be "lightweight"
enough to be useful without causing delays in the programmer's workflow.
Expertise: Research code is written by people of all levels of programming
expertise, ranging from Computer Science veterans to scientists who learn barely
enough about programming to be able to write basic scripts.
A large amount of research programming is done on data sets less than a
terabyte in size, which fit on a single modern desktop machine.

Raw data sets usually come in the form of flat files in either textual or binary
formats.

After data cleaning, reformatting, and integration, programmers might opt to
store post-processed data in databases of either the relational (e.g., SQLite,
MySQL, PostgreSQL) or NoSQL variety.

Most databases offer indexing and materialized view features that can speed
up computations. NoSQL databases offer greater schema flexibility but
sometimes at the expense of a lack of full ACID (atomicity, consistency,
isolation, durability) guarantees.
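As a concrete sketch of the indexing point above, the snippet below stores post-processed records in SQLite and adds an index so a lookup on a frequently-queried column avoids a full table scan. The table and column names here are made up for illustration, not taken from the text.

```python
import sqlite3

# Illustrative sketch: table and column names are invented for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (run_id INTEGER, config INTEGER, score REAL)")
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?)",
    [(i, i % 10, i * 0.5) for i in range(1000)],
)

# The index lets SQLite answer the query below without scanning every row.
conn.execute("CREATE INDEX idx_results_config ON results (config)")

rows = conn.execute(
    "SELECT run_id, score FROM results WHERE config = ? ORDER BY run_id", (5,)
).fetchall()
print(len(rows))
```

Running `EXPLAIN QUERY PLAN` on the `SELECT` would confirm that SQLite uses the index rather than a table scan.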
A variety of programming languages and run-time enhancements aim to speed up the
iteration cycle during the analysis phase of research programming.

Just-in-time (JIT) compilers for dynamic languages (e.g., PyPy for Python,
TraceMonkey for JavaScript) can speed up script execution times without
requiring programmers to make any annotations.

However, JIT compilers only focus on micro-optimizations of CPU-bound
code such as hot inner loops.

Parallel execution of code can vastly speed up scientific data processing
scripts, at the cost of increased difficulty in programming and debugging
such scripts.

Speeding Up Incremental Running Times: This involves manually rewriting a
function to save its inputs and outputs to a cache, so that subsequent calls
with previously-seen inputs can be skipped, thus speeding up incremental
running times.
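A minimal sketch of such memoization, here persisted to a JSON file on disk so the cache survives across script runs; the file name and the cached function are illustrative, not from the original text.

```python
import functools
import json
import os

CACHE_FILE = "analysis_cache.json"  # illustrative file name

def cached(fn):
    """Save fn's inputs and outputs so repeated calls skip recomputation."""
    cache = {}
    if os.path.exists(CACHE_FILE):      # reload results from earlier runs
        with open(CACHE_FILE) as f:
            cache = json.load(f)

    @functools.wraps(fn)
    def wrapper(x):
        key = str(x)                    # JSON object keys must be strings
        if key not in cache:            # only compute on a cache miss
            cache[key] = fn(x)
            with open(CACHE_FILE, "w") as f:
                json.dump(cache, f)     # persist for the next run
        return cache[key]
    return wrapper

@cached
def expensive_step(n):
    return sum(i * i for i in range(n))  # stand-in for a slow analysis step

print(expensive_step(10))
```

For in-memory caching within a single run, Python's built-in `functools.lru_cache` achieves the same effect without any manual bookkeeping.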
Preventing Crashes From Errors
Silent errors in programming languages: Some programming languages, most
notably Perl, are designed to silence errors as much as possible to avoid crashing
scripts. For example, Perl and PHP automatically convert between strings and
integers rather than throwing a run-time type error.
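By contrast, a stricter dynamic language such as Python flags the mismatch with a run-time type error rather than silently coercing:

```python
# Python, unlike Perl or PHP, refuses to silently convert between
# strings and integers in arithmetic:
try:
    result = "5" + 3          # Perl would coerce and yield the number 8
except TypeError as err:
    result = None
    print("caught:", err)     # the error surfaces instead of being hidden
```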

Failure-oblivious computing is a technique that silently hides memory access errors
in C programs by ignoring out-of-bounds writes and returning fake (small integer)
values for out-of-bounds reads.

Since failure-oblivious computing works on C code, it requires re-compiling the
target program and incurs a slowdown due to memory bounds checking.

An ideal run-time environment for data analysis should both prevent crashes and
also flag errors to aid in debugging rather than silently hiding them.

Error tolerance in cluster data processing: Google's MapReduce and the
open-source Hadoop frameworks both have a mode that skips over bad records
when processing data on compute clusters.
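The same idea can be sketched on a single machine: tolerate malformed records instead of letting one crash the whole job. The records below are made up for illustration.

```python
# Miniature sketch of "skip bad records": one malformed input line
# should not abort the entire data processing run.
raw_records = ["3.1", "4.0", "not-a-number", "2.5", ""]

values, skipped = [], 0
for rec in raw_records:
    try:
        values.append(float(rec))
    except ValueError:        # a bad record: count it and move on
        skipped += 1

print(sum(values), skipped)
```

Counting (or logging) the skipped records matters: it preserves the error signal for debugging instead of silently hiding it, as the previous section recommends.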
Domain-Specific Programming
Environments
Researchers have developed a variety of domain-specific
environments catered towards specific types of research
programming tasks.

Evaluating machine learning algorithms.
Visual programming tools: graphical development environments for designing
and executing scientific computations.
Operating Systems

An operating system (OS) is the collection of software that manages the hardware
resources of a computer and provides common services to the user.
Every computer possesses an operating system to run the other programs present on it.
Operating systems are now found on devices ranging from personal computers to cell
phones, particularly smartphones. For example, most smartphones run a version of
the Android operating system.
Any operating system performs basic tasks such as recognizing input from the
keyboard, sending output to the display screen, keeping track of the files and
directories on the disk, and controlling peripheral devices such as printers.
An operating system can perform a single task or operation, or multiple
tasks or operations, at any time.
What OS does?
An operating system performs basic tasks such as:
controlling and allocating memory,
prioritizing system requests,
controlling input and output devices,
facilitating networking, and
managing file systems.
Structure of Operating System:
Application Programs

System Programs

Software (Operating System)

HARDWARE
Structure of Operating System
The structure of OS consists of 4 layers:
1. Hardware
Hardware consists of the CPU, main memory, I/O devices, etc.

2. Software (Operating System)
Software includes process management routines, memory management
routines, I/O control routines, and file management routines.

3. System programs
This layer consists of compilers, assemblers, linkers, etc.

4. Application programs
This layer depends on users' needs, e.g., a railway reservation system or a
bank database management system.
Evolution of Shared Computing
Batch processing
Interactive processing
Requires real-time processing
Time-sharing/Multitasking
Implemented by Multiprogramming
Multiprocessor machines
Batch Processing:
In batch processing, jobs with similar needs (a batch) are grouped
together and executed at one time.

The OS was simple; its major task was to transfer control from
one job to the next.
The job was submitted to the computer operator in the form of
punch cards. At some later time the output appeared.
The OS was always resident in memory. (Ref. Fig. next slide)
Common input devices were card readers and tape drives.
Batch Processing (Contd):
Common output devices were line printers, tape drives, and
card punches.
Users did not interact directly with the computer system; instead,
each user prepared a job (comprising the program, the data, and
some control information).
[Memory layout: the OS resident in one part of memory; the rest is the
user program area]
Multiprogramming:
Multiprogramming is a technique to execute number of programs
simultaneously by a single processor.

In multiprogramming, a number of processes reside in main memory
at a time.

The OS picks and begins to execute one of the jobs in main
memory.

If a process enters an I/O wait, the CPU switches from that
job to another job.
Hence the CPU is not idle at any time.


Multiprogramming (Contd):
The figure depicts the layout of a multiprogramming system: main memory
holds the OS and five jobs (Job 1 through Job 5) at a time, and the CPU
executes them one by one.
Advantages:
Efficient memory utilization
Throughput increases
The CPU is never idle, so performance increases.
Time Sharing Systems:
Time sharing, or multitasking, is a logical extension of
multiprogramming.
Multiple jobs are executed by switching the CPU between
them.
In this, the CPU time is shared by different processes, so such
systems are called time-sharing systems.
Time slice is defined by the OS, for sharing CPU time between
processes.
Examples: Multics, Unix, etc.,
Operating Systems functions:
The main functions of operating systems are:

1. Program creation
2. Program execution
3. Input/Output operations
4. Error detection
5. Resource allocation
6. Accounting
7. Protection
Types of OS:
Operating systems can also be classified as:

Single User Systems

Multi User Systems


Single User Systems:
Provide a platform for only one user at a time.
They are popularly associated with desktop operating systems, which run
on standalone systems where no user accounts are required.
Example: DOS
Multi-User Systems:
Provide regulated access for a number of users by maintaining a
database of known users.
Refer to computer systems that support two or more simultaneous users.
Another term for multi-user is time-sharing.
All mainframes are multi-user systems.
Example: Unix
Operating System Components
Shell: Communicates with users
Text based
Graphical user interface (GUI)
Kernel: Performs basic required functions
File manager
Device drivers
Memory manager
Process manager (Scheduler, dispatcher, etc..)
The shell as an interface between
users and the operating system
File Manager
Role: coordinates the use of the machine's mass storage facilities
Hierarchical organization
Directory (or Folder): A user-created bundle of
files and other directories (subdirectories)
Directory Path: A sequence of directories within
directories
Access to files and operations on them are provided by the file
manager via a file descriptor
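A small sketch of this in Python, using the OS-level calls directly; the temporary path is created just for the example.

```python
import os
import tempfile

# The file manager hands the program a file descriptor (a small integer);
# subsequent operations on the file go through that descriptor.
path = os.path.join(tempfile.mkdtemp(), "notes.txt")  # a directory path

fd = os.open(path, os.O_CREAT | os.O_WRONLY)  # request access to the file
os.write(fd, b"hello")                        # operate via the descriptor
os.close(fd)                                  # release it

with open(path) as f:          # high-level file objects wrap a descriptor too
    print(f.fileno() >= 0, f.read())
```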
Device Manager
The part of the OS presented as a collection of device drivers:
specialized software components that communicate with the
controllers to carry out operations on the peripheral
devices connected to the computer
Each driver is specifically designed for its type of
device (e.g., printer, monitor) and translates
generic requests into a device-specific sequence of
operations
Memory Manager
Has the task of coordinating the use of main memory:
allocates/de-allocates space in main memory
When the total required memory space exceeds the physically
available space, it may create the illusion that the machine has more
memory than it actually does (virtual memory) by playing
a shell game in which blocks of data (pages) are shifted
back and forth between main memory and mass storage
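The page "shell game" can be sketched with a tiny least-recently-used policy; the capacity and page names are made up, and real memory managers are far more sophisticated.

```python
from collections import OrderedDict

# Toy sketch: "main memory" holds only a few pages; referencing a page
# not in memory evicts the least-recently-used page to mass storage.
CAPACITY = 3
memory = OrderedDict()   # pages currently resident in main memory
evictions = []           # pages shifted out to mass storage

def reference(page):
    if page in memory:
        memory.move_to_end(page)         # mark as recently used
    else:
        if len(memory) >= CAPACITY:      # memory full: shift a page out
            victim, _ = memory.popitem(last=False)
            evictions.append(victim)
        memory[page] = True              # bring the page in

for p in ["p1", "p2", "p3", "p1", "p4", "p2"]:
    reference(p)

print(list(memory), evictions)
```

The program above sees five distinct pages even though "memory" never holds more than three, which is exactly the illusion virtual memory provides.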
Processes
Process: the activity of executing a program (NOT
THE SAME THING AS A PROGRAM!!!)
Program: a static set of directions (instructions)
Process: a dynamic entity whose properties change as time
progresses; it is an instance of a program in execution.
Process State: Current status of the activity
Program counter
General purpose registers
Related portion of main memory
Process Manager
Scheduler: the part of the kernel in charge of the
strategy for allocating/de-allocating the CPU to
each competing process
Maintains a record of all processes in the OS (via a process
table), introduces new processes to this pool, and removes
the ones that have completed
Dispatcher: the component of the kernel that
oversees the execution of the scheduled processes
Achieved by multiprogramming
Scheduler
Scheduler: Adds new processes to the process
table and removes completed processes from
the process table
Process table contains
Memory area assigned to the process
Priority of the process
State of the process (ready or waiting)
Dispatcher
Dispatcher: controls the allocation of the CPU (in time
slices) to the processes in the process table
The end of a time slice is signaled by an interrupt.
Each process is allowed to execute for one time slice
It performs a process-switch procedure to change
from one process to another
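The scheduler/dispatcher interplay can be sketched as a tiny round-robin simulation; the process names and slice counts are made up for illustration.

```python
from collections import deque

# Tiny round-robin sketch: each entry in the "process table" records a
# name and its remaining work. The dispatcher gives each ready process
# one time slice in turn; finished processes leave the table.
process_table = deque([["A", 3], ["B", 2], ["C", 1]])  # [name, slices left]
TIME_SLICE = 1
timeline = []

while process_table:
    proc = process_table.popleft()       # dispatch the next ready process
    timeline.append(proc[0])             # it runs for one time slice
    proc[1] -= TIME_SLICE
    if proc[1] > 0:                      # interrupt at end of slice:
        process_table.append(proc)       # back to the table as "ready"

print("".join(timeline))
```

The resulting timeline interleaves the three processes slice by slice, which is the time-sharing behavior the figure below depicts for two processes.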
[Figure: the dispatcher time-sharing the CPU between process A and
process B]
