
I/O Streams

An I/O Stream represents an input source or an output destination. A stream can represent many different
kinds of sources and destinations, including disk files, devices, other programs, and memory arrays.
Streams support many different kinds of data, including simple bytes, primitive data types, localized
characters, and objects. Some streams simply pass on data; others manipulate and transform the data in
useful ways.
No matter how they work internally, all streams present the same simple model to programs that use
them: A stream is a sequence of data. A program uses an input stream to read data from a source, one
item at a time, and an output stream to write data to a destination, one item at a time.

Byte Streams
Programs use byte streams to perform input and output of 8-bit bytes. All byte stream classes are
descended from InputStream and OutputStream.
There are many byte stream classes. To demonstrate how byte streams work, we'll focus on the file I/O
byte streams, FileInputStream and FileOutputStream. Other kinds of byte streams are used in
much the same way; they differ mainly in the way they are constructed.
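Here is a minimal sketch of the byte-stream pattern with FileInputStream and FileOutputStream. The class name, the copy method, and the command-line arguments are illustrative. Reading one byte per call is deliberately low-level. It shows the read-until-minus-one loop that all InputStream subclasses share.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

class CopyBytes {
    // Copies src to dest one byte at a time and returns the number of bytes copied.
    static long copy(String src, String dest) throws IOException {
        long count = 0;
        // try-with-resources closes both streams, even if read() or write() throws
        try (FileInputStream in = new FileInputStream(src);
             FileOutputStream out = new FileOutputStream(dest)) {
            int b;
            // read() returns the next byte as an int from 0 to 255, or -1 at end of stream
            while ((b = in.read()) != -1) {
                out.write(b);
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java CopyBytes <source> <destination>");
            return;
        }
        System.out.println(copy(args[0], args[1]) + " bytes copied");
    }
}
```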

Character Streams
The Java platform stores character values using Unicode conventions. Character stream I/O
automatically translates this internal format to and from the local character set. In Western locales, the
local character set is usually an 8-bit superset of ASCII.
For most applications, I/O with character streams is no more complicated than I/O with byte streams.
Input and output done with character stream classes automatically translates to and from the local character set. A
program that uses character streams in place of byte streams automatically adapts to the local character
set and is ready for internationalization, all without extra effort by the programmer.
If internationalization isn't a priority, you can simply use the character stream classes without paying much
attention to character set issues. Later, if internationalization becomes a priority, your program can be
adapted without extensive recoding. See the Internationalization trail for more information.
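The same copy loop also works with character streams. FileReader and FileWriter do the translation between bytes and chars using the platform's default character set. This is a sketch with illustrative names. The one visible difference from the byte version is that read() now returns a 16-bit char value.

```java
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

class CopyCharacters {
    // Copies a text file one character at a time and returns the number of chars copied.
    static long copy(String src, String dest) throws IOException {
        long count = 0;
        try (FileReader in = new FileReader(src);      // decodes bytes to chars
             FileWriter out = new FileWriter(dest)) {  // encodes chars back to bytes
            int c;
            // read() returns a char value (0 to 65535) in an int, or -1 at end of stream
            while ((c = in.read()) != -1) {
                out.write(c);
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java CopyCharacters <source> <destination>");
            return;
        }
        System.out.println(copy(args[0], args[1]) + " characters copied");
    }
}
```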

Buffered Streams
Most of the examples we've seen so far use unbuffered I/O. This means each read or write request is
handled directly by the underlying OS. This can make a program much less efficient, since each such
request often triggers disk access, network activity, or some other operation that is relatively expensive.
To reduce this kind of overhead, the Java platform implements buffered I/O streams. Buffered input
streams read data from a memory area known as a buffer; the native input API is called only when the
buffer is empty. Similarly, buffered output streams write data to a buffer, and the native output API is called
only when the buffer is full.
A program can convert an unbuffered stream into a buffered stream using the wrapping idiom we've used
several times now, where the unbuffered stream object is passed to the constructor for a buffered stream
class. Here's how you might modify the constructor invocations in the CopyCharacters example to use
buffered I/O:
inputStream = new BufferedReader(new FileReader("xanadu.txt"));
outputStream = new BufferedWriter(new FileWriter("characteroutput.txt"));
There are four buffered stream classes used to wrap unbuffered
streams: BufferedInputStream and BufferedOutputStream create buffered byte streams,
while BufferedReader and BufferedWriter create buffered character streams.
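A fuller sketch of the wrapping idiom follows. It copies a file line by line through BufferedReader and BufferedWriter, and the class and method names are illustrative. Besides cutting native calls, the buffered reader adds readLine(). Closing the writer flushes whatever is still in its buffer.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

class CopyLines {
    // Copies a text file line by line and returns the number of lines copied.
    static int copy(String src, String dest) throws IOException {
        int lines = 0;
        try (BufferedReader in = new BufferedReader(new FileReader(src));
             BufferedWriter out = new BufferedWriter(new FileWriter(dest))) {
            String line;
            // readLine() strips the line terminator and returns null at end of stream
            while ((line = in.readLine()) != null) {
                out.write(line);
                out.newLine();   // writes the platform's line separator
                lines++;
            }
            // Closing the writer (done by try-with-resources) flushes its buffer; call
            // out.flush() yourself if someone must see the data before the stream is closed.
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java CopyLines <source> <destination>");
            return;
        }
        System.out.println(copy(args[0], args[1]) + " lines copied");
    }
}
```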

Scanning and Formatting


Programming I/O often involves translating to and from the neatly formatted data humans like to work
with. To assist you with these chores, the Java platform provides two APIs. The scanner API breaks input
into individual tokens associated with bits of data. The formatting API assembles data into nicely
formatted, human-readable form.

Scanning
Objects of type Scanner are useful for breaking down formatted input into tokens and translating
individual tokens according to their data type.
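This sketch shows the Scanner in action. By default it splits on whitespace, and hasNextInt()/nextInt() translate a token only when it really is an int. The input string and the summing task are made up for illustration. The locale is pinned so that number parsing does not depend on the machine's settings.

```java
import java.util.Locale;
import java.util.Scanner;

class ScanSum {
    // Adds up the integer tokens in the input and skips every other token.
    static long sumInts(String input) {
        long sum = 0;
        try (Scanner s = new Scanner(input)) {
            s.useLocale(Locale.ROOT);   // fixed locale so parsing doesn't vary by machine
            while (s.hasNext()) {
                if (s.hasNextInt()) {
                    sum += s.nextInt(); // token translated to an int
                } else {
                    s.next();           // not an int: consume it and move on
                }
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumInts("10 apples 20 pears -5"));   // 25
    }
}
```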

Formatting
Stream objects that implement formatting are instances of either PrintWriter, a character stream
class, or PrintStream, a byte stream class.

Note: The only PrintStream objects you are likely to need are System.out and System.err.
(See I/O from the Command Line for more on these objects.) When you need to create a formatted output
stream, instantiate PrintWriter, not PrintStream.

Like all byte and character stream objects, instances of PrintStream and PrintWriter implement a
standard set of write methods for simple byte and character output. In addition,
both PrintStream and PrintWriter implement the same set of methods for converting internal data
into formatted output. Two levels of formatting are provided:

print and println format individual values in a standard way.

format formats almost any number of values based on a format string, with many options for
precise formatting.
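Here are both levels of formatting on a PrintWriter, following the note's advice to instantiate PrintWriter rather than PrintStream. The report layout is an illustrative assumption. A StringWriter collects the output so it can be inspected. Locale.ROOT keeps the decimal separator a period.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Locale;

class FormatDemo {
    // Writes a small report and returns it as a String.
    static String report(String item, int quantity, double unitPrice) {
        StringWriter buffer = new StringWriter();
        try (PrintWriter out = new PrintWriter(buffer)) {
            out.print("Item: ");   // print: one value in its standard form
            out.println(item);     // println: the same, followed by a line separator
            // format: several values laid out by a format string;
            // %n emits the platform line separator
            out.format(Locale.ROOT, "%d x %.2f = %.2f%n", quantity, unitPrice, quantity * unitPrice);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.print(report("pen", 3, 1.5));
    }
}
```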

I/O from the Command Line


A program is often run from the command line and interacts with the user in the command line
environment. The Java platform supports this kind of interaction in two ways: through the Standard
Streams and through the Console.
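This sketch shows both routes. It uses System.console() when a console is available and falls back to the Standard Streams (System.in, System.out, System.err) otherwise. Whether System.console() returns null when input is redirected has varied between JDK versions, so treat the null check as a defensive assumption. The greet method and its prompt are illustrative.

```java
import java.io.BufferedReader;
import java.io.Console;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PrintStream;

class CommandLineIO {
    // Standard Streams version: works whether input is a terminal, a pipe, or a file.
    static String greet(InputStream in, PrintStream out) throws IOException {
        // Not closed on purpose, so the caller's stream (e.g. System.in) stays open.
        BufferedReader reader = new BufferedReader(new InputStreamReader(in));
        out.print("Name: ");
        String name = reader.readLine();   // null if the input is already at end of stream
        String reply = "Hello, " + (name == null ? "stranger" : name.trim());
        out.println(reply);
        return reply;
    }

    public static void main(String[] args) throws IOException {
        Console console = System.console();   // may be null when there is no interactive console
        if (console != null) {
            String name = console.readLine("Name: ");
            console.printf("Hello, %s%n", name);
        } else {
            greet(System.in, System.out);
            System.err.println("(no console available; used the standard streams)");
        }
    }
}
```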

Data Streams
Data streams support binary I/O of primitive data type values
(boolean, char, byte, short, int, long, float, and double) as well as String values. All data
streams implement either the DataInput interface or the DataOutput interface. This section focuses
on the most widely used implementations of these
interfaces, DataInputStream and DataOutputStream.
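The sketch below writes a few primitive values and a String with DataOutputStream, then reads them back with DataInputStream. It uses an in-memory ByteArrayOutputStream as a stand-in for a file, and the invoice-style values are made up. The binary format carries no type information, so values must be read in exactly the order they were written.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class DataStreamsDemo {
    // Writes one invoice line as binary primitive values.
    static byte[] write(double price, int units, String description) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeDouble(price);       // 8 bytes
            out.writeInt(units);          // 4 bytes
            out.writeUTF(description);    // 2-byte length, then modified UTF-8
        }
        return bytes.toByteArray();
    }

    // Reads the values back in the same order they were written.
    static String read(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            double price = in.readDouble();
            int units = in.readInt();
            String description = in.readUTF();
            return units + " x " + description + " at " + price;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = write(19.99, 12, "Java T-shirt");
        System.out.println(data.length + " bytes: " + read(data));
    }
}
```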

Object Streams
Just as data streams support I/O of primitive data types, object streams support I/O of objects. Most, but
not all, standard classes support serialization of their objects. Those that do implement the marker
interface Serializable.
The object stream classes are ObjectInputStream and ObjectOutputStream. These classes
implement ObjectInput and ObjectOutput, which are subinterfaces
of DataInput and DataOutput. That means that all the primitive data I/O methods covered in Data
Streams are also implemented in object streams. So an object stream can contain a mixture of primitive
and object values. The ObjectStreams example illustrates this. ObjectStreams creates the same
application as DataStreams, with a couple of changes. First, prices are now BigDecimal objects, to
better represent fractional values. Second, a Calendar object is written to the data file, indicating an
invoice date.
Because readObject() returns an Object, its result must be cast to the expected type; if the object is of a
different type, the cast throws a ClassCastException. Separately, readObject() is declared to throw
ClassNotFoundException, which occurs if the class of a serialized object cannot be found. In this simple
example, that can't happen, so we don't try to catch the exception. Instead, we notify the compiler that
we're aware of the issue by adding ClassNotFoundException to the main method's throws clause.
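This sketch mixes a primitive and an object in one object stream, in the spirit of the ObjectStreams example. It uses an in-memory stream rather than a file, and the quantity and price are illustrative. writeInt/readInt come from the DataOutput/DataInput side. writeObject/readObject handle the Serializable BigDecimal.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.math.BigDecimal;

class ObjectStreamsDemo {
    static byte[] write(int units, BigDecimal price) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeInt(units);      // primitive: a DataOutput method, inherited via ObjectOutput
            out.writeObject(price);   // object: BigDecimal implements Serializable
        }
        return bytes.toByteArray();
    }

    static BigDecimal readTotal(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            int units = in.readInt();
            // readObject() returns Object; a type mismatch here would throw ClassCastException
            BigDecimal price = (BigDecimal) in.readObject();
            return price.multiply(BigDecimal.valueOf(units));
        }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException {
        System.out.println(readTotal(write(3, new BigDecimal("19.99"))));   // 59.97
    }
}
```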

File I/O (Featuring NIO.2)

Note: This tutorial reflects the file I/O mechanism introduced in the JDK 7 release. The Java SE 6 version
of the File I/O tutorial was brief, but you can download the Java SE Tutorial 2008-03-14 version of
the tutorial which contains the earlier File I/O content.

The java.nio.file package and its related package, java.nio.file.attribute, provide
comprehensive support for file I/O and for accessing the default file system. Though the API has many
classes, you need to focus on only a few entry points. You will see that this API is very intuitive and easy
to use.
The tutorial starts by asking, "What is a path?" Then, the Path class, the primary entry point for the
package, is introduced. Methods in the Path class relating to syntactic operations are explained. The
tutorial then moves on to the other primary class in the package, the Files class, which contains
methods that deal with file operations. First, some concepts common to many file operations are
introduced. The tutorial then covers methods for checking, deleting, copying, and moving files.
The tutorial shows how metadata is managed, before moving on to file I/O and directory
I/O. Random access files are explained and issues specific to symbolic and hard links are
examined.
Next, some of the very powerful, but more advanced, topics are covered. First, the capability
to recursively walk the file tree is demonstrated, followed by information about how to search
files using wild cards. Next, how to watch a directory for changes is explained and
demonstrated. Then, methods that didn't fit elsewhere are given some attention.


Finally, if you have file I/O code written prior to the Java SE 7 release, there is a map from the old API
to the new API, as well as important information about the File.toPath method for developers who
would like to leverage the new API without rewriting existing code.
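This sketch shows the File.toPath bridge. Legacy code that already holds a java.io.File can pass it to the NIO.2 Files methods without being rewritten, and Path.toFile() goes the other way for paths on the default file system. The class and method names are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class LegacyBridge {
    // Existing code hands us a java.io.File; toPath() lets us use the Files API on it.
    static long sizeOf(File legacyFile) throws IOException {
        Path path = legacyFile.toPath();
        return Files.size(path);   // throws an IOException if the file cannot be read
    }

    // The reverse direction: only valid for paths on the default file system.
    static File backToFile(Path path) {
        return path.toFile();
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("legacy", ".txt");
        f.deleteOnExit();
        System.out.println(f + " is " + sizeOf(f) + " bytes");
    }
}
```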

What Is a Path? (And Other File System Facts)


A file system stores and organizes files on some form of media, generally one or more hard drives, in
such a way that they can be easily retrieved. Most file systems in use today store the files in a tree
(or hierarchical) structure. At the top of the tree are one or more root nodes. Under each root node, there
are files and directories (folders in Microsoft Windows). Each directory can contain files and
subdirectories, which in turn can contain files and subdirectories, and so on, potentially to an almost
limitless depth.

Path Operations
The Path class includes various methods that can be used to obtain information about the path, access
elements of the path, convert the path to other forms, or extract portions of a path. There are also
methods for matching the path string and methods for removing redundancies in a path. This lesson
addresses these Path methods, sometimes called syntactic operations, because they operate on the
path itself and don't access the file system.
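This sketch runs several of these syntactic Path operations on a made-up path that does not need to exist. The calls here only manipulate the path's elements and never check the file itself. Printed paths use the platform separator, so the commented values show `/` as on UNIX.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class PathOps {
    public static void main(String[] args) {
        Path p = Paths.get("home", "joe", "..", "sally", ".", "foo.txt");
        System.out.println("path:       " + p);
        System.out.println("file name:  " + p.getFileName());    // foo.txt
        System.out.println("name count: " + p.getNameCount());   // 6
        System.out.println("element 1:  " + p.getName(1));       // joe
        System.out.println("subpath:    " + p.subpath(0, 2));    // home/joe
        System.out.println("parent:     " + p.getParent());
        System.out.println("absolute?   " + p.isAbsolute());     // false
        Path clean = p.normalize();                              // removes "." and "joe/.."
        System.out.println("normalized: " + clean);              // home/sally/foo.txt
        System.out.println("ends with foo.txt? " + clean.endsWith("foo.txt"));   // true
        Path joe = Paths.get("home", "joe");
        System.out.println("joe -> sally: " + joe.relativize(Paths.get("home", "sally")));   // ../sally
        System.out.println("as URI:     " + clean.toAbsolutePath().toUri());
    }
}
```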

File Operations
The Files class is the other primary entry point of the java.nio.file package. This class offers a rich
set of static methods for reading, writing, and manipulating files and directories. The Files methods work
on instances of Path objects. Before proceeding to the remaining sections, you should familiarize
yourself with the following common concepts:
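As a quick orientation before the individual operations, this sketch shows the basic shape of the Files API. Every operation is a static method that takes a Path. The temporary file and its contents are illustrative.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

class FilesBasics {
    static List<String> writeAndRead(Path file, List<String> lines) throws IOException {
        Files.write(file, lines, StandardCharsets.UTF_8);   // creates or truncates the file
        return Files.readAllLines(file, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("files-demo", ".txt");
        try {
            System.out.println(writeAndRead(tmp, Arrays.asList("first", "second")));
            System.out.println("exists=" + Files.exists(tmp) + ", size=" + Files.size(tmp));
        } finally {
            Files.deleteIfExists(tmp);   // clean up what we created
        }
    }
}
```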

Deleting a File or Directory


You can delete files, directories or links. With symbolic links, the link is deleted and not the target of the
link. With directories, the directory must be empty, or the deletion fails.
The Files class provides two deletion methods.
The delete(Path) method deletes the file or throws an exception if the deletion fails. For example, if
the file does not exist a NoSuchFileException is thrown. You can catch the exception to determine
why the delete failed as follows:
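The catch-by-exception-type approach described here can be sketched as follows. The subclasses NoSuchFileException and DirectoryNotEmptyException are caught before the general IOException. The sketch also uses the second deletion method, deleteIfExists(Path), which returns false instead of throwing when there is nothing to delete. The class and messages are illustrative.

```java
import java.io.IOException;
import java.nio.file.DirectoryNotEmptyException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

class DeleteDemo {
    // Deletes path and explains a failure by catching the specific exception types.
    static String tryDelete(Path path) {
        try {
            Files.delete(path);
            return "deleted";
        } catch (NoSuchFileException x) {
            return path + ": no such file or directory";
        } catch (DirectoryNotEmptyException x) {
            return path + " not empty";
        } catch (IOException x) {
            // File permission problems and other I/O errors end up here.
            return x.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("delete-demo", ".txt");
        System.out.println(tryDelete(tmp));              // deleted
        System.out.println(tryDelete(tmp));              // ...: no such file or directory
        System.out.println(Files.deleteIfExists(tmp));   // false: nothing left to delete
    }
}
```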

Copying a File or Directory


You can copy a file or directory by using the copy(Path,Path,CopyOption...) method. The copy
fails if the target file exists, unless the REPLACE_EXISTING option is specified.
Directories can be copied. However, files inside the directory are not copied, so the new directory is
empty even when the original directory contains files.
When copying a symbolic link, the target of the link is copied. If you want to copy the link itself, and not
the contents of the link, specify either the NOFOLLOW_LINKS or REPLACE_EXISTING option.
This method takes a varargs argument. The following StandardCopyOption and LinkOption enums
are supported:

REPLACE_EXISTING: Performs the copy even when the target file already exists. If the
target is a symbolic link, the link itself is replaced (and not the target of the link). If the target is a
non-empty directory, the copy fails, typically with a DirectoryNotEmptyException.

COPY_ATTRIBUTES: Copies the file attributes associated with the file to the target file. The
exact file attributes supported are file system and platform dependent, but last-modified
time is supported across platforms and is copied to the target file.

NOFOLLOW_LINKS: Indicates that symbolic links should not be followed. If the file to be
copied is a symbolic link, the link itself is copied, not the target of the link.
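The options above are passed as varargs, as this sketch shows. A plain copy onto an existing file fails with FileAlreadyExistsException, while REPLACE_EXISTING overwrites it. The class and method names are illustrative. LinkOption.NOFOLLOW_LINKS could be appended in the same way to copy a link itself.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class CopyDemo {
    // Copies source over target, replacing target if it exists and keeping attributes
    // such as the last-modified time.
    static Path copyOver(Path source, Path target) throws IOException {
        return Files.copy(source, target,
                StandardCopyOption.REPLACE_EXISTING,
                StandardCopyOption.COPY_ATTRIBUTES);
    }

    // Without REPLACE_EXISTING, copying onto an existing file fails.
    static boolean copyWithoutReplace(Path source, Path target) throws IOException {
        try {
            Files.copy(source, target);
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("copy-demo");
        Path src = Files.write(dir.resolve("src.txt"), "new".getBytes());
        Path dst = Files.write(dir.resolve("dst.txt"), "old".getBytes());
        System.out.println("plain copy succeeded? " + copyWithoutReplace(src, dst));   // false
        copyOver(src, dst);
        System.out.println(new String(Files.readAllBytes(dst)));                       // new
    }
}
```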

Moving a File or Directory


You can move a file or directory by using the move(Path,Path,CopyOption...) method. The
move fails if the target file exists, unless the REPLACE_EXISTING option is specified.
Empty directories can be moved. If the directory is not empty, the move is allowed when the directory can
be moved without moving the contents of that directory. On UNIX systems, moving a directory within the
same partition generally consists of renaming the directory. In that situation, this method works even
when the directory contains files.
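Here is a short sketch of move used as a rename within the same parent directory. The helper name is hypothetical. The usage in main moves a non-empty directory, which typically succeeds because, as described above, renaming within one partition does not move the contents.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class MoveDemo {
    // Renames a file or directory within its current parent directory.
    static Path rename(Path source, String newName) throws IOException {
        Path target = source.resolveSibling(newName);   // same parent, new last element
        return Files.move(source, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("move-demo");
        Path sub = Files.createDirectory(dir.resolve("old-name"));
        Files.createFile(sub.resolve("inner.txt"));
        Path moved = rename(sub, "new-name");
        System.out.println(Files.exists(moved.resolve("inner.txt")));   // true
    }
}
```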

Networking Basics

Computers running on the Internet communicate with each other using either the Transmission Control
Protocol (TCP) or the User Datagram Protocol (UDP).

When you write Java programs that communicate over the network, you are programming at the
application layer. Typically, you don't need to concern yourself with the TCP and UDP layers. Instead, you
can use the classes in the java.net package. These classes provide system-independent network
communication. However, to decide which Java classes your programs should use, you do need to
understand how TCP and UDP differ.

TCP
When two applications want to communicate with each other reliably, they establish a connection and send
data back and forth over that connection. This is analogous to making a telephone call. If you want to
speak to Aunt Beatrice in Kentucky, a connection is established when you dial her phone number and she
answers. You send data back and forth over the connection by speaking to one another over the phone
lines. Like the phone company, TCP guarantees that data sent from one end of the connection actually
gets to the other end and in the same order it was sent. Otherwise, an error is reported.
TCP provides a point-to-point channel for applications that require reliable communications. The
Hypertext Transfer Protocol (HTTP), File Transfer Protocol (FTP), and Telnet are all examples of
applications that require a reliable communication channel. The order in which the data is sent and
received over the network is critical to the success of these applications. When HTTP is used to read from
a URL, the data must be received in the order in which it was sent. Otherwise, you end up with a jumbled
HTML file, a corrupt zip file, or some other invalid information.

Definition:
TCP (Transmission Control Protocol) is a connection-based protocol that provides a reliable flow of data
between two computers.

UDP
The UDP protocol provides for communication that is not guaranteed between two applications on the
network. UDP is not connection-based like TCP. Rather, it sends independent packets of data,
called datagrams, from one application to another. Sending datagrams is much like sending a letter
through the postal service: The order of delivery is not important and is not guaranteed, and each
message is independent of any other.

Definition:
UDP (User Datagram Protocol) is a protocol that sends independent packets of data, called datagrams,
from one computer to another with no guarantees about arrival. UDP is not connection-based like TCP.

For many applications, the guarantee of reliability is critical to the success of the transfer of information
from one end of the connection to the other. However, other forms of communication don't require such
strict standards. In fact, they may be slowed down by the extra overhead or the reliable connection may
invalidate the service altogether.
Consider, for example, a clock server that sends the current time to its client when requested to do so. If
the client misses a packet, it doesn't really make sense to resend it because the time will be incorrect
when the client receives it on the second try. If the client makes two requests and receives packets from
the server out of order, it doesn't really matter because the client can figure out that the packets are out of
order and make another request. The reliability of TCP is unnecessary in this instance because it causes
performance degradation and may hinder the usefulness of the service.
Another example of a service that doesn't need the guarantee of a reliable channel is the ping command.
The purpose of the ping command is to test the communication between two programs over the network.
In fact, ping needs to know about dropped or out-of-order packets to determine how good or bad the
connection is. A reliable channel would invalidate this service altogether.

Note:
Many firewalls and routers have been configured not to allow UDP packets. If you're having trouble
connecting to a service outside your firewall, or if clients are having trouble connecting to your service,
ask your system administrator if UDP is permitted.

Understanding Ports
Generally speaking, a computer has a single physical connection to the network. All data destined for a
particular computer arrives through that connection. However, the data may be intended for different
applications running on the computer. So how does the computer know to which application to forward the
data? Through the use of ports.

Data transmitted over the Internet is accompanied by addressing information that identifies the computer
and the port for which it is destined. The computer is identified by its IP address (32 bits for IPv4, 128 bits
for IPv6), which IP uses to deliver data to the right computer on the network. Ports are identified by a
16-bit number, which TCP and UDP use to deliver the data to the right application.
In connection-based communication such as TCP, a server application binds a socket to a specific port
number. This has the effect of registering the server with the system to receive all data destined for that
port. A client can then rendezvous with the server at the server's port.

Definition:
The TCP and UDP protocols use ports to map incoming data to a particular process running on a
computer.

In datagram-based communication such as UDP, the datagram packet contains the port number of its
destination and UDP routes the packet to the appropriate application.

Port numbers range from 0 to 65,535 because ports are represented by 16-bit numbers. The port
numbers ranging from 0 to 1023 are restricted; they are reserved for use by well-known services such as
HTTP and FTP and other system services. These ports are called well-known ports. Your applications
should not attempt to bind to them.
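One way to stay clear of the well-known ports is shown in this sketch. Binding a ServerSocket to port 0 asks the operating system for any free port, and getLocalPort() reports which 16-bit number it chose. The class name is illustrative. The comment about privileged ports reflects typical UNIX-like behavior, not a Java guarantee.

```java
import java.io.IOException;
import java.net.ServerSocket;

class PortDemo {
    // Binds to port 0 (meaning "any free port") and returns the port the OS assigned.
    static int bindAnyPort() throws IOException {
        try (ServerSocket server = new ServerSocket(0)) {
            return server.getLocalPort();   // a 16-bit value from 1 to 65535
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("OS-assigned port: " + bindAnyPort());
        // By contrast, new ServerSocket(80) usually fails with a BindException for an
        // unprivileged user on UNIX-like systems, because 80 is a well-known port.
    }
}
```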

Networking Classes in the JDK


Through the classes in java.net, Java programs can use TCP or UDP to communicate over the
Internet. The URL, URLConnection, Socket, and ServerSocket classes all use TCP to communicate
over the network. The DatagramPacket, DatagramSocket, and MulticastSocket classes are for
use with UDP.

What Is a URL?
A URL takes the form of a string that describes how to find a resource on the Internet. URLs have two
main components: the protocol needed to access the resource and the location of the resource.

Creating a URL
Within your Java programs, you can create a URL object that represents a URL address. The URL object
always refers to an absolute URL but can be constructed from an absolute URL, a relative URL, or from
URL components.
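This sketch shows the three construction styles: an absolute URL string, a relative spec resolved against a base URL, and separate components. The example.com addresses are placeholders, and no network access happens. Each constructor can throw MalformedURLException, for example for an unknown protocol. Recent JDKs (20 onward) deprecate these constructors in favor of URI.toURL(), but they still compile.

```java
import java.net.MalformedURLException;
import java.net.URL;

class CreateURLs {
    static URL[] examples() throws MalformedURLException {
        URL base = new URL("http://example.com/pages/");                        // absolute URL string
        URL page1 = new URL(base, "page1.html");                                 // relative spec + base
        URL fromParts = new URL("http", "example.com", 80, "/pages/page1.html"); // protocol, host, port, file
        return new URL[] { base, page1, fromParts };
    }

    public static void main(String[] args) throws MalformedURLException {
        for (URL url : examples()) {
            System.out.println(url);   // each URL object holds an absolute URL
        }
    }
}
```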

Parsing a URL
Gone are the days of parsing a URL to find out the host name, filename, and other information. With a
valid URL object you can call any of its accessor methods to get all of that information from the URL
without doing any string parsing!
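Here is a sketch of the accessor methods on a sample URL. The address is a placeholder, and parsing it does not touch the network. Note that getFile() returns the path plus the query string, while getPath() returns the path alone.

```java
import java.net.MalformedURLException;
import java.net.URL;

class ParseURL {
    static final String SAMPLE =
        "http://example.com:80/docs/books/tutorial/index.html?name=networking#DOWNLOADING";

    public static void main(String[] args) throws MalformedURLException {
        URL url = new URL(SAMPLE);
        System.out.println("protocol  = " + url.getProtocol());    // http
        System.out.println("authority = " + url.getAuthority());   // example.com:80
        System.out.println("host      = " + url.getHost());        // example.com
        System.out.println("port      = " + url.getPort());        // 80 (-1 if no port is given)
        System.out.println("path      = " + url.getPath());        // /docs/books/tutorial/index.html
        System.out.println("query     = " + url.getQuery());       // name=networking
        System.out.println("filename  = " + url.getFile());        // path, "?", then the query
        System.out.println("ref       = " + url.getRef());         // DOWNLOADING
    }
}
```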

Reading Directly from a URL


This section shows how your Java programs can read from a URL using the openStream() method.
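A minimal sketch of reading through openStream() follows, assuming the resource is UTF-8 text. The same code works for http: URLs. Because it also works for file: URLs, it can be tried without a network. The class name and the command-line argument are illustrative.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class URLReader {
    // Reads the whole resource as text, assuming it is UTF-8 encoded.
    static String read(URL url) throws IOException {
        StringBuilder text = new StringBuilder();
        // openStream() is shorthand for openConnection().getInputStream()
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                text.append(line).append('\n');
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // e.g. java URLReader http://example.com/   or   java URLReader file:///tmp/notes.txt
        System.out.print(read(new URL(args[0])));
    }
}
```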

Connecting to a URL
If you want to do more than just read from a URL, you can connect to it by
calling openConnection() on the URL. The openConnection() method returns a URLConnection
object that you can use for more general communications with the URL, such as reading from it, writing to
it, or querying it for content and other information.
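This sketch shows the openConnection() route. Creating the URLConnection does no I/O. You can configure it (timeouts here), connect, query metadata such as the content type, and then read. The timeout values and class name are illustrative, and timeouts mainly matter for network URLs.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

class URLConnectionReader {
    static String fetch(URL url) throws IOException {
        URLConnection connection = url.openConnection();   // creates the object; no I/O yet
        connection.setConnectTimeout(5_000);
        connection.setReadTimeout(5_000);
        connection.connect();   // optional: getInputStream() would connect implicitly
        System.err.println("content type: " + connection.getContentType());
        StringBuilder text = new StringBuilder();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                connection.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                text.append(line).append('\n');
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.print(fetch(new URL(args[0])));
    }
}
```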

Reading from and Writing to a URLConnection


Some URLs, such as many that are connected to cgi-bin scripts, allow you to (or even require you to)
write information to the URL. For example, a search script may require detailed query data to be written to
the URL before the search can be performed. This section shows you how to write to a URL and how to
get results back.
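The sketch below shows the write-then-read pattern. It calls setDoOutput(true), writes a URL-encoded body to the connection's output stream, and then reads the response. To stay self-contained, it starts a hypothetical local server using the JDK's com.sun.net.httpserver package. That server stands in for a server-side script and reverses the posted string. The "/reverse" path and the "string=" parameter name are assumptions.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.*;
import java.net.*;
import java.nio.charset.StandardCharsets;

class ReverseClient {
    // Writes "string=<text>" to the URL and returns the response body.
    static String post(URL url, String text) throws IOException {
        URLConnection connection = url.openConnection();
        connection.setDoOutput(true);   // we will write a request body (HTTP then uses POST)
        try (Writer out = new OutputStreamWriter(connection.getOutputStream(), StandardCharsets.UTF_8)) {
            out.write("string=" + URLEncoder.encode(text, "UTF-8"));
        }
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                connection.getInputStream(), StandardCharsets.UTF_8))) {
            StringBuilder reply = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) {
                reply.append(line);
            }
            return reply.toString();
        }
    }

    // Hypothetical stand-in for a server-side script: it reverses the posted value.
    static HttpServer startReverseServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/reverse", exchange -> {
            String body = new String(exchange.getRequestBody().readAllBytes(), StandardCharsets.UTF_8);
            String value = URLDecoder.decode(body.substring(body.indexOf('=') + 1), "UTF-8");
            byte[] reply = new StringBuilder(value).reverse().toString().getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, reply.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(reply);
            }
        });
        server.start();
        return server;
    }

    static String reverseViaLocalServer(String text) throws IOException {
        HttpServer server = startReverseServer();
        try {
            URL url = new URL("http://127.0.0.1:" + server.getAddress().getPort() + "/reverse");
            return post(url, text);
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(reverseViaLocalServer("Reverse Me"));   // eM esreveR
    }
}
```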

Lesson: All About Sockets


URLs and URLConnections provide a relatively high-level mechanism for accessing resources on the
Internet. Sometimes your programs require lower-level network communication, for example, when you
want to write a client-server application.
In client-server applications, the server provides some service, such as processing database queries or
sending out current stock prices. The client uses the service provided by the server, either displaying
database query results to the user or making stock purchase recommendations to an investor. The
communication that occurs between the client and the server must be reliable. That is, no data can be
dropped and it must arrive on the client side in the same order in which the server sent it.
TCP provides a reliable, point-to-point communication channel that client-server applications on the
Internet use to communicate with each other. To communicate over TCP, a client program and a server
program establish a connection to one another. Each program binds a socket to its end of the connection.
To communicate, the client and the server each reads from and writes to the socket bound to the
connection.

What Is a Socket?
A socket is one end-point of a two-way communication link between two programs running on the
network. Socket classes are used to represent the connection between a client program and a server
program. The java.net package provides two classes, Socket and ServerSocket, that implement the client
side of the connection and the server side of the connection, respectively.
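This sketch makes the "two end points" idea concrete. It opens one TCP connection over the loopback interface. The server's accept() returns a second Socket, its end of the link, and the sketch reports the ports seen from each side. On typical systems, the client's constructor returns once the OS has queued the connection, so accept() can run afterwards in the same thread. The class name is illustrative.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

class SocketEndpoints {
    static int[] connectLocally() throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket listener = new ServerSocket(0, 50, loopback);       // server listens
             Socket client = new Socket(loopback, listener.getLocalPort());   // client end point
             Socket serverEnd = listener.accept()) {                          // server end point
            return new int[] {
                listener.getLocalPort(),   // [0] port the server is bound to
                client.getPort(),          // [1] remote port as seen by the client
                serverEnd.getLocalPort(),  // [2] local port of the server's end
                client.getLocalPort(),     // [3] the client's own, OS-chosen port
                serverEnd.getPort()        // [4] remote port as seen by the server
            };
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(java.util.Arrays.toString(connectLocally()));
    }
}
```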

Reading from and Writing to a Socket


This page contains a small example that illustrates how a client program can read from and write to a
socket.
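Here is a client-side sketch of reading from and writing to a socket. It wraps the socket's output stream in an auto-flushing PrintWriter and its input stream in a BufferedReader, then alternates write and read. To keep it runnable, it includes a hypothetical one-connection echo server on a background thread. That server stands in for whatever real server a client would contact.

```java
import java.io.*;
import java.net.*;
import java.util.ArrayList;
import java.util.List;

class EchoClient {
    // Sends each line to the server and collects one reply line per request.
    static List<String> talk(InetAddress host, int port, String... lines) throws IOException {
        List<String> replies = new ArrayList<>();
        try (Socket socket = new Socket(host, port);
             PrintWriter out = new PrintWriter(socket.getOutputStream(), true);   // autoflush on println
             BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream()))) {
            for (String line : lines) {
                out.println(line);            // write to the socket...
                replies.add(in.readLine());   // ...then block until the reply line arrives
            }
        }
        return replies;
    }

    // Hypothetical stand-in for an echo server: serves a single connection, then exits.
    static int startOneShotEchoServer() throws IOException {
        ServerSocket listener = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
        Thread server = new Thread(() -> {
            try (ServerSocket l = listener;
                 Socket s = l.accept();
                 BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream()));
                 PrintWriter out = new PrintWriter(s.getOutputStream(), true)) {
                String line;
                while ((line = in.readLine()) != null) {
                    out.println(line);   // echo each line back unchanged
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        });
        server.setDaemon(true);
        server.start();
        return listener.getLocalPort();
    }

    public static void main(String[] args) throws IOException {
        int port = startOneShotEchoServer();
        System.out.println(talk(InetAddress.getLoopbackAddress(), port, "hello", "bye"));   // [hello, bye]
    }
}
```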

Writing a Client/Server Pair


The previous page showed an example of how to write a client program that interacts with an existing
server via a Socket object. This page shows you how to write a program that implements the other side of
the connection: a server program.
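This is a server-side sketch. A ServerSocket accepts connections in a loop and hands each client Socket to a worker thread, so several clients can be served at once. The line-by-line protocol (reply with the upper-cased line) is a hypothetical stand-in for a real application protocol. The small ask method is included only to exercise the server.

```java
import java.io.*;
import java.net.*;
import java.util.Locale;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class UpperCaseServer implements Closeable {
    private final ServerSocket listener;
    private final ExecutorService workers = Executors.newCachedThreadPool();

    UpperCaseServer(int port) throws IOException {
        listener = new ServerSocket(port, 50, InetAddress.getLoopbackAddress());
        Thread acceptor = new Thread(this::acceptLoop, "acceptor");
        acceptor.setDaemon(true);
        acceptor.start();
    }

    int port() {
        return listener.getLocalPort();
    }

    // Accepts clients until the listener is closed, one worker task per connection.
    private void acceptLoop() {
        try {
            while (true) {
                Socket client = listener.accept();   // blocks until a client connects
                workers.execute(() -> serve(client));
            }
        } catch (IOException e) {
            // accept() throws once close() has closed the listener: stop accepting.
        }
    }

    // The hypothetical protocol: answer every line with its upper-case form.
    private void serve(Socket client) {
        try (Socket s = client;
             BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream()));
             PrintWriter out = new PrintWriter(s.getOutputStream(), true)) {
            String line;
            while ((line = in.readLine()) != null) {   // null: the client closed its end
                out.println(line.toUpperCase(Locale.ROOT));
            }
        } catch (IOException e) {
            // The client went away abruptly; nothing more to do for it.
        }
    }

    @Override
    public void close() throws IOException {
        listener.close();
        workers.shutdownNow();
    }

    // A one-line client used to exercise the server.
    static String ask(int port, String line) throws IOException {
        try (Socket s = new Socket(InetAddress.getLoopbackAddress(), port);
             PrintWriter out = new PrintWriter(s.getOutputStream(), true);
             BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream()))) {
            out.println(line);
            return in.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        try (UpperCaseServer server = new UpperCaseServer(0)) {
            System.out.println(ask(server.port(), "knock knock"));   // KNOCK KNOCK
        }
    }
}
```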

Lesson: All About Datagrams


Some applications that you write to communicate over the network will not require the reliable, point-to-point channel provided by TCP. Rather, your applications might benefit from a mode of communication
that delivers independent packages of information whose arrival and order of arrival are not guaranteed.
The UDP protocol provides a mode of network communication whereby applications send packets of
data, called datagrams, to one another. A datagram is an independent, self-contained message sent over
the network whose arrival, arrival time, and content are not guaranteed.
The DatagramPacket and DatagramSocket classes in the java.net package implement system-independent datagram communication using UDP.

What Is a Datagram?
A datagram is an independent, self-contained message sent over the network whose arrival, arrival time,
and content are not guaranteed.

Writing a Datagram Client and Server


This section walks you through an example that contains two Java programs that use datagrams to
communicate. The server side is a quote server that listens to its DatagramSocket and sends a
quotation to a client whenever the client requests it. The client side is a simple program that makes
a request of the server.
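This sketch reduces the quote exchange to its essentials over the loopback interface. The server receives a request packet and sends its reply to the address and port the request came from, since no connection exists. The client sets a receive timeout because UDP does not guarantee delivery. The quotes, the "quote?" request text, and the 2-second timeout are all illustrative.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

class QuoteDemo {
    static final String[] QUOTES = { "Life is wonderful.", "Half of the time, nothing works." };

    // Server side: waits for one request datagram and answers it with a quote.
    static void serveOne(DatagramSocket socket, int index) throws IOException {
        byte[] buf = new byte[256];
        DatagramPacket request = new DatagramPacket(buf, buf.length);
        socket.receive(request);   // blocks until a datagram arrives
        byte[] quote = QUOTES[index % QUOTES.length].getBytes(StandardCharsets.UTF_8);
        // Reply to wherever the request came from.
        socket.send(new DatagramPacket(quote, quote.length, request.getAddress(), request.getPort()));
    }

    // Client side: sends a request and waits at most timeoutMs for the answer.
    static String requestQuote(InetAddress server, int port, int timeoutMs) throws IOException {
        try (DatagramSocket socket = new DatagramSocket()) {   // any free local port
            socket.setSoTimeout(timeoutMs);   // a lost datagram would otherwise block forever
            byte[] ask = "quote?".getBytes(StandardCharsets.UTF_8);
            socket.send(new DatagramPacket(ask, ask.length, server, port));
            byte[] buf = new byte[256];
            DatagramPacket response = new DatagramPacket(buf, buf.length);
            socket.receive(response);   // SocketTimeoutException if nothing arrives in time
            return new String(response.getData(), 0, response.getLength(), StandardCharsets.UTF_8);
        }
    }

    // Runs both sides in one process.
    static String demo() throws IOException, InterruptedException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (DatagramSocket serverSocket = new DatagramSocket(0, loopback)) {
            Thread server = new Thread(() -> {
                try {
                    serveOne(serverSocket, 0);
                } catch (IOException e) {
                    e.printStackTrace();
                }
            });
            server.start();
            String quote = requestQuote(loopback, serverSocket.getLocalPort(), 2_000);
            server.join();
            return quote;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());   // Life is wonderful.
    }
}
```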

Broadcasting to Multiple Recipients


This section modifies the quote server so that instead of sending a quotation to a single client upon
request, the quote server broadcasts a quote every minute to as many clients as are listening. The client
program must be modified accordingly.
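Here is a sketch of the one-to-many variant using IP multicast with MulticastSocket, which appears among the UDP classes listed earlier. The server sends each quote to a group address rather than to one client. Each client joins the group to receive it. The group 230.0.0.1 (inside the IPv4 multicast range 224.0.0.0/4) and port 4446 are hypothetical choices. Whether packets actually arrive depends on the local network's multicast support.

```java
import java.io.IOException;
import java.net.*;
import java.nio.charset.StandardCharsets;

class MulticastQuotes {
    static final String GROUP = "230.0.0.1";   // hypothetical group address
    static final int PORT = 4446;              // hypothetical port

    static InetAddress group() throws UnknownHostException {
        return InetAddress.getByName(GROUP);
    }

    // Server: one send reaches every client currently subscribed to the group.
    static void broadcast(String quote) throws IOException {
        try (DatagramSocket socket = new DatagramSocket()) {
            byte[] data = quote.getBytes(StandardCharsets.UTF_8);
            socket.send(new DatagramPacket(data, data.length, group(), PORT));
        }
    }

    // Client: joins the group and waits up to timeoutMs for one quote.
    @SuppressWarnings("deprecation")   // joinGroup(InetAddress) is deprecated since Java 14
    static String receiveOne(int timeoutMs) throws IOException {
        try (MulticastSocket socket = new MulticastSocket(PORT)) {
            socket.joinGroup(group());
            socket.setSoTimeout(timeoutMs);
            byte[] buf = new byte[256];
            DatagramPacket packet = new DatagramPacket(buf, buf.length);
            socket.receive(packet);
            socket.leaveGroup(group());
            return new String(packet.getData(), 0, packet.getLength(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length > 0 && args[0].equals("client")) {
            System.out.println(receiveOne(120_000));
        } else {
            for (int i = 1; i <= 5; i++) {   // one quote per minute, five times
                broadcast("Quote #" + i);
                Thread.sleep(60_000);
            }
        }
    }
}
```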

What Is a Network Interface?


A network interface is the point of interconnection between a computer and a private or public network. A
network interface is generally a network interface card (NIC), but does not have to have a physical form.
Instead, the network interface can be implemented in software. For example, the loopback interface
(127.0.0.1 for IPv4 and ::1 for IPv6) is not a physical device but a piece of software simulating a
network interface. The loopback interface is commonly used in test environments.
The java.net.NetworkInterface class represents both types of interfaces.
NetworkInterface is useful for a multi-homed system, which is a system with multiple NICs.
Using NetworkInterface, you can specify which NIC to use for a particular network activity.
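This sketch shows one way to tie a socket to a particular NIC. It looks up an interface, binds an unconnected Socket to one of that interface's addresses (preferring IPv4), and leaves the connect() step to the caller. The loopback interface is used only because every machine has one. On a multi-homed system you would look up the real NIC instead. Be aware that the operating system's routing can still influence which interface carries the traffic.

```java
import java.io.IOException;
import java.net.*;
import java.util.Collections;

class BindToInterface {
    // Finds the interface that owns a local address (null if none does).
    static NetworkInterface owner(InetAddress local) throws SocketException {
        return NetworkInterface.getByInetAddress(local);
    }

    // Binds a new socket to one of nif's addresses, preferring IPv4.
    static Socket socketOn(NetworkInterface nif) throws IOException {
        InetAddress chosen = null;
        for (InetAddress a : Collections.list(nif.getInetAddresses())) {
            if (chosen == null || (a instanceof Inet4Address && !(chosen instanceof Inet4Address))) {
                chosen = a;
            }
        }
        if (chosen == null) {
            throw new SocketException(nif.getName() + " has no addresses");
        }
        Socket socket = new Socket();
        socket.bind(new InetSocketAddress(chosen, 0));   // that NIC's address as the source, any port
        return socket;                                   // the caller would then call connect(...)
    }

    public static void main(String[] args) throws IOException {
        NetworkInterface lo = owner(InetAddress.getLoopbackAddress());
        try (Socket s = socketOn(lo)) {
            System.out.println(lo.getName() + " -> bound to " + s.getLocalSocketAddress());
        }
    }
}
```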

Retrieving Network Interfaces


The NetworkInterface class has no public constructor. Therefore, you cannot just create a new
instance of this class with the new operator. Instead, the following static methods are available so that you
can retrieve the interface details from the system: getByInetAddress(), getByName(),
and getNetworkInterfaces(). The first two methods are used when you already know the IP
address or the name of the particular interface. The third method, getNetworkInterfaces(), returns
the complete list of interfaces on the machine.
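This sketch lists every interface returned by getNetworkInterfaces(). It also reports a parent (non-null only for a subinterface) and any subinterfaces. Collections.list converts the returned Enumeration into a List. The interface name "lo" passed to getByName is a hypothetical example; getByName returns null when no interface has that name.

```java
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ListInterfaces {
    static List<String> describeAll() throws SocketException {
        List<String> lines = new ArrayList<>();
        for (NetworkInterface nif : Collections.list(NetworkInterface.getNetworkInterfaces())) {
            StringBuilder line = new StringBuilder(nif.getName())
                .append(" (").append(nif.getDisplayName()).append(")");
            NetworkInterface parent = nif.getParent();   // non-null only for a subinterface
            if (parent != null) {
                line.append(" parent=").append(parent.getName());
            }
            for (NetworkInterface sub : Collections.list(nif.getSubInterfaces())) {
                line.append(" sub=").append(sub.getName());
            }
            lines.add(line.toString());
        }
        return lines;
    }

    public static void main(String[] args) throws SocketException {
        describeAll().forEach(System.out::println);
        System.out.println("by name: " + NetworkInterface.getByName("lo"));   // null if absent
    }
}
```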
Network interfaces can be hierarchically organized. The NetworkInterface class includes two
methods, getParent() and getSubInterfaces(), that are pertinent to a network interface hierarchy.
The getParent() method returns the parent NetworkInterface of an interface. If a network interface
is a subinterface, getParent() returns a non-null value. The getSubInterfaces() method returns all
the subinterfaces of a network interface.

Listing Network Interface Addresses


One of the most useful pieces of information you can get from a network interface is the list of IP
addresses that are assigned to it. You can obtain this information from a NetworkInterface instance by
using one of two methods. The first method, getInetAddresses(), returns
an Enumeration of InetAddress. The other method, getInterfaceAddresses(), returns a list
of java.net.InterfaceAddress instances. This method is used when you need more information
about an interface address beyond its IP address. For example, you might need additional information
about the subnet mask and broadcast address when the address is an IPv4 address, and a network prefix
length in the case of an IPv6 address.
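This sketch uses both methods. getInetAddresses() gives the bare addresses. getInterfaceAddresses() adds the network prefix length and, for IPv4 only, the broadcast address (getBroadcast() returns null for IPv6). The output format is illustrative.

```java
import java.net.InetAddress;
import java.net.InterfaceAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ListAddresses {
    // One line per address: "address/prefix-length", plus a broadcast address if there is one.
    static List<String> describe(NetworkInterface nif) {
        List<String> lines = new ArrayList<>();
        for (InterfaceAddress ia : nif.getInterfaceAddresses()) {
            String line = ia.getAddress().getHostAddress() + "/" + ia.getNetworkPrefixLength();
            InetAddress broadcast = ia.getBroadcast();   // always null for IPv6
            if (broadcast != null) {
                line += " broadcast " + broadcast.getHostAddress();
            }
            lines.add(line);
        }
        return lines;
    }

    public static void main(String[] args) throws SocketException {
        for (NetworkInterface nif : Collections.list(NetworkInterface.getNetworkInterfaces())) {
            // getInetAddresses(): just the addresses, as an Enumeration
            System.out.println(nif.getName() + " " + Collections.list(nif.getInetAddresses()));
            // getInterfaceAddresses(): addresses plus prefix length and broadcast address
            for (String line : describe(nif)) {
                System.out.println("    " + line);
            }
        }
    }
}
```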

Network Interface Parameters


You can access network parameters about a network interface beyond the name and IP addresses
assigned to it.
You can discover if a network interface is up (that is, running) with the isUp() method. The following
methods indicate the network interface type:

isLoopback() indicates if the network interface is a loopback interface.

isPointToPoint() indicates if the interface is a point-to-point interface.

isVirtual() indicates if the interface is a virtual interface.

The supportsMulticast() method indicates whether the network interface supports multicasting.
The getHardwareAddress() method returns the network interface's physical hardware address,
usually called MAC address, when it is available. The getMTU() method returns the Maximum
Transmission Unit (MTU), which is the largest packet size the interface can transmit.
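This sketch collects all of these parameters for one interface into a single line. getHardwareAddress() returns null when there is no MAC address or it cannot be read. getMTU() may return -1 if the value is unknown. The output format is illustrative.

```java
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.Collections;
import java.util.StringJoiner;

class InterfaceParameters {
    static String describe(NetworkInterface nif) throws SocketException {
        StringBuilder sb = new StringBuilder(nif.getName())
            .append(" up=").append(nif.isUp())
            .append(" loopback=").append(nif.isLoopback())
            .append(" pointToPoint=").append(nif.isPointToPoint())
            .append(" virtual=").append(nif.isVirtual())
            .append(" multicast=").append(nif.supportsMulticast())
            .append(" mtu=").append(nif.getMTU());
        byte[] mac = nif.getHardwareAddress();   // null if there is none or it is not accessible
        if (mac != null) {
            StringJoiner hex = new StringJoiner(":");
            for (byte b : mac) {
                hex.add(String.format("%02X", b));   // two upper-case hex digits per byte
            }
            sb.append(" mac=").append(hex);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws SocketException {
        for (NetworkInterface nif : Collections.list(NetworkInterface.getNetworkInterfaces())) {
            System.out.println(describe(nif));
        }
    }
}
```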

Lesson: Working With Cookies


Though you are probably already familiar with cookies, you might not know how to take advantage of
them in your Java application. This lesson guides you through the concept of cookies and explains how to
set a cookie handler so that your HTTP URL connections will use it.
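This sketch installs a cookie handler. CookieHandler.setDefault installs a CookieManager system-wide, and HTTP URL connections opened afterwards store and send cookies through it. Passing null as the store selects the default in-memory CookieStore. The cookie name, value, and example.com URI are hypothetical and are added by hand only to show what the store holds.

```java
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.HttpCookie;
import java.net.URI;
import java.net.URISyntaxException;
import java.util.List;

class CookieDemo {
    // Installs a system-wide cookie handler backed by an in-memory store.
    static CookieManager install() {
        CookieManager manager = new CookieManager(null, CookiePolicy.ACCEPT_ORIGINAL_SERVER);
        CookieHandler.setDefault(manager);
        return manager;
    }

    // Adds a hypothetical cookie by hand and returns everything the store holds.
    static List<HttpCookie> addAndList(CookieManager manager) throws URISyntaxException {
        HttpCookie cookie = new HttpCookie("session", "abc123");
        manager.getCookieStore().add(new URI("http://example.com/"), cookie);
        return manager.getCookieStore().getCookies();
    }

    public static void main(String[] args) throws URISyntaxException {
        CookieManager manager = install();
        for (HttpCookie c : addAndList(manager)) {
            System.out.println(c.getName() + "=" + c.getValue());   // session=abc123
        }
    }
}
```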
