
String

http://www.bogotobogo.com/cplusplus/string.php

Strings are a source of confusion because it is not always clear what is meant by "string".
Is it an ordinary character array of type char* (with or without the const), or an instance of
class string? In general, we use C-string for the type char* or const char *, and we
use string for objects of the C++ standard library class.

C++ has two types of string:

1. C-style character string


2. C++ <string> class which is Standard C++ string.

In C, when we use the term string, we're referring to a variable length array of


characters. It has a starting point and ends with a string-termination character. While
C++ inherits this data structure from C, it also includes strings as a higher-level
abstraction in the standard library.

The primary difference between a string and an array of characters revolves
around length. Both occupy contiguous areas of memory. However, the length of an array
is set when the array is created, whereas the length of a string may change during the
execution of a program. This difference has several implications, which we'll explore shortly.

Here is the quick summary:

1. "hello" is a string literal, and we may want to use the following to assign it to a
pointer:

const char *pHello = "hello";

We should always declare a pointer to a string literal as const char *.
strlen(pHello) is 5; the terminating null character is not included.

2. To store C++ string to C-style string:

string str = "hello";
const char *cp = str.c_str();
char *p = const_cast<char *> (str.c_str());

The cast only removes the const qualifier so that p can be passed to older interfaces
that take char *. Writing through p is still not allowed.

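The following code converts an int to a string, and it can serve as an example of handing
a C-style character array back from a C++ string.

#include <cstring>
#include <sstream>
#include <string>
using namespace std;

// s must point to a buffer large enough for the digits plus the '\0'
char *int2strB(int n, char *s)
{
    stringstream ss;
    ss << n;
    string cpp_string = ss.str();
    // c_str() points into cpp_string, which is destroyed on return,
    // so copy the characters into the caller's buffer
    strcpy(s, cpp_string.c_str());
    return s;
}

In the code, we make a C++ string from the stream (ss.str()), then obtain a C string
(const char *) using c_str(). That pointer is only valid while cpp_string exists, so we copy
the characters into the caller's buffer s with strcpy(). Casting away the constness
(const_cast<char *>) and returning the pointer itself would hand back a pointer into a
local string that no longer exists once the function returns.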
3. An array name behaves like a constant pointer to the first element of the array, and
that's why we can't copy arrays using assignment (arrayA = arrayB is not allowed).

int a[100];
int b[100];
a = b; // error

char cArray[10];
cArray = "Hello!"; // error

const char *cPtr;
cPtr = "Hello!"; // OK

// conversion - array name and pointer
int *iPtr;
iPtr = a; // OK
iPtr = &a[0]; // OK

int *iPtr2 = new int[100];
a = iPtr2; // error

When we write arrayA = arrayB, the intention might be to make arrayA refer to the same
area of memory as arrayB. This is a compile error because we can't change the memory
location that arrayA names. If we really want to copy arrayB into arrayA, we
need to write a loop that does element-by-element assignment, or use a memory library
function such as memcpy.
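The two copying approaches described above can be sketched as follows. The helper names copy_with_loop and copy_with_memcpy are made up for this illustration, and both assume the destination has room for n elements.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: element-by-element copy of n ints from src to dst.
void copy_with_loop(int *dst, const int *src, std::size_t n)
{
    assert(dst != nullptr && src != nullptr);
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = src[i];
}

// Hypothetical helper: the same copy done as one block of bytes.
// memcpy counts bytes, not elements, hence n * sizeof(int).
void copy_with_memcpy(int *dst, const int *src, std::size_t n)
{
    assert(dst != nullptr && src != nullptr);
    std::memcpy(dst, src, n * sizeof(int));
}
```

With int a[100] and int b[100], either copy_with_loop(a, b, 100) or copy_with_memcpy(a, b, 100) does what a = b cannot.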

Arrays can be implicitly converted to pointers without casting. However, there is no
implicit conversion from pointers to arrays.

C string
Though the Standard C library is included as a part of Standard C++, to use the C string
functions we need to include the corresponding header file:

#include <cstring>

C string is stored as a character array. Actually, a string is a series of characters stored


in consecutive bytes of memory. The idea of a series of characters stored in
consecutive bytes implies that we can store a string in an array of char, with each
character kept in its own array element.

C-style strings have a special feature: The last character of every string is the null
character.

char non_string [10] = {'n','o','n','_','s','t','r','i','n','g'};


char a_string [9] = {'a','_','s','t','r','i','n','g','\0'};

Both of these are arrays of char, but only the second is a string (an array of 9
characters). So, the 8-character string requires a 9-character array. This scheme makes
finding the length of the string an O(n) operation instead of O(1) operation.

So, the strlen() must scan through the string until it finds the end. For the same reason
that we can't assign one C array to another, we cannot copy C strings using '=' operator.
Instead, we generally use the strcpy() function. However, note that it has also long
been recommended to avoid standard library functions like strcpy() which are not
bounds checked, and can cause buffer overflow.
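One way to avoid the overflow risk is to pass the destination size along with the destination. The sketch below is only an illustration: bounded_copy is a hypothetical helper, not a standard library function, and truncating silently is just one possible policy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: copy src into dst, which has room for dst_size chars.
// Copies at most dst_size - 1 characters and always writes the '\0'.
// Returns true if all of src fit, false if it had to be truncated.
bool bounded_copy(char *dst, std::size_t dst_size, const char *src)
{
    assert(dst != nullptr && src != nullptr && dst_size > 0);
    std::size_t len = std::strlen(src);
    bool fits = len < dst_size;               // need one extra slot for '\0'
    std::size_t n = fits ? len : dst_size - 1;
    std::memcpy(dst, src, n);
    dst[n] = '\0';
    return fits;
}
```

The caller can check the return value to find out whether the string was cut short.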

We manipulate the C string using a pointer. That's why C string is sometimes


called pointer-based string. For example,

const char *str ="I am a string";

we traverse it one character at a time:

while(*str++){ }
The pointer str is dereferenced and then advanced, and the character it addressed is
tested: any other character counts as true, and the terminating null character counts as
false, which ends the loop.
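To see the traversal do some work, here is a minimal sketch that counts characters with the same loop. The name count_chars is hypothetical. It does what strlen() does and is meant only to show the idiom.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of characters before the terminating '\0'.
std::size_t count_chars(const char *str)
{
    assert(str != nullptr);
    std::size_t n = 0;
    while (*str++)   // test the current character, then advance; stops at '\0'
        ++n;
    return n;
}
```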

Actually, a string literal may be used as an initializer in the declaration of either a


character array or a variable of type char *. The declarations:

char str[] = "hello";


const char *pStr = "hello";

Each declaration initializes a variable to the string "hello". The first declaration creates
a six-element array str containing the characters 'h', 'e', 'l', 'l', 'o' and '\0'. The second
declaration creates pointer variable pStr that points to the letter h in the string "hello",
which also ends in '\0', in memory.

We cannot modify string literals; attempting to do so is either a compile-time error (when
the pointer is declared const) or undefined behavior that typically shows up as a runtime
error. String literals are usually placed in read-only program memory. We can,
however, modify the contents of an array of characters.

Modifying one of the elements of str is permitted, but modifying one of the characters
reached through pStr is not: the const qualifier makes the compiler reject it, and without
const the write would typically crash the program at run time.

Just to get the feeling of C string, let's briefly look at the strlen():

size_t strlen ( const char * str );

The length of a C string is determined by the terminating null-character. The length of a


C string is the number of characters and the terminating null character is not included.
Sometimes, we get confused with the size of the array that holds the string:

char str[50]="0123456789";

defines an array of characters with a size of 50 chars, but the C string, str, has a length
of only 10 characters. So, sizeof(str) = 50, but strlen(str) = 10.
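The difference between the two numbers can be checked with a small sketch. The Sizes struct and the function names are made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical pair of numbers describing one buffer.
struct Sizes {
    std::size_t capacity;  // sizeof: bytes reserved for the array
    std::size_t length;    // strlen: characters before the first '\0'
};

Sizes measure()
{
    char str[50] = "0123456789";
    Sizes result = { sizeof(str), std::strlen(str) };
    // A valid C string always leaves room for its terminator.
    assert(result.length < result.capacity);
    return result;
}

// Once the array has been converted to a pointer, sizeof would measure
// the pointer itself, so strlen() is the only way to get the length.
std::size_t length_via_pointer(const char *p)
{
    return std::strlen(p);
}
```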
String literals have static storage class, which means they exist for the duration of the
program. They may or may not be shared if the same string literal is referenced from
multiple locations in a program. According to the C++ standard, the effect of attempting
to modify a string literal is undefined. Therefore, we should always declare a pointer to a
string literal as const char *.

Since a lot of existing code still relies on C-style strings, we need to be familiar
with them.

Let's find out how much we know about C-string using the examples below. Can you
figure out what's wrong?

Question 1

char *cstr1 = "hello";


*(cstr1)='t';

Question 2

char *cstr2;
strcpy(cstr2, "hello");
*(cstr2)='t';

Question 3

char cstr3[100];
cstr3 = "hello";
*(cstr3)='t';

Question 4

char cstr4[100] = "hello";


*(cstr4)='t';
Answer 1 
Compiles successfully.
In run time, however, at the moment when it tries to write, it fails.
We get this message:
Unhandled exception at 0x00411baa in cstring.exe: 0xC0000005:
Access violation writing location 0x0041783c.

When our program is compiled, the compiler forms the object code file, which contains
our machine code and a table of all the string constants declared in the program. The
statement

char *cstr1 = "hello";

causes cstr1 to point to the address of the string hello in the string constant table.
Since this string is in the string constant table, and therefore technically a part of the
executable image, we cannot modify it. We can only point to it and use it in a read-only
manner.
The "hello" is a string literal (or string constant) because it is written as a value, not a
variable. Even though string literals don't have associated variables, they are treated
as arrays of constant characters (const char[]). String literals can be assigned to
variables, but doing so can be risky. The actual memory associated with a string literal
is usually in a read-only part of memory, which is why it is an array of constant characters.
This allows the compiler to optimize memory usage by reusing references to equivalent
string literals (that is, even if our program uses the string literal "hello" 100 times, the
compiler can create just one instance of hello in memory). Older compilers do not,
however, force our program to assign a string literal only to a variable of type const
char* or const char[]. Since C++11 the conversion to char* without const is no longer
valid C++, although many compilers still accept it with a warning. Where it is accepted,
the program will work fine unless we attempt to change the string, which is what we're
trying to do in the last line of Question 1.

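Answer 2
This code compiles successfully, but cstr2 is never initialized, so strcpy() writes through
a pointer that does not point to memory we own. We need to allocate memory for the
character pointer first.
The right code should look like this:

char *cstr2 = (char*)malloc(strlen("hello")+1); // needs <cstdlib>; +1 for the '\0'
strcpy(cstr2,"hello");
*(cstr2)='t';
free(cstr2); // release the memory when done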
Answer 3
This won't compile.

At the line:

cstr3 = "hello";

we get the following message:


cannot convert from 'const char [6]' to 'char [100]'
The string hello exists in the constant table, and we can copy it into the array of
characters named cstr3, for example with strcpy(). However, cstr3 is an array, not a
pointer, so the statement

cstr3="hello";

will not work.

We can think of the problem this way: the pointer we get from the name of an array, a
pointer to its first element, is a value, NOT a variable, so we cannot assign to it.

In effect, an array name behaves like a constant pointer to the first element of the array.
As a consequence, we can't even copy arrays using assignment:

int a[100];
int b[100];
...
a = b; //error

Answer 4
No problem at all.

Summary

In general, the answers to Questions 1 to 4 may vary depending on the compiler. So, the
best way of avoiding unexpected run-time errors is to use a pointer to const characters
when referring to string literals:
const char* ptr = "hello"; // Assign the string literal to a variable.

*ptr = 't'; // Compile error - cannot assign through a pointer to const
ptr[0] = 't'; // Compile error - expression must be a modifiable lvalue

C Library Utilities
The Standard C library gives us a set of utility functions such as:

/*returns the length of the string*/
size_t strlen(const char*);

/*copies the 2nd string into the 1st*/
char* strcpy(char*, const char*);

/*compares two strings*/
int strcmp(const char*, const char*);

Some of the source code for the library utility functions can be found in Small Programs.
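As a rough illustration of how these three functions are typically called, consider the sketch below. The wrapper names same_text, sorts_before and copy_and_measure are hypothetical, and copy_and_measure assumes the caller's buffer is already known to be large enough.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: true when a and b contain the same characters.
// Comparing the pointers themselves (a == b) would compare addresses instead.
bool same_text(const char *a, const char *b)
{
    assert(a != nullptr && b != nullptr);
    return std::strcmp(a, b) == 0;  // strcmp returns 0 for equal strings
}

// Hypothetical helper: true when a sorts before b.
bool sorts_before(const char *a, const char *b)
{
    assert(a != nullptr && b != nullptr);
    return std::strcmp(a, b) < 0;   // negative result: a comes first
}

// Hypothetical helper: copies src into dst and returns the number of
// characters copied, not counting the terminating '\0'.
std::size_t copy_and_measure(char *dst, const char *src)
{
    assert(dst != nullptr && src != nullptr);
    std::strcpy(dst, src);
    return std::strlen(dst);
}
```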

Because we manipulate a C string through pointers, which is a low-level operation, it's
error-prone. That's why we have the C++ <string> class.

C++ string

Though there are lots of advantages of C++ string over C string, we won't talk about them
at this time. But because a lot of code still uses C-style strings, we need to look at the
conversion between them. Here is an example, annotated with the error messages we'll get
if we compile it as it is.

#include <string>
#include <cstring>

int main()
{
using namespace std;
string str1;
const char *pc = "I am just a character array";

// C++ string type automatically converting a C character string


// into a string object.
// string class defines a char* - to-string conversion, which makes
// it possible to initialize a string object to a C-style string.

str1 = pc; //ok

// error C2440: 'initializing':


// cannot convert from 'std::string' to 'char *'

char *p1 = str1; //not ok

// error C2440: 'initializing' :


// cannot convert from 'const char *' to 'char *'

char *p2 = str1.c_str(); //not there yet

const char *p3 = str1.c_str(); //ok


// removing (casting) constantness

char *p4 = const_cast<char *> (str1.c_str()); // ok

return 0;
}

c_str() returns the contents of the string as a C-string. Thus, the '\0' character is
appended. Actually, it returns a pointer to a const array in order to prevent the array from
being directly manipulated. That's why we need the const qualifier in the example code.
Note that C++ strings do not provide a special meaning for the character '\0', which is
used as special character in an ordinary C-string to mark the end of the string. The
character '\0' is part of a string just like every other character.
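The point about '\0' can be made concrete with a small sketch. The function names are hypothetical. The (pointer, count) constructor used here is a standard std::string constructor that copies exactly the requested number of characters.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Builds a 5-character std::string whose third character is '\0'.
std::string make_with_embedded_null()
{
    // The count tells the constructor to copy all 5 characters,
    // so the '\0' in the middle is kept as ordinary data.
    std::string s("ab\0cd", 5);
    assert(s.size() == 5);
    return s;
}

// Length as the C library sees it: counting stops at the first '\0'.
std::size_t c_length(const std::string &s)
{
    return std::strlen(s.c_str());
}
```

For the string built above, size() reports 5 while c_length() reports 2, so a C function handed c_str() sees only "ab".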

String Initialization
Here are a few ways to initialize a string:

string s1; // Default constructor; s1 is an empty string


string s2(s1); // Initialize s2 as a copy of s1;
string s3("literal"); // Initialize s3 as a copy of a string literal
string s4(n, 'c'); // Initialize s4 with n copies of a character 'c'

The string class provides several constructors. If we define a string without
explicitly initializing it, then the default constructor is used.
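A short sketch exercising the four constructors listed above. The helper names repeat and build_from_literal are made up for the example, and the asserts state what each constructor is expected to produce.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: n copies of c via the (count, char) constructor.
std::string repeat(std::size_t n, char c)
{
    std::string s4(n, c);
    assert(s4.size() == n);
    return s4;
}

// Hypothetical helper using the default, literal and copy constructors.
std::string build_from_literal()
{
    std::string s1;             // default constructor: empty string
    assert(s1.empty());
    std::string s3("literal");  // copy of a string literal
    std::string s2(s3);         // copy of another string
    assert(s2 == s3);
    return s2;
}
```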

C++ provides two forms of variable initialization:

int i(256); // direct initialization


int i = 256; // copy initialization

Note that initialization and assignment are different operations in C++. When we do
initialization, a variable is created and given its initial value. However, when we do
assignment, an object's current value is obliterated and replaced with a new one. For a
built-in type variable like int, there is little difference between the direct and the copy
forms of initialization. However, when we deal with more complex types, the difference
becomes clear. Direct initialization calls the matching constructor directly, whereas copy
initialization conceptually builds a temporary and copies it (compilers commonly elide
that copy) and cannot use constructors declared explicit.

String Operations

1. s1=s2

Assign s2 to s1; s2 can be a string or a C-style string.

2. s+=a

Add a at end; a can be a character or a C-style string.

3. s[i]

subscripting

4. s1+s2

Concatenation; the characters in the resulting string will be a copy of those
from s1 followed by a copy of those from s2.

5. s1<s2

Lexicographical comparison of string values; s1 or s2, but not both, can be a C-style


string.

6. s1==s2

Comparison of string values; s1 or s2, but not both, can be a C-style string.

7. s.size()

Number of characters in s.

8. s.length()

Number of characters in s.


9. s.c_str()

C-style version of characters in s.

10. s.begin()

Iterator to first character.

11. s.end()

Iterator to one beyond the end of s.

12. s.insert(pos,a)

Insert a before s[pos]; a can be a string or a C-style string (a single character c is
inserted as s.insert(pos,1,c)). s expands to make room for the characters from a.

13. s.append(a)

Add a at the end of s; a can be a string or a C-style string. s expands to
make room for the characters from a.

14. s.erase(pos,1)

Remove the character in s[pos]; s's size decreases by 1. With an index alone,
s.erase(pos) removes everything from s[pos] to the end.

15. pos=s.find(a)

Find a in s; a can be a character, a string, or a C-style string. pos is the index of the
first character found, or npos (a position off the end of s). What if we can't find the
string, as in the example below?

if(s.find("mystring") == string::npos) cout << "can't find it \n";

Because "mystring" does not exist in s, find() returns a constant which we access
with string::npos. As a result, it displays the message. string::npos is the largest
value that the string's size type can hold, so it is a position that can't exist. That
makes it the perfect return value to indicate the failure of find().
16. in >> s

Read a whitespace-separated word into s from in.

17. getline(in,s)

Read a line into s from in.

18. out << s

Write from s to out.
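Several of the operations in the list can be tried together in a small sketch. The function names build_greeting, find_index and first_word are hypothetical, and returning -1 for a failed search is just a convention chosen for this example.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Chains a few modifying operations on one string.
std::string build_greeting()
{
    std::string s = "Hello";
    s += ',';             // add a character at the end
    s += " world";        // add a C-style string at the end
    s.insert(0, ">> ");   // insert before s[0]
    s.append("!");        // add at the end
    s.erase(0, 3);        // remove 3 characters starting at index 0
    assert(s.size() == s.length());
    return s;
}

// Index of needle in s, or -1 when find() reports std::string::npos.
long find_index(const std::string &s, const std::string &needle)
{
    std::string::size_type pos = s.find(needle);
    return pos == std::string::npos ? -1 : static_cast<long>(pos);
}

// First whitespace-separated word of text, read with operator>>.
std::string first_word(const std::string &text)
{
    std::istringstream in(text);
    std::string word;
    in >> word;           // skips leading whitespace, stops at the next one
    return word;
}
```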

Pointers and Strings

Take a look at the following code:

char name[12] = "Alan Turing";


std::cout << name << " is one of the greatest.\n";

The name of an array is the address of its first element. The name in the cout is the
address of the char element containing the character A. The cout object assumes that
the address of a char is the address of a string. So, it prints the character at that
address and then continues printing characters until it meets the null character, '\0'.

In other words, a C string is nothing more than a char array. Just as C doesn't track the
size of arrays, it doesn't track the size of strings. Instead, the end of the string is marked
with a null character, represented in the language as '\0'. So, if we give the cout the
address of a character, it prints everything from that character to the first null character
that follows it.

The key here is that name acts as the address of a char which implies that we can use
a pointer-to-char variable as an argument to cout.

What about the other part of the cout statement?


If name is actually the address of the first character of a string, what is the expression "
is one of the greatest.\n"? To be consistent with cout's handling of string output, this
quoted string should also be an address. Yes, it is. A quoted string serves as the
address of its first element.
It doesn't really send a whole string to cout. It just sends the string address. This
means:

1. strings in an array
2. quoted string constants
3. strings described by pointers

are all handled equivalently. Each of them is really passed along as an address.
The following example shows how we use different forms of strings. It uses two
functions from the string library, strlen() and strcpy(). Prototypes of the functions are
in cstring header file.

#include <iostream>
#include <cstring>

int main()
{
using namespace std;

char nameArr[12] = "Alan Turing";


const char *namePtrConstChar = "Edsger W. Dijkstra";
char *ptr;

cout << nameArr << " and " << namePtrConstChar << endl;
cout << endl;

ptr = nameArr;

cout << "1: " << nameArr << " @ " << (int *)nameArr << endl;
cout << "1: " << ptr << " @ " << (int *) ptr << endl;
cout << endl;

ptr = new char[strlen(nameArr) + 1];


strcpy(ptr, nameArr);
cout << "2: " << nameArr << " @ " << (int *)nameArr << endl;
cout << "2: " << ptr << " @ " << (int *) ptr << endl;
delete [] ptr;
}

Output from the run:

Alan Turing and Edsger W. Dijkstra

1: Alan Turing @ 0017FF1C


1: Alan Turing @ 0017FF1C

2: Alan Turing @ 0017FF1C


2: Alan Turing @ 007A1F20

The code above creates one char array, nameArr, and two pointer-to-char variables,
namePtrConstChar and ptr. The code begins by initializing nameArr to
the "Alan Turing" string. Then, it initializes a pointer-to-char to a string:

const char *namePtrConstChar = "Edsger W. Dijkstra";

"Edsger W. Dijkstra" actually represents the address of the string, so this assigns the
address of "Edsger W. Dijkstra" to the namePtrConstChar pointer.

String literals are constants, which is why the code uses the const keyword.
Using const means we can use namePtrConstChar to access the string but not to
change it.

The pointer ptr remains uninitialized, so it doesn't point to any string.

The code illustrates that we can use the array name nameArr and the
pointer namePtrConstChar equivalently with cout. Both are the addresses of strings,
and cout displays the two strings stored at those addresses.
Let's look at the following code of the example:

cout << "1: " << nameArr << " @ " << (int *)nameArr << endl;
cout << "1: " << ptr << " @ " << (int *) ptr << endl;

It produces the following output:

1: Alan Turing @ 0017FF1C


1: Alan Turing @ 0017FF1C

In general, if we give cout a pointer, it prints an address. But if the pointer is of type
char *, cout displays the pointed-to string. If we want to see the address of the string, we
should cast the pointer to another pointer type, such as int * (void * is the conventional
choice). Thus, ptr displays as the string "Alan Turing", but (int *)ptr displays as the
address where the string is located.
Note that assigning nameArr to ptr does not copy the string, it copies the address. As a
result, nameArr and ptr refer to the same memory location and the same string.
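The two behaviors of the stream can be compared side by side. This sketch writes to a string stream instead of cout so the results can be inspected, and it casts to const void * rather than to the int * used in the example. The helper names as_text and as_address are hypothetical, and the exact address format depends on the implementation.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: what operator<< writes for a char pointer,
// namely the characters up to the '\0'.
std::string as_text(const char *p)
{
    assert(p != nullptr);
    std::ostringstream out;
    out << p;
    return out.str();
}

// Hypothetical helper: what operator<< writes for the same pointer once
// it is viewed as const void *, namely the address.
std::string as_address(const char *p)
{
    std::ostringstream out;
    out << static_cast<const void *>(p);
    return out.str();
}
```

Calling as_address() on an array and on a pointer assigned from it yields the same text, which is another way of seeing that the assignment copied only the address.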

To get a copy of a string, we need to allocate memory to hold the string. We can do this by:

1. declaring a second array


2. using new

In the code, we use the second approach:

ptr = new char[strlen(nameArr) + 1];

Then, we copy a string from the nameArr to the newly allocated space. It doesn't work if
we assign nameArr to ptr because it just changes the address stored in ptr and thus
loses the information of the address of memory we just allocated. Instead, we need to
use the strcpy():
strcpy(ptr, nameArr);

The strcpy() function takes two arguments. The first is the destination address, and the
second is the address of the string to be copied. Note that by using strcpy() and new,
we get two separate copies of "Alan Turing":

2: Alan Turing @ 0017FF1C


2: Alan Turing @ 007A1F20
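The new plus strcpy() steps can be wrapped into one function. This is a minimal sketch: duplicate is a hypothetical name, and the caller is assumed to take ownership of the returned memory.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: returns a new[]-allocated copy of src.
// The caller owns the result and must release it with delete [].
char *duplicate(const char *src)
{
    assert(src != nullptr);
    char *copy = new char[std::strlen(src) + 1];  // + 1 for the '\0'
    std::strcpy(copy, src);
    return copy;
}
```

Because the copy lives at its own address, changing a character in it leaves the original array untouched.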

Additional code samples for string manipulation, which frequently appear at
interviews, are in sources A and sources B.
