Manipulating Texts
Learning Outcomes
● Python built-in functions
● Work with numeric data and string data
● Objects and methods
● Formatting numbers and strings
Built-in functions
● Python provides many useful functions for common
programming tasks.
● A function is a group of statements that performs a specific
task.
● You have already used the functions eval , input , print , and
int ,...
○ These are built-in functions and they are always available
in the Python interpreter.
○ You don’t have to import any modules to use these
functions.
Math module
import math
Mathematical
functions and
constants
E.g. [Link]
Math module
# import the math module to use the math functions
import math
# Test algebraic functions
print("log(10, 10) =", math.log(10, 10))
print("sqrt(4.0) =", math.sqrt(4.0))
# Test trigonometric functions
print("sin(PI / 2) =", math.sin(math.pi / 2))
print("cos(PI / 2) =", math.cos(math.pi / 2))
print("tan(PI / 2) =", math.tan(math.pi / 2))
print("degrees(1.57) =", math.degrees(1.57))
print("radians(90) =", math.radians(90))
Manipulating text[1]
∙ Why is text so important in computing applications?
− In data analysis, we work with a lot of text
● e.g. we have to mine large texts such as news feeds or
social media posts to extract information of interest by
searching for specific patterns….
− Another reason: As developers and scientists, the programs
that we write often need to work as part of a pipeline,
alongside other programs that have been written by other
people.
∙ To do this, we'll often need to write code that can understand the
output from some other program (we call this parsing) or produce
output in a format that another program can operate on. Both of these
tasks require manipulating text.
String
∙ To represent text, we use the string type (str) in Python
− A string is a sequence of characters
− Python treats characters and strings the same way: a single character is a string of length 1.
− A string is enclosed within double quotes (") or single quotes (').
− Example:
>>> message ="Hello World"
>>> print(message)
Hello World
∙ We can also use special characters to define text:
>>> message="Hello\nWorld"
>>> print(message)
Hello
World
Special characters
● \n   newline
● \t   tab
● \\   backslash
● \'   single quote
● \"   double quote
>>> print("He said, \"John's program is easy to read\"")
He said, "John's program is easy to read"
Objects of type String
∙ Objects of type String (str) are used to represent strings of characters.
− E.g. 'abc' or "abc"
− E.g. '123' denotes a string of three characters, not the number one
hundred twenty-three.
∙ Exercise: Try typing the following expressions into the Python
interpreter:
>>> 'a'
>>> 3*4
>>> 3*'a'
>>> 3+4
>>> 'a'+'a'
Objects and Methods
● In Python, all data—including numbers and strings—are
actually objects.
● In Python, a number is an object, a string is an object, …
● Objects of the same kind have the same type.
● You can use the type() function to get the class/type of an
object.
● You can perform operations on an object. The operations are
defined using functions.
● The functions for the objects are called methods in Python.
Methods can only be invoked from a specific object.
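As a brief illustration of the points above, the sketch below uses type() to inspect two objects and then invokes a method from each of them. The variable names are chosen only for this example.

```python
# Every value is an object; type() reports its class
number = 42
text = "Hello"
print(type(number))   # <class 'int'>
print(type(text))     # <class 'str'>

# A method is invoked from a specific object using dot notation
shout = text.upper()
print(shout)          # HELLO

# Numbers are objects too, so they also have methods
bits = number.bit_length()
print(bits)           # 6
```

Note that upper() belongs to strings and bit_length() belongs to integers; calling number.upper() would fail because the int type has no such method.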
Input String [1]
∙ A string value can be read using the input() function
>>> firstName = input("Please enter your name: ")
∙ All values read through the input() function are strings.
∙ Strings containing digits can be converted to numbers using the
eval() function.
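The conversion step can be sketched as follows. This example is an illustration only: it uses int() and float() as alternatives to eval(), and it replaces the call to input() with fixed strings (hypothetical values) so that it runs without a user typing anything.

```python
# Stand-ins for values typed by a user; input() would return strings like these
age_text = "21"
height_text = "1.75"

# Convert the strings to numbers before doing arithmetic
age = int(age_text)
height = float(height_text)

print(age + 1)        # 22
print(height * 2)     # 3.5
print(age_text * 2)   # 2121 (still a string, so * repeats it)
```

int() and float() only accept text that looks like a number, whereas eval() evaluates any expression it is given, so the narrower functions are often the safer choice for user input.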
Storing strings in variables
∙ We can take a string and assign a name to it using an
equals sign – we call this a variable:
>>> my_name = "Something"
>>> print(my_name)
Something
∙ Note: When we use the variable in a print statement, we
don't need any quotation marks – the quotes are part of
the string, so they are already "built in" to the variable
my_name.
Storing strings in variables (cont.)
∙ We can change the value of a variable as many times as
we like once we've created it:
>>> my_name = "Something"
>>> print(my_name)
Something
# change the value of my_name
>>> my_name = "Another Thing"
>>> print(my_name)
Another Thing
String Indexing
∙ Indexing can be used to extract individual characters from a
string.
∙ In Python, all indexing is zero-based.
− Typing 'abc'[0] into the interpreter will cause it to display
the string 'a' .
− Typing 'abc'[3] will produce the error message
IndexError: string index out of range .
− Since Python uses 0 to indicate the first element of a
string, the last element of a string of length 3 is accessed
using the index 2.
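The behaviour described above can be checked with a short sketch. The try/except construct is used here only to show the error without stopping the program; the variable names are illustrative.

```python
word = "abc"
last_index = len(word) - 1   # 2 for a string of length 3
print(word[0])               # a
print(word[last_index])      # c

# Indexing past the end raises IndexError
try:
    word[3]
except IndexError as error:
    message = str(error)
    print("IndexError:", message)
```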
String Indexing (cont.)
∙ Indexing is used in string expressions to access a specific
character position in the string
∙ The general form for indexing is:
<string>[<expr>]
∙ Note: <expr> can be an integer value, an integer variable or an
expression that gives an integer as its result; its value
determines which character is selected from the string
String Indexing
∙ Visually, "Hello Bob" can be represented as in the
diagram below (index 5 holds the space):
index:  0  1  2  3  4  5  6  7  8
char:   H  e  l  l  o     B  o  b
∙ Notice that the string index is numbered and starts
with 0 and ends with n – 1 (n being the string
length)
∙ Negative numbers are used to index from the end of
a string.
○ E.g. the value of 'abc'[-1] is 'c'.
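A minimal sketch relating negative indices to ordinary ones, using the "Hello Bob" string from this slide: index -k refers to the same character as index n - k.

```python
greet = "Hello Bob"
n = len(greet)                  # 9
print(greet[-1], greet[n - 1])  # b b
print(greet[-3], greet[n - 3])  # B B
print(greet[5] == " ")          # True: index 5 holds the space
```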
Indexing – Example
>>> greet = "Hello Bob"
>>> greet[0]
'H'
>>> print(greet[0], greet[2], greet[4])
H l o
>>> x = 8
>>> print(greet[x - 2])
B
Exercise 3.1
∙ Write a program to input a text and to display
the characters at locations 3, 4 and 10. You
can assume that the input text is long enough.
Tools for manipulating strings
∙ So far we have shown that we can store and print strings
∙ But Python also provides the facilities for
manipulating strings.
∙ Python has many built-in functions for carrying out
common operations, and in the following slides we'll
take a look at them one-by-one.
Concatenation
∙ We can concatenate (stick together) two strings using the +
symbol.
∙ This symbol will join together the string on the left with the
string on the right:
>>> my_name = "John" + "Smith"
>>> print(my_name)
JohnSmith
∙ We can also concatenate variables that point to strings:
>>> firstname = "John"
>>> my_name = firstname + "Smith"
# my_name is now "JohnSmith"
Concatenation (cont.)
∙ We can even join multiple strings together in one go:
>>> upstream = "AAA"
>>> downstream = "GGG"
>>> my_dna = upstream + "ATGC" + downstream
# my_dna is now "AAAATGCGGG"
∙ Note: the result of concatenating two strings together is itself
a string. So it's perfectly OK to use a concatenation inside a
print statement:
>>> print("Hello" + " " + "world")
Repetition
∙ * for repetition: builds a string by multiple concatenations
of a string with itself
∙ Example:
>>> 3 * "John"
'JohnJohnJohn'
Note: The code 'a'*'a' produces the error message
TypeError: can't multiply sequence by non-int of type 'str'
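One possible use of repetition, shown here only as an illustration, is to build a separator line whose length matches a heading. The names title and underline are made up for this sketch.

```python
title = "Report"
underline = "-" * len(title)    # one dash per character in the title
print(title)
print(underline)                # ------

# The integer can be on either side of *
same = ("ab" * 3 == 3 * "ab")
print(same)                     # True
```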
Exercise 3.2
∙ Predict the output of the following Python
program:
a = "Rosalind"
b = "Franklin"
c = "!"
print(a, b, 3 * c)
Finding the length of a string
∙ The len built-in function takes a single argument (a string)
∙ len outputs a value (a number) that can be stored – we call
this the return value.
○ If we write a program that uses len to calculate the
length of a string, the program will run but we won't see
any output:
# this line doesn't produce any output when run as part of a program
len("SampleText")
● If we want to actually use the return value, we need to store
it in a variable, and then do something useful with it (like
printing it):
>>> text_length = len("SampleText")
>>> print(text_length)
Finding the length of a string (cont.)
∙ Consider this short program ([Link]) which
calculates the length of a text and then prints a message
telling us the length:
# store the text in a variable
my_text = "SampleText"
# calculate the length of the text and store it in a variable
text_length = len(my_text)
# print a message telling us the text length
print("The length of the text is " + text_length)
Finding the length of a string (cont.)
When we try to run the program we get the following error:
1 Traceback (most recent call last):
2 File "[Link]", line 6, in <module>
3 print("The length of the text is " + text_length)
4 TypeError: must be str, not int
● The error message (line 4) is short but informative: the value being
concatenated "must be str, not int". (Other Python 3 versions word this as
'can only concatenate str (not "int") to str'.)
● Python is complaining that it doesn't know how to
concatenate a string (which it calls str for short) and a number
(which it calls int – short for integer).
● But Python has a built-in solution – a function called str
which turns a number into a string so that we can print it.
Finding the length of a string (cont.)
∙ Here's how we can modify our program to use it:
# store the text in a variable
my_text = "SampleText"
# calculate the length of the text and store it in a variable
text_length = len(my_text)
# print a message telling us the text length
print("The length of the text is " + str(text_length))
Changing case
∙ We can convert a string to lower case by using a new type of syntax – a
method that belongs to strings.
∙ A method is like a function, but instead of being built in to the Python
language, it belongs to a particular type.
∙ The method we are talking about here is called lower, and we say that it
belongs to the string type. Here's how we use it:
my_text = "SampleText"
# print my_text in lower case
print(my_text.lower())
∙ Notice how using a method looks different to using a function.
− When we use a function like print or len, we write the function name first and the
arguments go in parentheses:
print("SampleText")
len(my_text)
− When we use a method, we write the string (or the variable that holds it) first, then a
dot, then the method name and its parentheses:
my_text.lower()
∙ We can also change the case to upper, using the upper() method
>>> print("SampleText".upper())
SAMPLETEXT
To test if a string is in upper/lower case
>>> uni = 'university of mauritius'
>>> uni
'university of mauritius'
>>> uni.islower()
True
>>> uni.isupper()
False
Replacement
∙ replace is another example of a useful method that
belongs to the string type
∙ It takes two arguments (both strings) and returns a copy
of the string in which all occurrences of the first argument
are replaced by the second. The original string is not changed.
Replacement (cont.)
∙ Example of replace :
str1 = "Java is a programming language"
# Calling the replace method
str2 = str1.replace("Java", "Python")
# Displaying result
print("Old String: \t",str1)
print("New String: \t",str2)
Output
Old String: Java is a programming language
New String: Python is a programming language
Slicing a string
∙ Slicing is used to extract substrings of arbitrary length.
∙ If s is a string, the expression s[start:end] denotes the
substring of s that starts at index start and ends at index
end-1 .
− For example, 'abc'[1:3] = 'bc' .
∙ If the value before the colon is omitted, it defaults to 0.
∙ If the value after the colon is omitted, it defaults to the
length of the string.
∙ Consequently, the expression 'abc'[:] is semantically
equivalent to the more verbose 'abc'[0:len('abc')]
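The slicing rules above can be tried out with a short sketch. The string "abcdef" is just an example value.

```python
s = "abcdef"
print(s[1:3])     # bc  (indices 1 and 2; index 3 is excluded)
print(s[:3])      # abc (start defaults to 0)
print(s[3:])      # def (end defaults to len(s))
whole = s[:]
print(whole == s[0:len(s)])   # True

# The end index minus the start index gives the slice length
print(len(s[2:5]))            # 3
```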
Extracting part of a string - Slicing
∙ Note that in Python, the positions in a string start from zero(0)
up to the position (length_of_string-1)
∙ Slicing is an operation that allows us to access a contiguous
sequence of characters or substring from a string
∙ It can be thought of as a way of indexing a range of positions
in the string
∙ Syntax:
<string>[<start>:<end>]
− Note: Both start and end should be int-valued expressions
Extracting part of a string (cont.)
∙ Example of substring:
module = "Problem Solving Techniques"
# print positions three and four (the end position, five, is not included)
print(module[3:5])
# positions start at zero, not one
print(module[0:6])
# if we use a stop position beyond the end, it's the same as using the end
print(module[0:60])
Output:
bl
Proble
Problem Solving Techniques
Extracting part of a string (cont.)
∙ If we just give a single number in the square
brackets, we'll just get a single character:
food = "pizza"
first_char = food[0]
print(first_char)
Output:
p
Extracting part of a string (cont.)
s = "Hello"
print(s[0])      # H
print(s[4])      # o
print(s[-1])     # o
print(s[1:3])    # el    ("slices" can be taken with indices separated by a colon)
print(s[2:])     # llo
print(s[:3])     # Hel
print(s[::2])    # Hlo   (the third term in a slice determines the step size)
print(s[::-1])   # olleH
print(len(s))    # 5
Exercise 3.3
∙ Write a program that allows the input of a
movie title, followed by 2 integer values x
and y and displays the substring between
positions x and y inclusive in the movie title.
The “in” and “not in” operators
∙ in : membership operator : true if first string exists inside
second string
∙ not in :non-membership: true if first string does not exist
in second string
∙ Examples
>>> 'John' in 'Sir John Smith'
True
>>> 'x' in 'sample'
False
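A few more cases, sketched here for illustration, show that the membership test is case-sensitive and how not in reads in practice.

```python
title = "Sir John Smith"
print("John" in title)          # True
print("john" in title)          # False: the test is case-sensitive
print("x" not in "sample")      # True
print("john" in title.lower())  # True after converting to lower case
```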
Counting and finding substrings
∙ A very common job in text analysis is to count the number
of times some pattern occurs in a text.
∙ In computer programming terms, what that problem
translates to is counting the number of times a substring
occurs in a string.
∙ The method that does the job is called count.
− It takes a single argument whose type is string, and
returns the number of times that the argument is found
in the string.
− The return type is a number (an integer)
Counting and finding substrings (cont.)
string = "Python is awesome, isn't it?"
substring = "is"
count = string.count(substring)
print("The count is:", count)
Output:
The count is: 2
Counting and finding substrings (cont.)
Count number of occurrences of a given substring
using start and end
string = "Python is awesome, isn't it?"
substring = "is"
count = string.count(substring, 8, 25)
print("The count is:", count)
Output:
The count is: 1
Exercise 3.4
∙ Write a program that allows the input of a
sentence and displays the count of ‘a’ and
‘s’ in the sequence.
Exercise 3.5
∙ Write a program that allows a user to input a
sentence and displays five (5) integers
(separated by spaces) counting the
respective number of times that each vowel
occurs in the sequences.
Exercise 3.6
∙ Write a program that allows a user to input a DNA sequence
(that can be made up of the letters 'A', 'C', 'G' and 'T' in
upper or lower case).
∙ The program will then calculate and display the GC content
(total percentage of G and C) of that sequence.
∙ To calculate the GC content of a DNA sequence (which is
simply a string):
− we must find the sum of “G” and “C”
− divide that sum by the length of the string
− Then, multiply by 100
[Hint: you can use normal mathematical symbols like add (+), subtract (-),
multiply (*), divide (/) and parentheses to carry out calculations on numbers in
Python.]
Counting and finding substrings (cont.)
∙ A closely-related problem to counting substrings is
finding their location.
∙ What if instead of counting the number of ‘a’ in
our text we want to know where they are?
∙ The find method will give us the answer, at least
for simple cases.
− find takes a single string argument, just like count, and
returns a number which is the position at which that
substring first appears in the string (in computing, we
call that the index of the substring).
Counting and finding substrings (cont.)
∙ Remember that in Python we start counting from
zero rather than one, so position 0 is the first
character, position 4 is the fifth character, etc.
∙ Examples:
word = "problem"
print(word.find('p'))
print(word.find('ob'))
print(word.find('w'))
Output
0
2
-1
Counting and finding substrings (cont.)
>>> dna = "aagtccgcgcgctttttaaggagccttttgacggc"
# search from position 0
>>> dna.find('ag')
1
# search from position 17, after the first occurrence
>>> dna.find('ag', 17)
18
>>> dna.find('ag', 19)
21
# same as find, but searches backwards from the end
>>> dna.rfind('ag')
21
Output Formatting
>>> print("The DNA sequence’s GC content is", gc_perc,"%")
The DNA sequence’s GC content is 53.06122448979592 %
∙ The value of the gc_perc variable has many digits following the
dot which are not very significant. You can eliminate the display
of too many digits by imposing a certain format to the printed
string
>>> print("The DNA sequence’s GC content is %5.3f %%" % gc_perc)
∙ The string to the left of the % operator is the formatting string; gc_perc is the
value that is formatted.
∙ Note the double % (%%) to print a % symbol.
∙ The percent operator separates the formatting string
from the value that replaces the format placeholder.
Display Values Formatting
∙ A formatting specifier has this general form:
%<width>.<precision><type-char>
● The specifier starts with a % and ends with a character that
indicates the data type of the value being inserted
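The general form can be explored with a small sketch. The value of gc_perc is the one printed earlier; the other names are illustrative only.

```python
gc_perc = 53.06122448979592     # example value, as printed earlier
narrow = "%.2f" % gc_perc       # precision only
wide = "%8.2f" % gc_perc        # width 8, precision 2 (padded on the left)
whole = "%d" % 42               # d for integers
print(narrow)                   # 53.06
print("[" + wide + "]")         # [   53.06]
print(whole)                    # 42
```

The square brackets are added only to make the padding spaces visible in the output.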
Formatting numbers
Formatting - Placeholders
∙ >>> print("Hello %s %s, you may have won $%d!" % ("Mr.", "Smith", 10000))
Hello Mr. Smith, you may have won $10000!
● >>> print('This int, %5d, was placed in a field of width 5' % (7))
This int,     7, was placed in a field of width 5
● >>> print('This int, %10d, was placed in a field of width 10' % (7))
This int,          7, was placed in a field of width 10
Yet another Example
∙ >>> print('This float, %10.5f, has width 10 and precision 5.' % (3.1415926))
This float,    3.14159, has width 10 and precision 5.
● >>> print('This float, %0.5f, has width 0 and precision 5.' % (3.1415926))
This float, 3.14159, has width 0 and precision 5.
● >>> import math
● >>> print("Compare %f and %0.20f" % (math.pi, math.pi))
Compare 3.141593 and 3.14159265358979311600
Formatting strings (s)
● With the format() function, the format specifier 20s specifies that the string is
formatted within a width of 20. By default, a string is left justified.
● To right-justify it, put the symbol > in the format specifier (>20s).
● If the string is longer than the specified width, the width is
automatically increased to fit the string.
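A minimal sketch of these rules, assuming the specifier is passed to the built-in format() function (with the % operator, %20s right-justifies by default instead). The example strings are illustrative.

```python
name = "Python"
left = format(name, "10s")               # left-justified by default
right = format(name, ">10s")             # > right-justifies
long_text = format("Programming", "5s")  # longer than the width: not truncated
print("[" + left + "]")                  # [Python    ]
print("[" + right + "]")                 # [    Python]
print("[" + long_text + "]")             # [Programming]
```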
Print
● print() automatically prints a linefeed ( \n ) to cause the output to
advance to the next line.
● If you don’t want this to happen after the print function is
finished, you can invoke the print function by passing a special
argument end
print("AAA", end=' ')
print("BBB", end='')
print("CCC", end='***')
print("DDD", end='***')
Output
AAA BBBCCC***DDD***
Exercise 3.7
∙ Calculating AT content
Here's a short DNA sequence:
ACTGATCGATTACGTATAGTATTTGCTATCATACATA
TATATCGATGCGTTCAT
Write a program that will print out the AT content of this
DNA sequence.
[Hint: you can use normal mathematical symbols like add
(+), subtract (-), multiply (*), divide (/) and parentheses to
carry out calculations on numbers in Python.]
The split() method
∙ This method is used to split a string into a list of
substrings
∙ By default, it will split the string wherever whitespace occurs
>>> S = "Hello String Library"
>>> S.split()
['Hello', 'String', 'Library']
∙ However, it can also split on a chosen character
>>> S = "32,24,25,57"
>>> S.split(',')
['32', '24', '25', '57']
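One point worth checking, sketched below, is that the pieces returned by split are still strings, so they need converting before any arithmetic. The variable names parts, total and words are illustrative.

```python
S = "32,24,25,57"
parts = S.split(",")            # ['32', '24', '25', '57']
print(parts)
print(len(parts))               # 4

# Each item is still a string; convert before doing arithmetic
total = int(parts[0]) + int(parts[1])
print(total)                    # 56

# Splitting on whitespace ignores repeated spaces
words = "Hello   String  Library".split()
print(words)                    # ['Hello', 'String', 'Library']
```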
Exercise 3.8
An important process in Computational Biology consists of breaking
a sequence on a particular pattern.
Write a program that allows the input of a DNA sequence and splits
it on the pattern “ATG” into a number of subsequences. The program
should then display the list of subsequences.
Note: Ensure that your sequence contains a number of
occurrences of “ATG”
More String Operations
Method            Description
s.capitalize()    Copy of s with only the first character capitalised
s.title()         Copy of s with first character of each word capitalised
s.center(width)   Center s in a field of given width
s.count(sub)      Count the number of occurrences of sub in s
s.find(sub)       Find the first position where sub occurs in s
s.ljust(width)    Like center, but s is left-justified
s.lower()         Copy of s in all lowercase characters
s.lstrip()        Copy of s with leading whitespace removed
String Operations
Method                      Description
s.replace(oldsub, newsub)   Replace all occurrences of oldsub in s with newsub
s.rfind(sub)                Like find, but returns the rightmost position
s.rjust(width)              Like center, but s is right-justified
s.rstrip()                  Copy of s with trailing whitespace removed
s.split()                   Split s into a list of substrings
s.upper()                   Copy of s with all characters converted to upper case
String Operations
Method                          Description
s.split(separator)              Splits s into a list, using the provided separator. By default, any whitespace is a separator
s.rsplit(separator, maxsplit)   Splits s into a list, starting from the right. maxsplit specifies how many splits to do. The default value is -1, which means "all occurrences"
s.strip(characters)             Removes the given leading and trailing characters from s (whitespace is removed by default)
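The methods in this table can be combined, as in the sketch below. The file name is a hypothetical example value, not taken from the lecture.

```python
raw = "   gene_01.fasta.txt  \n"    # hypothetical text with stray whitespace
clean = raw.strip()                 # remove leading/trailing whitespace
print(clean)                        # gene_01.fasta.txt

# rsplit with maxsplit=1 splits once, starting from the right
pieces = clean.rsplit(".", 1)
print(pieces)                       # ['gene_01.fasta', 'txt']

# strip can also remove other characters from both ends
starless = "***note***".strip("*")
print(starless)                     # note
```

Using rsplit here keeps the earlier dot inside the first piece, whereas split(".", 1) would split at the first dot instead.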
Exercise 3.9
Write a program to input a string and output a new string where all
occurrences of the first character of the original string have been changed
to '$', except the first character itself.
Sample output
Input String : 'restart'
New String : 'resta$t'
Exercise 3.10
Write a Python program to input two strings and create a single
string using the two given strings, separated by a space and
swapping the first two characters of each string.
Sample output
Input first String: abc
Input second String: xyz
New String : xyc abz
Exercise 3.11
Write a Python program to input a string and return a new string
made of 4 copies of the last two characters of the original string
(length must be at least 2).
Sample output
Input first String: Python
New String : onononon
Exercise 3.12
Write a Python program to input a string and output the part of the
string before the last occurrence of a specified character.
Sample output
Input a String: [Link]
Input char: /
Output string: [Link]
Exercise 3.13
Write a Python program to input a floating point number and display
the number with no decimal places. [Hint: Use [Link]]
Sample outputs
Input a floating point number: 3.1415926
Formatted Number with no decimal places: 3
Input a floating point number: -12.9999
Formatted Number with no decimal places: -13
Exercise 3.14
Write a program to print the following integers with zeros on the left
of specified width:
Original Number: 3
Formatted Number(left padding, width 2): 03
Original Number: 123
Formatted Number(left padding, width 6): 000123
[Hint: Use [Link]]
Acknowledgments
● DGT1039Y lecture notes by Dr. Shakun Baichoo, FoICDT