SRI-Utility-PDF v1.0 Technical Documentation

SRI – Utility – PDF
BLUE PRISM VBO TECHNICAL DOCUMENTATION

Version: 1.0
automation@sriinfotech.com | PL: +48 58 742 53 56 | US: (888) 513-0114

www.sriinfotech.com
Contents
1. Introduction
1.1. Installation
2. Version History
2.1. Version 1.0
3. Functional Overview of Current Version
3.1. Check if Page Exists
3.2. Decrypt
3.3. Encrypt
3.4. Extract Page Range
3.5. Extract Single Page
3.6. Get Images from Page Range
3.7. Get Images from Single Page
3.8. Get Images from Whole PDF
3.9. Get Number of Pages
3.10. Get Page Coordinates
3.11. Get Text from Page Range
3.12. Get Text from Page Range (Area)
3.13. Get Text from Page Range to Collection
3.14. Get Text from Page Range to Collection (Area)
3.15. Get Text from Single Page
3.16. Get Text from Single Page (Area)
3.17. Get Text from Whole PDF
3.18. Get Text from Whole PDF (Area)
3.19. Get Text from Whole PDF to Collection
3.20. Get Text from Whole PDF to Collection (Area)
3.21. Merge All PDFs in a Directory
3.22. Merge Selected PDFs
3.23. Split Page Range to Single Pages
3.24. Split Whole PDF to Single Pages
© SRI Infotech Inc., 1500 Providence Highway Unit 32, Norwood MA 02062
1. Introduction
SRI – Utility – PDF is an easy-to-use Blue Prism VBO that allows the user to work with PDF documents without opening
them. It can extract text, images and pages from PDF documents, and split, merge, encrypt and decrypt them.
1.1. Installation
This VBO requires itextsharp.dll to be stored in the Blue Prism Automate directory (default location:
C:\Program Files\Blue Prism Limited\Blue Prism Automate). The DLL file is provided in the asset package.
Once the DLL is in place, you can import the object and use it without any modification.
2. Version History
2.1. Version 1.0

This is the initial version of this VBO. A change history will be added to this document in future versions.
3. Functional Overview of Current Version
This VBO uses the open-source library iTextSharp 5.5.13.1.
The run mode of this business object is "Background".
3.1. Check if Page Exists
Determines whether a page with the given number exists in the supplied PDF document.
Returns True if the page exists, False if it does not.
Parameter Direction Data Type Description
File Path In Text Full path to PDF document
Page Number In Number Index of PDF document page
Password In Password OPTIONAL: User or owner password, required if the document is protected (the user password will not work if the document owner has restricted this kind of activity in the security properties); UTF-8 encoding
Page Exists Out Flag True if the page exists
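The page check reduces to a range test. The Python sketch below is illustrative only: it assumes 1-based page numbering (as used by iTextSharp's PdfReader) and a page count that is already known, and page_exists is a hypothetical helper, not part of the VBO.

```python
def page_exists(page_number: float, page_count: int) -> bool:
    """Return True if page_number is a valid 1-based page index (hypothetical helper)."""
    # Blue Prism Number values are decimals, so reject fractional input such as 2.5.
    if page_number != int(page_number):
        return False
    # Assumption: pages are numbered 1..page_count, with no page 0.
    return 1 <= int(page_number) <= page_count


print(page_exists(3, 3))  # True
print(page_exists(4, 3))  # False
```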
3.2. Decrypt
Removes password protection from the supplied PDF document; the security properties are not altered.
If Output File Path is left blank or contains the path to the parent document, no new document is created and the
action is performed on the parent document.
Parameter Direction Data Type Description
File Path In Text Full path to PDF document
Output File Path In Text OPTIONAL: Full path to the new document destination (if left blank, the parent document is decrypted)
Owner Password In Password Password to manage properties of the document; UTF-8 encoding
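The Output File Path rule (blank or identical path means "work on the parent document") can be sketched as below. This is a hypothetical Python helper, not the VBO's code; it assumes paths are compared after normalization, so that "docs/a.pdf" and "docs/./a.pdf" count as the same file.

```python
import os
from typing import Tuple


def resolve_output_path(input_path: str, output_path: str) -> Tuple[str, bool]:
    """Return (target_path, in_place); in_place is True when the parent is rewritten."""
    # Blank (or whitespace-only) Output File Path: act on the parent document.
    if not output_path.strip():
        return input_path, True
    # Compare normalized absolute paths so equivalent spellings of one file match.
    same = (os.path.normcase(os.path.abspath(output_path))
            == os.path.normcase(os.path.abspath(input_path)))
    return (input_path, True) if same else (output_path, False)
```

When in_place is True, an implementation typically has to write to a temporary file and then replace the parent document, because the source file is still open for reading; this is a general design consideration, not a documented detail of this VBO.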
3.3. Encrypt
Sets the security properties of the PDF document and encrypts it with the given passwords.
Encryption Standard: AES 256.
If Output File Path is left blank or contains the path to the parent document, no new document is created and the
action is performed on the parent document.
The user password is used to open the document. If New User Password is left blank, PDF software will not prompt
for a password when opening the document.
The owner password is used to manage the document (changing properties, etc.). If New Owner Password is left blank,
a random string is generated and used to encrypt the file.
At least one of the new passwords (user or owner) must be provided.
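The password rules above can be summarized in a short Python sketch. resolve_passwords is a hypothetical helper; secrets.token_urlsafe stands in for whatever random-string generator the VBO actually uses, and the 24-byte length is an arbitrary assumption.

```python
import secrets
from typing import Tuple


def resolve_passwords(new_user: str, new_owner: str) -> Tuple[str, str]:
    """Apply the documented Encrypt password rules; returns (user, owner)."""
    if not new_user and not new_owner:
        raise ValueError("At least one of New User Password / New Owner Password is required")
    # A blank user password stays blank: viewers open the file without prompting.
    user = new_user
    # A blank owner password is replaced by a random string (assumed length and alphabet).
    owner = new_owner or secrets.token_urlsafe(24)
    return user, owner
```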
IMPORTANT NOTE: Permissions set during encryption never allow a user to do more than they could do with the
unencrypted document. These permissions only decide how much less a regular document user (i.e. a person opening
the PDF with the user password) is allowed to do compared with the document owner (i.e. a person opening the PDF
with the owner password). When an unencrypted document is opened, full owner permissions are always assumed.
Consequently, if a PDF viewer does not allow certain operations even to a document owner, setting the matching
Allow* flags during encryption will not make that viewer allow those operations to a user. The document restrictions
summary in the security section of PDF reading software does not merely reflect the permissions set during
encryption. Instead, it is a summary based on numerous inputs, not all of which depend on the document itself:
1. The operations the program variant allows by default.
2. Additional operations allowed via a usage rights signature in the document.
3. Restrictions introduced by permissions not given during encryption.
Parameter Direction Data Type Description
File Path In Text Full path to PDF document
Output File Path In Text OPTIONAL: Full path to the new document destination (if left blank, the parent document is encrypted)
Password In Password Password to manage properties of the document; UTF-8 encoding
New User Password In Password OPTIONAL: Password used to open the document; at least one of the passwords (user, owner) must be provided
New Owner Password In Password OPTIONAL: Password to manage properties of the document; at least one of the passwords (user, owner) must be provided
Allow Printing In Flag OPTIONAL: (True is default value); The user is permitted to print the document
Allow Modify Contents In Flag OPTIONAL: (True is default value); The user is permitted to modify the contents, for example to change the content of a page, or to insert or remove a page
Allow Copy In Flag OPTIONAL: (True is default value); The user is permitted to copy or otherwise extract text and graphics from the document, including by using assistive technologies such as screen readers or other accessibility devices
Allow Modify Annotations In Flag OPTIONAL: (True is default value); The user is permitted to add or modify text annotations and interactive form fields
Allow Fill In In Flag OPTIONAL: (True is default value); The user is permitted to fill in form fields
Allow Screenreaders In Flag OPTIONAL: (True is default value); The user is permitted to extract text and graphics for use by accessibility devices
Allow Assembly In Flag OPTIONAL: (True is default value); The user is permitted to insert, remove, and rotate pages and add bookmarks. The content of a page can't be changed unless Allow Modify Contents is granted too
Allow Degraded Printing In Flag OPTIONAL: (True is default value); The user is permitted to print the document, but not at the quality offered by Allow Printing
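In the PDF format, these flags end up as bits of the permissions value (the /P entry of the encryption dictionary) defined in ISO 32000-1, Table 22; iTextSharp's PdfWriter exposes ALLOW_* constants for the same purpose. The Python sketch below shows one way the Allow* flags could be combined into those bits. permission_bits is a hypothetical helper and the exact mapping used by this VBO is an assumption; the sketch computes only the permission bits and leaves out the reserved bits that the /P entry also requires.

```python
# Permission bits from ISO 32000-1, Table 22 (bit n has the value 1 << (n - 1)).
PRINT = 1 << 2                # bit 3: print (low quality unless bit 12 is also set)
MODIFY_CONTENTS = 1 << 3      # bit 4: modify contents
COPY = 1 << 4                 # bit 5: copy or extract text and graphics
MODIFY_ANNOTATIONS = 1 << 5   # bit 6: add/modify annotations, fill form fields
FILL_IN = 1 << 8              # bit 9: fill in existing form fields
SCREENREADERS = 1 << 9        # bit 10: extract for accessibility
ASSEMBLY = 1 << 10            # bit 11: insert, rotate, delete pages, bookmarks
HIGH_QUALITY_PRINT = 1 << 11  # bit 12: full-quality printing


def permission_bits(allow_printing: bool = True,
                    allow_modify_contents: bool = True,
                    allow_copy: bool = True,
                    allow_modify_annotations: bool = True,
                    allow_fill_in: bool = True,
                    allow_screenreaders: bool = True,
                    allow_assembly: bool = True,
                    allow_degraded_printing: bool = True) -> int:
    """Combine the Allow* flags (all True by default, as documented) into permission bits."""
    bits = 0
    if allow_printing:
        bits |= PRINT | HIGH_QUALITY_PRINT
    if allow_degraded_printing:
        bits |= PRINT
    if allow_modify_contents:
        bits |= MODIFY_CONTENTS
    if allow_copy:
        bits |= COPY
    if allow_modify_annotations:
        bits |= MODIFY_ANNOTATIONS
    if allow_fill_in:
        bits |= FILL_IN
    if allow_screenreaders:
        bits |= SCREENREADERS
    if allow_assembly:
        bits |= ASSEMBLY
    return bits
```

For example, clearing only Allow Printing while leaving Allow Degraded Printing at its default keeps bit 3 but drops bit 12, so viewers that honour the flags permit only low-quality printing.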
3.4. Extract Page Range
Extracts the given range of pages from the supplied PDF document and saves it as a separate PDF document.
If Encrypt is set to True, this action copies the security properties of the parent document to the extracted one (in
this case, at least one new password must be provided).
Encryption Standard: AES 256.
Parameter Direction Data Type Description
File Path In Text Full path to PDF document
Start Page In Number Index of PDF document's page from which the extraction will begin
End Page In Number Index of PDF document's page at which the extraction will end
Output File Path In Text Full path to the extraction destination
Password In Password OPTIONAL: User or owner password, required if the document is protected (the user password will not work if the document owner has restricted this kind of activity in the security properties); UTF-8 encoding
Encrypt In Flag OPTIONAL: (False is default value); True to copy the security properties of the parent document
New User Password In Password OPTIONAL: Password used to open the document; if Encrypt is set to True, at least one password (user, owner) must be provided
New Owner Password In Password OPTIONAL: Password to manage properties of the document; if Encrypt is set to True, at least one password (user, owner) must be provided
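Both the page range and the Encrypt rule can be checked before any file is written. This Python sketch is hypothetical: the helper names are illustrative, it assumes 1-based page numbers, and it assumes that a range where Start Page equals End Page (a single page) is valid.

```python
def validate_page_range(start_page: int, end_page: int, page_count: int) -> range:
    """Return the 1-based pages covered by Start Page..End Page (inclusive)."""
    if not 1 <= start_page <= end_page <= page_count:
        raise ValueError(
            f"Invalid page range {start_page}-{end_page} for a {page_count}-page document")
    return range(start_page, end_page + 1)


def check_encrypt_inputs(encrypt: bool, new_user: str, new_owner: str) -> None:
    """Enforce the documented rule: Encrypt = True needs at least one new password."""
    if encrypt and not (new_user or new_owner):
        raise ValueError("Encrypt is True but no new user or owner password was given")


print(list(validate_page_range(2, 4, 10)))  # [2, 3, 4]
```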
3.5. Extract Single Page
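Extracts the given page from the supplied PDF document and saves it as a separate PDF document.
Encryption Standard: AES 256.
Parameter Direction Data Type Description
File Path In Text Full path to PDF document
Page Number In Number Index of PDF document page
Output File Path In Text Full path to the extraction destination
Password In Password OPTIONAL: User or owner password, required if the document is protected (the user password will not work if the document owner has restricted this kind of activity in the security properties); UTF-8 encoding
Encrypt In Flag OPTIONAL: (False is default value); True to copy the security properties of the parent document
New User Password In Password OPTIONAL: Password used to open the document; if Encrypt is set to True, at least one password (user, owner) must be provided
New Owner Password In Password OPTIONAL: Password to manage properties of the document; if Encrypt is set to True, at least one password (user, owner) must be provided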
3.6. Get Images from Page Range
Gets images from each page in the given range of pages of the supplied PDF document.
Returns a 5-column collection:
Page Number <Text>: Index of the page from which the image was extracted.
Image <Image>: Actual image.
Extension <Text>: Image file format.
Status <Text>: "Success" if the image was processed successfully, "Error" otherwise.
Error Message <Text>: Error description, if an error occurred.
If Ignore Errors is set to True and an error occurs, processing continues with the next PDF pages; otherwise, the
action terminates.
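Parameter Direction Data Type Description
File Path In Text Full path to PDF document
Start Page In Number Index of PDF document's page from which the images extraction will begin
End Page In Number Index of PDF document's page at which the images extraction will end
Password In Password OPTIONAL: User or owner password, required if the document is protected (the user password will not work if the document owner has restricted this kind of activity in the security properties); UTF-8 encoding
Ignore Errors In Flag OPTIONAL: (False is default value); True to continue processing the next PDF pages in case of an error
PDF Images Out Collection Images extracted from the document (Page Number; Image; Extension; Status; Error Message)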
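The Ignore Errors behaviour and the 5-column result can be modelled as follows. This is a hypothetical Python sketch, not the VBO's code: extract stands in for the actual image extraction of one page, errors are assumed to be reported per page, and the Image column is assumed to be empty for a failed page.

```python
from typing import Callable, Iterable, List, Tuple


def collect_images(pages: Iterable[int],
                   extract: Callable[[int], List[Tuple[bytes, str]]],
                   ignore_errors: bool = False) -> List[dict]:
    """Build rows with the columns Page Number, Image, Extension, Status, Error Message."""
    rows = []
    for page in pages:
        try:
            images = extract(page)  # hypothetical: returns (image bytes, extension) pairs
        except Exception as exc:
            if not ignore_errors:
                raise  # Ignore Errors = False: terminate on the first error
            # Page Number is documented as <Text>, hence str(page).
            rows.append({"Page Number": str(page), "Image": None, "Extension": "",
                         "Status": "Error", "Error Message": str(exc)})
            continue  # Ignore Errors = True: move on to the next page
        for data, extension in images:
            rows.append({"Page Number": str(page), "Image": data, "Extension": extension,
                         "Status": "Success", "Error Message": ""})
    return rows
```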
3.7. Get Images from Single Page
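Gets images from the given page of the supplied PDF document.
Returns a 5-column collection with the same columns as Get Images from Page Range (Page Number, Image, Extension,
Status, Error Message).
If Ignore Errors is set to True and an error occurs, processing continues; otherwise, the action terminates.
Parameter Direction Data Type Description
File Path In Text Full path to PDF document
Page Number In Number Index of PDF document page
Password In Password OPTIONAL: User or owner password, required if the document is protected (the user password will not work if the document owner has restricted this kind of activity in the security properties); UTF-8 encoding
Ignore Errors In Flag OPTIONAL: (False is default value); True to continue processing in case of an error
PDF Images Out Collection Images extracted from the document (Page Number; Image; Extension; Status; Error Message)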
3.8. Get Images from Whole PDF
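Gets images from each page of the supplied PDF document.
Returns a 5-column collection with the same columns as Get Images from Page Range (Page Number, Image, Extension,
Status, Error Message).
If Ignore Errors is set to True and an error occurs, processing continues with the next PDF pages; otherwise, the
action terminates.
Parameter Direction Data Type Description
File Path In Text Full path to PDF document
Password In Password OPTIONAL: User or owner password, required if the document is protected (the user password will not work if the document owner has restricted this kind of activity in the security properties); UTF-8 encoding
Ignore Errors In Flag OPTIONAL: (False is default value); True to continue processing the next PDF pages in case of an error
PDF Images Out Collection Images extracted from the document (Page Number; Image; Extension; Status; Error Message)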
3.9. Get Number of Pages
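Gets the number of pages in the supplied PDF document.
Parameter Direction Data Type Description
File Path In Text Full path to PDF document
Password In Password OPTIONAL: User or owner password, required if the document is protected (the user password will not work if the document owner has restricted this kind of activity in the security properties); UTF-8 encoding
Number of Pages Out Number Count of all pages within the given document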
3.10. Get Page Coordinates
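Gets the coordinates of the given page of the supplied PDF document.
IMPORTANT NOTE: PDF document coordinates start at the bottom-left corner (0,0); x increases to the right and y
increases upwards.
Parameter Direction Data Type Description
File Path In Text Full path to PDF document
Page Number In Number Index of PDF document page
Password In Password OPTIONAL: User or owner password, required if the document is protected (the user password will not work if the document owner has restricted this kind of activity in the security properties); UTF-8 encoding
Left Out Number Start x value of PDF page
Bottom Out Number Start y value of PDF page
Width Out Number Width of PDF page (left to right)
Height Out Number Height of PDF page (bottom to top)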
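Because the origin is the bottom-left corner, a position measured from the top of the page (for example, read off a viewer ruler) must be flipped before it is used as Start y or End y in the area actions. This Python sketch uses hypothetical helpers and assumes the page rectangle is given by two opposite corners in points (1/72 inch), which the PDF format allows in either order, and that the page is not rotated.

```python
from dataclasses import dataclass


@dataclass
class PageBox:
    left: float    # Left output: x of the lower-left corner
    bottom: float  # Bottom output: y of the lower-left corner
    width: float
    height: float


def page_box(x1: float, y1: float, x2: float, y2: float) -> PageBox:
    """Build Left/Bottom/Width/Height from any two opposite corners of a page rectangle."""
    left, right = min(x1, x2), max(x1, x2)
    bottom, top = min(y1, y2), max(y1, y2)
    return PageBox(left, bottom, right - left, top - bottom)


def from_top_left(box: PageBox, x: float, y_from_top: float) -> tuple:
    """Convert a point measured from the top-left corner into PDF (bottom-left) coordinates."""
    return box.left + x, box.bottom + box.height - y_from_top


a4 = page_box(0, 0, 595, 842)       # A4 page, in points
print(from_top_left(a4, 50, 100))   # (50, 742)
```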
3.11. Get Text from Page Range
Gets text from the given range of pages of the supplied PDF document.
EXTRACTION STRATEGIES:
0: no strategy; characters are read from left to right, top to bottom (default option).
1: simple extraction strategy - a simple text extraction renderer that keeps track of the current y position of each
string. If it detects that the y position has changed, it inserts a line break into the output. If the PDF renders text in a
non-top-to-bottom fashion, the output will not be a true representation of how the text appears in the PDF. This
renderer also uses a simple strategy based on font metrics to determine whether a blank space should be inserted
into the output.
2: location extraction strategy - a text extraction renderer that keeps track of the relative position of text on the
page, so the resulting text is relatively consistent with the physical layout that most PDF files have on screen. This
renderer keeps track of the orientation and of the distance (both perpendicular and parallel) to the unit vector of the
orientation. Text is ordered by orientation, then by perpendicular distance, then by parallel distance. Text with the
same perpendicular distance but a different parallel distance is treated as being on the same line.
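Parameter Direction Data Type Description
File Path In Text Full path to PDF document
Start Page In Number Index of PDF document's page from which the text extraction will begin
End Page In Number Index of PDF document's page at which the text extraction will end
Password In Password OPTIONAL: User or owner password, required if the document is protected (the user password will not work if the document owner has restricted this kind of activity in the security properties); UTF-8 encoding
Extraction Strategy In Number OPTIONAL: (0 is default value); 0 - no extraction strategy; 1 - simple strategy; 2 - location strategy
PDF Text Out Text Text read from PDF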
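The difference between the simple and location strategies shows up when a PDF draws text out of reading order. The Python sketch below is a deliberately simplified model, not iTextSharp's implementation: it covers strategies 1 and 2 only, assumes horizontal text, treats chunks with exactly equal baselines as one line (real implementations tolerate small differences), and separates chunks on a line with a single space instead of using font metrics.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    text: str
    x: float  # start of the baseline
    y: float  # baseline height; PDF y grows upwards


def simple_strategy(chunks: List[Chunk]) -> str:
    """Keep drawing order; start a new line whenever the y position changes."""
    parts, last_y = [], None
    for c in chunks:
        if last_y is not None:
            parts.append("\n" if c.y != last_y else " ")
        parts.append(c.text)
        last_y = c.y
    return "".join(parts)


def location_strategy(chunks: List[Chunk]) -> str:
    """Group chunks by line, order lines top to bottom and chunks left to right."""
    lines = {}
    for c in chunks:
        lines.setdefault(c.y, []).append(c)
    return "\n".join(
        " ".join(c.text for c in sorted(lines[y], key=lambda c: c.x))
        for y in sorted(lines, reverse=True))


# The content stream draws part of the bottom line before the title.
chunks = [Chunk("Total:", 72, 100), Chunk("Invoice", 72, 700), Chunk("42", 150, 100)]
print(repr(simple_strategy(chunks)))    # 'Total:\nInvoice\n42'
print(repr(location_strategy(chunks)))  # 'Invoice\nTotal: 42'
```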
3.12. Get Text from Page Range (Area)
Gets text from the specified area of each page in the given range of pages of the supplied PDF document.
This action uses the location extraction strategy.
IMPORTANT NOTE 1: This action reads the block of text that lies within the specified area; however, it always reads the
whole block, even if the block ends outside the specified area.
IMPORTANT NOTE 2: PDF document coordinates start at the bottom-left corner (0,0); x increases to the right and y
increases upwards.
Parameter Direction Data Type Description
File Path In Text Full path to PDF document
Start Page In Number Index of PDF document's page from which the text extraction will begin
End Page In Number Index of PDF document's page at which the text extraction will end
Start x In Number Leftmost position of the area
Start y In Number Bottom position of the area
End x In Number Rightmost position of the area
End y In Number Top position of the area
Password In Password OPTIONAL: User or owner password, required if the document is protected (the user password will not work if the document owner has restricted this kind of activity in the security properties); UTF-8 encoding
PDF Text Out Text Text read from PDF
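IMPORTANT NOTE 1 can be illustrated with a simplified model in which a text chunk is kept whenever its baseline touches the area and is then returned in full. This Python sketch is an assumption about the mechanism, roughly how a region filter such as iTextSharp's RegionTextRenderFilter behaves as we understand it; it handles horizontal text only and returns the kept chunk texts top to bottom, then left to right.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    text: str
    x_start: float  # baseline start
    x_end: float    # baseline end
    y: float        # baseline height (horizontal text assumed)


def in_area(c: Chunk, start_x: float, start_y: float, end_x: float, end_y: float) -> bool:
    """True if the chunk's baseline touches the area; the chunk is then kept whole."""
    return start_y <= c.y <= end_y and c.x_start <= end_x and c.x_end >= start_x


def text_in_area(chunks: List[Chunk], start_x: float, start_y: float,
                 end_x: float, end_y: float) -> List[str]:
    kept = [c for c in chunks if in_area(c, start_x, start_y, end_x, end_y)]
    # Location-style ordering: top to bottom, then left to right.
    kept.sort(key=lambda c: (-c.y, c.x_start))
    return [c.text for c in kept]
```

With an area from x = 40 to 120, a chunk "Invoice number: 12345" whose baseline runs from x = 50 to 250 is returned in full, even though most of it lies to the right of End x.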
3.13. Get Text from Page Range to Collection
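Gets text from the given range of pages of the supplied PDF document and returns it as a 2-column collection (Page
Number <Number>, Text <Text>), where each row contains the text of the corresponding PDF page.
Parameter Direction Data Type Description
File Path In Text Full path to PDF document
Start Page In Number Index of PDF document's page from which the text extraction will begin
End Page In Number Index of PDF document's page at which the text extraction will end
Password In Password OPTIONAL: User or owner password, required if the document is protected (the user password will not work if the document owner has restricted this kind of activity in the security properties); UTF-8 encoding
Extraction Strategy In Number OPTIONAL: (0 is default value); 0 - no extraction strategy; 1 - simple strategy; 2 - location strategy
PDF Text Out Collection Collection of texts from PDF pages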
3.14. Get Text from Page Range to Collection (Area)
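Gets text from the specified area of each page in the given range of pages of the supplied PDF document and returns it
as a 2-column collection (Page Number <Number>, Text <Text>), where each row contains the text of the corresponding
PDF page.
IMPORTANT NOTE: PDF document coordinates start at the bottom-left corner (0,0); x increases to the right and y
increases upwards.
Parameter Direction Data Type Description
File Path In Text Full path to PDF document
Start Page In Number Index of PDF document's page from which the text extraction will begin
End Page In Number Index of PDF document's page at which the text extraction will end
Start x In Number Leftmost position of the area
Start y In Number Bottom position of the area
End x In Number Rightmost position of the area
End y In Number Top position of the area
Password In Password OPTIONAL: User or owner password, required if the document is protected (the user password will not work if the document owner has restricted this kind of activity in the security properties); UTF-8 encoding
PDF Text Out Collection Collection of texts from PDF pages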
3.15. Get Text from Single Page
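Gets text from the given page of the supplied PDF document.
Parameter Direction Data Type Description
File Path In Text Full path to PDF document
Page Number In Number Index of PDF document's page
Password In Password OPTIONAL: User or owner password, required if the document is protected (the user password will not work if the document owner has restricted this kind of activity in the security properties); UTF-8 encoding
Extraction Strategy In Number OPTIONAL: (0 is default value); 0 - no extraction strategy; 1 - simple strategy; 2 - location strategy
PDF Text Out Text Text read from PDF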
3.16. Get Text from Single Page (Area)
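Gets text from the specified area of the given page of the supplied PDF document.
IMPORTANT NOTE: PDF document coordinates start at the bottom-left corner (0,0); x increases to the right and y
increases upwards.
Parameter Direction Data Type Description
File Path In Text Full path to PDF document
Page Number In Number Index of PDF document's page
Start x In Number Leftmost position of the area
Start y In Number Bottom position of the area
End x In Number Rightmost position of the area
End y In Number Top position of the area
Password In Password OPTIONAL: User or owner password, required if the document is protected (the user password will not work if the document owner has restricted this kind of activity in the security properties); UTF-8 encoding
PDF Text Out Text Text read from PDF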
3.17. Get Text from Whole PDF
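Gets text from the whole supplied PDF document.
Parameter Direction Data Type Description
File Path In Text Full path to PDF document
Password In Password OPTIONAL: User or owner password, required if the document is protected (the user password will not work if the document owner has restricted this kind of activity in the security properties); UTF-8 encoding
Extraction Strategy In Number OPTIONAL: (0 is default value); 0 - no extraction strategy; 1 - simple strategy; 2 - location strategy
PDF Text Out Text Text read from PDF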
3.18. Get Text from Whole PDF (Area)
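Gets text from the specified area of each page of the supplied PDF document.
IMPORTANT NOTE: PDF document coordinates start at the bottom-left corner (0,0); x increases to the right and y
increases upwards.
Parameter Direction Data Type Description
File Path In Text Full path to PDF document
Start x In Number Leftmost position of the area
Start y In Number Bottom position of the area
End x In Number Rightmost position of the area
End y In Number Top position of the area
Password In Password OPTIONAL: User or owner password, required if the document is protected (the user password will not work if the document owner has restricted this kind of activity in the security properties); UTF-8 encoding
PDF Text Out Text Text read from PDF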
3.19. Get Text from Whole PDF to Collection
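Gets text from the whole supplied PDF document and returns it as a 2-column collection (Page Number <Number>,
Text <Text>), where each row contains the text of the corresponding PDF page.
Parameter Direction Data Type Description
File Path In Text Full path to PDF document
Password In Password OPTIONAL: User or owner password, required if the document is protected (the user password will not work if the document owner has restricted this kind of activity in the security properties); UTF-8 encoding
Extraction Strategy In Number OPTIONAL: (0 is default value); 0 - no extraction strategy; 1 - simple strategy; 2 - location strategy
PDF Text Out Collection Collection of texts from PDF pages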
3.20. Get Text from Whole PDF to Collection (Area)
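Gets text from the specified area of each page of the supplied PDF document and returns it as a 2-column collection
(Page Number <Number>, Text <Text>), where each row contains the text of the corresponding PDF page.
IMPORTANT NOTE: PDF document coordinates start at the bottom-left corner (0,0); x increases to the right and y
increases upwards.
Parameter Direction Data Type Description
File Path In Text Full path to PDF document
Start x In Number Leftmost position of the area
Start y In Number Bottom position of the area
End x In Number Rightmost position of the area
End y In Number Top position of the area
Password In Password OPTIONAL: User or owner password, required if the document is protected (the user password will not work if the document owner has restricted this kind of activity in the security properties); UTF-8 encoding
PDF Text Out Collection Collection of texts from PDF pages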
3.21. Merge All PDFs in a Directory
Merges all PDF documents found in the given directory into one document.
The provided password is used to open each password-protected file.
If Ignore Merge Errors is set to True and an error occurs, merging continues with the next documents; otherwise, the
action terminates.
This action returns a collection with the merge status (File Path <Text>, Status <Text>, Error Message <Text>):
1st column - File Path: full paths to the PDF files that were merged.
2nd column - Status: “Success” or “Error” for each document.
3rd column - Error Message: error description for each document with “Error” status.
The output of this action is useful only when Ignore Merge Errors is set to True.
Parameter Direction Data Type Description
Directory Path In Text Path to directory with PDF files to be merged
Output File Path In Text Full path to the merge destination
Ignore Merge Errors In Flag OPTIONAL: (False is default value); True to continue merging the next documents in case of an error
Merge Status Out Collection Status of merge operation (File Path; Status; Error Message)
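The status collection and the Ignore Merge Errors rule could be modelled as below. This Python sketch is hypothetical: append_pdf stands in for the actual merge step, the alphabetical merge order and the case-insensitive .pdf filter are assumptions (this document does not state the order in which files are merged), and password handling is omitted.

```python
import os
from typing import Callable, List


def merge_directory(directory: str,
                    append_pdf: Callable[[str], None],
                    ignore_merge_errors: bool = False) -> List[dict]:
    """Merge every PDF in a directory and return the Merge Status rows."""
    status = []
    # Assumptions: only *.pdf files (any letter case) are picked up, in name order.
    names = sorted(n for n in os.listdir(directory) if n.lower().endswith(".pdf"))
    for name in names:
        path = os.path.join(directory, name)  # File Path column holds the full path
        try:
            append_pdf(path)  # hypothetical merge step for one document
        except Exception as exc:
            if not ignore_merge_errors:
                raise  # Ignore Merge Errors = False: terminate
            status.append({"File Path": path, "Status": "Error", "Error Message": str(exc)})
            continue
        status.append({"File Path": path, "Status": "Success", "Error Message": ""})
    return status
```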
3.22. Merge Selected PDFs
Merges the supplied PDF documents into one document and saves it at the given path.
The 1st column of the File Paths collection is treated as the file paths of the PDF documents to be merged; it must be
of type <Text>.
The 2nd column (if present) is treated as the corresponding passwords for the files; it must be of type <Password> or
<Text>.
Any other columns are disregarded.
If Ignore Merge Errors is set to True and an error occurs, merging continues with the next documents; otherwise, the
action terminates.
This action returns a collection with the merge status (File Path <Text>, Status <Text>, Error Message <Text>):
1st column - File Path: full paths to the PDF files that were merged.
2nd column - Status: “Success” or “Error” for each document.
3rd column - Error Message: error description for each document with “Error” status.
The output of this action is useful only when Ignore Merge Errors is set to True.
Parameter Direction Data Type Description
File Paths In Collection 1st column for file paths; 2nd column for passwords (UTF-8 encoding); any other columns will be disregarded
Output File Path In Text Full path to the merge destination
Ignore Merge Errors In Flag OPTIONAL: (False is default value); True to continue merging the next documents in case of an error
Merge Status Out Collection Status of merge operation (File Path; Status; Error Message)
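The column rules for File Paths amount to a small parsing step. The action is described as using column positions rather than column names, so in this hypothetical Python sketch each collection row is modelled as a sequence of values in column order; treating a blank password as "no password" is an assumption.

```python
from typing import List, Optional, Sequence, Tuple


def parse_file_paths(rows: Sequence[Sequence]) -> List[Tuple[str, Optional[str]]]:
    """Turn File Paths collection rows into (path, password) pairs by column position."""
    result = []
    for row in rows:
        path = str(row[0])                            # 1st column: file path
        password = row[1] if len(row) > 1 else None   # 2nd column, if present
        result.append((path, password or None))       # blank password -> None (assumption)
        # Any columns after the 2nd are ignored.
    return result
```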
3.23. Split Page Range to Single Pages
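Extracts each page in the given range of pages of the supplied PDF document and saves it as a separate PDF document.
Naming convention of extracted documents: 'Parent Document File Name' & '_Page Number' & '.pdf'.
Parameter Direction Data Type Description
File Path In Text Full path to PDF document
Start Page In Number Index of PDF document's page from which the extraction will begin
End Page In Number Index of PDF document's page at which the extraction will end
Output Directory In Text Path to directory where extracted documents will be saved
Password In Password OPTIONAL: User or owner password, required if the document is protected (the user password will not work if the document owner has restricted this kind of activity in the security properties); UTF-8 encoding
Encrypt In Flag OPTIONAL: (False is default value); True to copy the security properties of the parent document
New User Password In Password OPTIONAL: Password used to open the document; if Encrypt is set to True, at least one password (user, owner) must be provided
New Owner Password In Password OPTIONAL: Password to manage properties of the document; if Encrypt is set to True, at least one password (user, owner) must be provided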
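The naming convention reads as: parent file name, an underscore, the page number, and the .pdf extension. This Python sketch assumes that 'Parent Document File Name' excludes the original .pdf extension and that page numbers are not zero-padded; split_output_path is a hypothetical helper.

```python
import os


def split_output_path(parent_path: str, output_directory: str, page_number: int) -> str:
    """Build the target path for one extracted page."""
    # Assumption: the parent file name is used without its extension.
    stem = os.path.splitext(os.path.basename(parent_path))[0]
    return os.path.join(output_directory, f"{stem}_{page_number}.pdf")


print(os.path.basename(split_output_path("Report.pdf", "out", 3)))  # Report_3.pdf
```

Because the names depend only on the parent file name and page number, repeated runs on the same document into the same Output Directory produce the same names, so earlier files may be overwritten or cause an error depending on the implementation.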
3.24. Split Whole PDF to Single Pages
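Extracts each page of the supplied PDF document and saves it as a separate PDF document.
Naming convention of extracted documents: 'Parent Document File Name' & '_Page Number' & '.pdf'.
Parameter Direction Data Type Description
File Path In Text Full path to PDF document
Output Directory In Text Path to directory where extracted documents will be saved
Password In Password OPTIONAL: User or owner password, required if the document is protected (the user password will not work if the document owner has restricted this kind of activity in the security properties); UTF-8 encoding
Encrypt In Flag OPTIONAL: (False is default value); True to copy the security properties of the parent document
New User Password In Password OPTIONAL: Password used to open the document; if Encrypt is set to True, at least one password (user, owner) must be provided
New Owner Password In Password OPTIONAL: Password to manage properties of the document; if Encrypt is set to True, at least one password (user, owner) must be provided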
