06/03/2024, 22:02 Downloading “undownloadable” web PDFs with Fiddler.
| by A B | Medium
Downloading “undownloadable” web PDFs with Fiddler.
A B
7 min read · Jul 13, 2018
I was once teaching a course on backend software engineering. I didn’t own the
course material; my duties included going over and presenting the slide deck that
I had been provided by the course coordinator, answering any outstanding
questions from the class, being on time, having lunch, and promptly getting lost at
5:30 pm. At the end of the course, naturally, the students asked me to share the
slide deck with them so they could go over it on their own. And that’s when the
issue revealed itself: the course slides were provided to me via a secure document
sharing platform, let’s call it PDFLord [I won’t mention the actual name for the sake
of… reasons], which imposed downloading and printing restrictions on all the
course PDFs. So, unfortunately, the students had to leave the class empty-handed.
However, something didn’t seem right to me: if you can see the document on
your screen, surely its source is hiding somewhere in the files downloaded or cached
by your browser, and consequently the download restriction is artificial in a sense.
In this article I will show you a method to overcome these restrictions that I
discovered in the two days following the course. The tutorial assumes a macOS
(High Sierra) development environment, the Chrome browser, and the PDFLord
platform, but similar steps could be taken for other operating systems and other
document sharing platforms.
To begin with, let’s list the reasons why PDFLord was a bane of my existence:
https://medium.com/@peacefuleast/downloading-undownloadable-web-pdfs-with-fiddler-32094da02285 1/20
1. As mentioned before, the PDFs had downloading and printing restrictions (as
indicated by the grayed out icons in the top right corner).
2. The PDFs were copy-protected, meaning I could not select any text (as indicated
by the “Protected File” pop-up on mouse click).
3. The PDFs were unsearchable, meaning I had to memorize the page numbers of
all sections in the course that I wanted to quickly navigate to.
4. There was no fullscreen or present button.
My first instinct was to examine the page source files. I will skip the parts where I
was randomly clicking through every directory and folder looking for the right
files, and go straight to the ones relevant to this tutorial. Press Command+Shift+C
to bring up the developer tools in Chrome, then open the Sources tab.
In the Sources tab there is a pdflord.com directory, with a plugins folder under
assets. If you scroll down, you will find a folder called pdfjs, which contains two
files: pdf.js and viewer.js. It turns out that PDFLord is using PDF.js, an open-source
JavaScript library by Mozilla for parsing and rendering PDFs, which you can find here:
https://mozilla.github.io/pdf.js/
Let’s dig through the viewer.js file a bit more. After some inspection we find a
method which sounds like it deals with page rendering:
function webViewerPageRendered(evt)
Let’s add a breakpoint on line 2141 inside this method right after the pageView
variable and reload the page. Our goal is to examine what the object pointed at by
this variable represents.
The object exposes the rendered page as image data, so we could just write a script
to go over every page in the PDF, extract the image data arrays, convert them to
JPEGs, and end up with a sequence of images of the PDF file. To be honest, I wasn’t
quite satisfied with this finding: I would still not be able to select any text or
search through the images. I was looking for a better way.
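For the curious, here is a minimal sketch of what that image route could look like. It assumes each rendered page is reachable as a canvas element (if all you have is a raw image data array, you would first have to paint it onto a canvas), and the helper names are mine, not anything from viewer.js. The only real API used is the standard canvas toDataURL method.

```javascript
// Turn one rendered page into a JPEG data URL.
// `canvas` is anything exposing the standard HTMLCanvasElement.toDataURL API.
function canvasToJpeg(canvas, quality = 0.92) {
  return canvas.toDataURL('image/jpeg', quality);
}

// Convert a list of page canvases (hypothetically collected one per page)
// into JPEG data URLs, preserving page order.
function pagesToJpegs(canvases, quality = 0.92) {
  return canvases.map((canvas) => canvasToJpeg(canvas, quality));
}
```

The result is still just a pile of pictures, which is exactly why I kept looking.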
If we examine the viewer.js file a bit more, we find another interesting function:
In particular, there is this very intriguing line which looks like it deals with
restricting downloads:
if (PDFViewerApplication &&
PDFViewerApplication.appConfig.allowdownload) {
And then we also find the following sequence which deals with binding events to
button click listeners. It’s amusing how the “print” and “download” events are very
sloppily commented out, most likely to handle print and download logic in a
different part of the code.
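If the event bus idea is new to you, the pattern can be sketched in a few lines. This is a toy stand-in written for illustration, not the actual code from viewer.js: TinyEventBus, bindButton and makeFakeButton are names I made up, and the fake button exists only so the snippet runs outside a browser.

```javascript
// Toy event bus: named events mapped to lists of listeners.
class TinyEventBus {
  constructor() {
    this.listeners = new Map();
  }
  on(name, listener) {
    if (!this.listeners.has(name)) this.listeners.set(name, []);
    this.listeners.get(name).push(listener);
  }
  dispatch(name, data) {
    for (const listener of this.listeners.get(name) || []) listener(data);
  }
}

// A click on the button does nothing itself; it only announces a named event.
function bindButton(button, eventBus, eventName) {
  button.addEventListener('click', () => eventBus.dispatch(eventName));
}

// Hypothetical button substitute so the sketch runs without a DOM.
function makeFakeButton() {
  const handlers = [];
  return {
    addEventListener: (type, handler) => handlers.push({ type, handler }),
    click: () => handlers.filter((h) => h.type === 'click').forEach((h) => h.handler()),
  };
}

const bus = new TinyEventBus();
const log = [];
bus.on('download', () => log.push('download requested'));

const someButton = makeFakeButton();
bindButton(someButton, bus, 'download');
someButton.click(); // log now holds one 'download requested' entry
```

The point is that a button and the action it triggers are only loosely connected through the event name, so whoever controls the script can point any button at any event.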
At this point our action plan is clear:
1. We will rebind one of the buttons to serve as a download button (simply
uncommenting the download event listener didn’t work, and I didn’t dig too much
into why).
2. Change the download permissions logic to not require allowdownload.
3. ???
4. Proceed to downloading the PDF.
To make changes to the JavaScript files returned by a web page we need a
man-in-the-middle proxy server. For this purpose, we will be using Fiddler, a free
web debugging proxy by Telerik (https://www.telerik.com/fiddler). Fiddler was
originally developed as a Windows application, and only recently got ported to Mac.
On macOS it runs using Mono, an open-source implementation of the .NET Framework.
You can follow this tutorial to install Mono and Fiddler:
https://www.telerik.com/blogs/introducing-fiddler-for-os-x-beta-1
The only difference is that the 64-bit version of Fiddler doesn’t work on OS X, so
you need to start Fiddler with this command to avoid errors:
mono --arch=32 Fiddler.exe
Most websites nowadays use HTTPS, so we need to configure Fiddler to correctly
capture and decrypt HTTPS traffic. Open Tools->Options->HTTPS, and check the
Decrypt HTTPS Traffic checkbox.
Since Fiddler acts as a proxy, browser traffic gets redirected to it. Browsers
protect user data from man-in-the-middle attacks by refusing HTTPS connections to
parties whose certificates are not trusted, so we have to make the system trust
Fiddler’s root certificate. In Fiddler, click Actions->Export Root Certificate To
Desktop. Next, open Keychain Access, the macOS app that manages certificates, and
drag and drop the generated certificate from your desktop into the Keychain window.
The certificate will appear as DO_NOT_TRUST_FiddlerRoot. Double-click it, and in
the new window select Always Trust.
The final step is to actually redirect the traffic from Chrome to Fiddler. Open System
Preferences->Network->Advanced->Proxies. Check Web Proxy and Secure Web Proxy,
and for both set the host to 127.0.0.1 and the port to 8888. Click OK, then Apply.
You should now start seeing the traffic from your browser in the main Fiddler
window. If you don’t see anything, try using an Incognito Window.
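If you would rather check the proxy from a script than from the browser, here is a rough sketch of how a plain HTTP request is addressed to a forward proxy: you connect to the proxy and put the full target URL in the request line. The helper name viaProxy is mine, example.com is just a placeholder target, and the sketch only covers plain HTTP (HTTPS through a proxy uses CONNECT tunnelling, which is not shown).

```javascript
// Build Node http.request-style options that route a plain-HTTP GET
// through a forward proxy such as Fiddler listening on 127.0.0.1:8888.
function viaProxy(targetUrl, proxyHost = '127.0.0.1', proxyPort = 8888) {
  const target = new URL(targetUrl);
  return {
    host: proxyHost,               // connect to the proxy, not the target
    port: proxyPort,
    method: 'GET',
    path: target.href,             // absolute URL in the request line
    headers: { Host: target.host } // Host header still names the target
  };
}

// With Fiddler running, uncommenting this should make the request show up
// in Fiddler's window (assumes Node.js):
// require('http').get(viaProxy('http://example.com/'), (res) => {
//   console.log('status via proxy:', res.statusCode);
// });
```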
Now the fun part: hacking the JavaScript files and serving them in place of the
original files. Download (or copy and paste) the viewer.js file, open it in your
favorite editor, and replace line 10279 with:
items.zoomIn.addEventListener('click', function() {
  // Original zoom behavior, disabled:
  // eventBus.dispatch('zoomin');
  eventBus.dispatch('download');
});
In short, we are binding the download event to the zoom-in button. Next, remove
`PDFViewerApplication.appConfig.allowdownload` from lines 1475 and 5067 (and
anywhere else in the file for that matter):
if (PDFViewerApplication)
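If you don’t feel like hunting for every occurrence by hand, the same edit can be sketched as a small text transformation. This is only an illustration of the idea under the assumption that the check always appears as an `&&` clause like the one shown earlier; stripAllowDownload is a name I made up, and you should still eyeball the result.

```javascript
// Remove every "&& PDFViewerApplication.appConfig.allowdownload" clause,
// including ones where the condition is split across two lines.
function stripAllowDownload(source) {
  const clause = /\s*&&\s*PDFViewerApplication\.appConfig\.allowdownload\b/g;
  return source.replace(clause, '');
}

// Possible usage with Node.js (file names are placeholders):
// const fs = require('fs');
// const original = fs.readFileSync('viewer.js', 'utf8');
// fs.writeFileSync('viewer.patched.js', stripAllowDownload(original));
```

Matching the leading whitespace together with the `&&` is what lets the two-line form of the condition collapse cleanly into a single `if (PDFViewerApplication)`.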
Our substitute viewer.js file is ready for deployment. Find and select the viewer.js
resource in Fiddler. You might want to disable File->Capture Traffic first, so that
new traffic doesn’t keep refreshing the window.
Actual name of the website replaced with pdflord.
Then in the panel on the right select AutoResponder->Add Rule. In the bottom drop-
down menu choose Find File, select your substitute viewer.js file and click Save. Make
sure both Enable rules and Unmatched requests passthrough are checked.
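Conceptually, what we have just configured boils down to the following. This is a deliberately simplified model of rule matching with passthrough, not Fiddler’s actual implementation (which supports more kinds of rules), and both the URL fragment and the local path below are placeholders I made up.

```javascript
// Simplified model: each rule maps a URL fragment to a local file.
// Returns the local file to serve, or null to let the request pass
// through to the real server untouched.
function matchRule(requestUrl, rules) {
  const url = requestUrl.toLowerCase();
  const hit = rules.find((rule) => url.includes(rule.match.toLowerCase()));
  return hit ? hit.file : null;
}

const rules = [
  // Placeholder fragment and path, for illustration only.
  { match: '/assets/plugins/pdfjs/viewer.js', file: '/Users/me/patched/viewer.js' },
];

// The viewer script gets swapped for the local copy...
matchRule('https://pdflord.com/assets/plugins/pdfjs/viewer.js', rules);
// ...while everything else falls through to the network.
matchRule('https://pdflord.com/assets/plugins/pdfjs/pdf.js', rules);
```

This is why the passthrough checkbox matters: without it, every request that matches no rule would fail instead of reaching the real server.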
Aaaaaand… drum roll… we are done! We are ready to download our PDF.
Open your Chrome window with the PDF viewer. With the developer tools open,
right-click the refresh button and click Empty Cache and Hard Reload. Don’t forget
to re-enable Capture Traffic in Fiddler before reloading.
Emptying the cache is necessary so that Chrome doesn’t reuse its cached copy of the
original viewer.js and instead requests the file again from the web. That request
gets intercepted by Fiddler, which responds with our custom file.
Now, whenever you click on the Zoom In button (“+”), your PDF will get
downloaded. Great success!
Final thoughts and lessons learned:
When any data reaches your computer, there is absolutely no way to guarantee
its complete integrity.
Basing your business model on a premise that the data you share is fully secure
and protected is a terrible idea.
Hope y’all who got this far had as much fun with this tutorial as I did when fiddling
with this challenge.
Disclaimer: use at your own risk. Make sure you are not breaching any contracts
with your document providers. There is a very obvious potential harm to the
business models of the secure document sharing companies.