
https://zhiyunq.medium.com/how-to-look-for-ideas-in-computer-science-research-7a3fa6f4696f

How To Look For Ideas In Computer Science Research

(KEY: this is just a set of domain-specific thinking tools! Nail them down, get
practice, done!)

1. The author found a set of challenges that prevent students from making progress
in CS research, (so) he offers some tips as guidance.

2. For people with no research experience before starting a PhD, it is scary to
think about where to start.
2 options:
1. you are in a super productive group. A lot of ideas are thrown at you.
advantage: you can get your first research project going and get a good
publication.
disadvantage: you don't learn the most difficult part: coming up with a new idea
2. you get no concrete ideas, just a high level direction and some papers.
From this you have to come up with a research idea.
advantage: (_ not explicitly spelt out, but you are tackling 'the key problem' from
the get go)
disadvantage: make-or-break situation. Either you manage to figure out a good idea
to work on, XOR you waste the first two years of the PhD and quit. (50% dropout rate)

The author thinks it is important to start the training process in how to look for
a research idea as soon as possible.
Otherwise you end up with a big deficiency later, e.g. you won't be able to lead a
research group effectively.

Tip 1: Learn to read papers and develop your taste.

- one way to generate ideas == read papers and get inspired.
- take seminar classes where you have to read a tonne of papers.
- write paper responses to express opinions, critiques, and any constructive
thoughts.
- 3-4 papers per class (initially overwhelming to read).
- common misconception == you must understand *all* technical details of a paper.
- instead, the intended aim = learn to appreciate and criticize research ideas,
i.e. learn to answer questions like
- why is a paper good or bad?
- what makes paper X interesting?
- [How To Read A Paper By S Keshav] TBD
- find out what types of papers interest you most. Find out why. This helps
you find your research taste and narrow down your scope when choosing a research
area.
- classify papers into 'research domains' (sub-area buckets), e.g. for
cybersecurity the 'buckets' are
- operating systems (hacks),
- network protocols,
- software,
- hardware etc
- identify research *styles* (_ buckets for overall approaches in
papers).
e.g. for cybersecurity
- novel attacks and exploitation techniques
- analysis of emerging systems
- analysis of (emerging) algorithms
- defenses
- measurements
etc

- identify research *tools* used in a paper. For cybersecurity (note:
these are more approaches, and so mostly mental, rather than specific software or
hardware tools)
- manual analysis
- reverse engineering
- program analysis
- formal methods
- hardware based system design (?)
- data-driven approaches - machine learning, AI, etc.

Example: early in his career, Z was fascinated by novel attacks that break
state-of-the-art defenses.
(Narrowing down *why*, as above: Z felt such papers are creative and elegant, and
give great satisfaction because you discover flaws no one else sees.)
(KEY) When Z read papers like these he asked "*how* do these people find these
flaws?" (KEY) "what skills are needed to conduct such research?"
Initially, following his advisor's lead, Z focused on security problems in the
networking domain (e.g. TCP security flaws). Z did *not* (_ in the course of this
research) get a chance to acquire solid techniques such as program analysis, formal
methods, ML, etc. Around the time he graduated, his research taste changed, and he
(KEY) realized that it was not feasible to solve security problems sustainably and
scalably without automated techniques in his toolbox. So he learned program
analysis, model checking, ML, etc., which worked out well for him.

(KEY) identify your research taste and develop a unique (_ skill set and) path
accordingly!

Tip 2: Recognize Patterns Of Developing Research Ideas.

- every paper has a unique story behind it *but* there are patterns in how they
come about
- [ref to Guo's patterns of research blog post]
- 'patterns' - how the ideas come about
- once you know a few patterns you can come up with new ideas easily. Still
need to triage the value of an idea before committing.

Pattern 1: Fill in the blank

Example

Domain/Technique       | Static Analysis | Dynamic Analysis
-----------------------|-----------------|-----------------
Use After Free         |     (blank)     |        X
Out-of-Bounds Analysis |        X        |        X
a simple pattern from Robert Shum. Very simple conceptually.

1. Read a few papers
2. Jot down differences between papers in terms of
- assumptions
- properties/guarantees offered
- technologies, data sets, etc.
(_ the basic idea seems to be to generate a set of 'tags', each paper
having a set of 'values' (including null) for each tag.)
3. draw a (multi-dimensional!) table with these tags as column/row headers.
Wherever there is a blank spot, there is a research opportunity.

(_ not explicitly stated, but in the table above, there seems to be a research
opportunity at the combination of 'use-after-free' *and* 'static analysis')

Text: "people may have applied static analysis to automatically find certain
vulnerabilities, but no one has applied dynamic analysis yet"
"if a certain type of vulnerability has been studied (_ via a specific
technique X), you can potentially work on a different type (_ with the technique X)"
(KEY) Fine grained tables give you more opportunities.

(KEY INSIGHT: This 'attributes grid' can be created for 'things' other than papers.
ML frameworks, for example, or books to code up.)
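The fill-in-the-blank grid can be sketched in a few lines of Python. This is a minimal illustration, assuming each paper has been hand-tagged with exactly one value per dimension; the paper names and tags are hypothetical and only mirror the table above. Blank cells are the combinations of observed values that no paper covers.

```python
from itertools import product

# Hypothetical tags: one value per dimension for each paper (illustration only).
papers = {
    "paper_a": {"vuln": "use-after-free", "technique": "dynamic analysis"},
    "paper_b": {"vuln": "out-of-bounds", "technique": "static analysis"},
    "paper_c": {"vuln": "out-of-bounds", "technique": "dynamic analysis"},
}


def blank_cells(papers, dims):
    """Return every combination of dimension values that no paper covers."""
    # Candidate values per dimension, collected from the papers themselves.
    values = {d: sorted({p[d] for p in papers.values()}) for d in dims}
    # Combinations that at least one paper already occupies.
    covered = {tuple(p[d] for d in dims) for p in papers.values()}
    # The grid is the Cartesian product of all values; blanks are uncovered cells.
    return [cell for cell in product(*(values[d] for d in dims))
            if cell not in covered]


for cell in blank_cells(papers, ["vuln", "technique"]):
    print(cell)  # ('use-after-free', 'static analysis')
```

Because the candidate values are collected from the papers, a value no paper has touched yet (a new vulnerability type, say) would have to be added to `values` by hand. Adding another dimension only means adding another tag to each paper, which is one way to get the finer-grained tables mentioned above.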

Two Ways To Map Out Dimensions

1. Read a bunch of papers (on similar topics) and look for differences (the
technique implied above)
2. Read survey papers - these typically survey a space, generate several
'dimensions', and classify papers/techniques etc. into 'tables' formed from these
attributes.

Pattern 2: Expand

Here *because of your prior work* you see 'dimensions' that may not be visible to
others. In other words you have done work which can be 'tagged' now take one (or
more) of these tags, and generate alternatives, so you are expanding the dimension,
and also your previous work. (alternatively this can be seen as dropping one of the
limiting tags from your previous work thus 'expanding')
So you can create a unique 'grid' with the 'dimension' you are using that is not
visible/partially visible/differently visible to others.

e.g.: Z's USENIX Security paper on a strong TCP side-channel attack.
assumptions for *this* paper == unprivileged malware has already been
installed on the victim's phone
for the *next* paper = eliminate this requirement (_ of already installed
malware), and this led to the discovery of a brand-new side channel.

so dimension == "attack requirements", with the values
Malware Installed | Firewall Operational | None (_ presumably: no special requirement)

e.g.: Z's CCS '20 paper resulted from taking side-channel attack experience
(finding/building) from TCP to UDP. So the dimension here is about shifting
network protocols (while keeping the attack concept fixed)
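A similar sketch for the Expand pattern: start from the tags of your own previous paper and vary exactly one dimension at a time. The tag names and alternative values below are hypothetical, loosely modeled on the "drop the malware requirement" and TCP-to-UDP examples; deciding which variants are worth pursuing still needs judgment.

```python
# Hypothetical tags describing a previous paper.
prior_work = {
    "attack": "side channel",
    "protocol": "TCP",
    "requirement": "malware installed",
}

# Hypothetical alternative values for the dimensions you can see from that work.
alternatives = {
    "protocol": ["TCP", "UDP"],
    "requirement": ["malware installed", "firewall operational", "none"],
}


def expand(prior, alternatives):
    """Yield (dimension, variant) pairs that differ from prior in exactly one tag."""
    for dim, options in alternatives.items():
        for value in options:
            if value != prior[dim]:
                variant = dict(prior)  # copy, so the prior work's tags stay unchanged
                variant[dim] = value
                yield dim, variant


for dim, variant in expand(prior_work, alternatives):
    print(f"vary {dim}: {variant}")
```

Each printed variant is one candidate "next paper": the same attack concept with a different protocol, or the same setting with a weaker attack requirement.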

Pattern 3: Build a Hammer and Look For Nails

idea: if you have a unique technique, expertise, system, or even dataset (*that no
one can easily replicate*), look for interesting problems to solve with it.
e.g.: Peter Chen's group at the University of Michigan has strong expertise in
virtual machines, and leveraged it to build many interesting things on top, e.g. the
first VM record-and-replay capability. This has important applications in debugging
and retrospective intrusion analysis. They published a number of papers at top
conferences wrt determinism guarantees.

e.g.: Z's group: expertise in network side channels, a small area without too much
competition --> a series of papers from 2012 to 2020.
basic idea: once expertise is built up, you can more easily find new problems to
solve.

e.g.: UCSB: a state-of-the-art binary analysis framework, angr, originally developed
for the DARPA (Cyber) Grand Challenge competition and then open-sourced.
Since it is well engineered and designed, it became popular in academia, and its
authors leveraged the system to develop follow-up projects.

e.g.: CAIDA at UCSD has a unique dataset that no one else has, and has used it to
publish many unique measurement studies.

This is an impactful and sustainable way of conducting research. On the other hand,
it can be time-consuming to build up the expertise, dataset, etc. Such initiatives
also take vision, long-term planning, etc., often beyond a single student's ability.
Also, a few groups dominate specific spaces, and it may be difficult to surpass them
unless you have a unique insight. (In such cases?) you should try to identify
something that a lot of people need but for which no solution exists yet (_ and that
you can provide?).

Pattern 4: Start Small, Then Generalize

(picture of young plant)

Often, a research idea starts with a small observation. You need to figure out if
it can be developed into a full idea worth publishing.

Positive signs to look for

1. the initial observation (however small) is very intriguing and surprising when
you encounter it.
2. when you dig deeper, you find it is rooted in something foundational and
cannot be explained by well-known/established concepts.
3. the observation is unlikely to be a one-off, i.e. there is a bigger space
behind the original observation, and many similar situations you can investigate.

e.g.:
(Z and his team were) poking around different interfaces exposed through the
filesystem (?) to potentially malicious apps on Android.
A student stumbled upon a device file, "/dev/ion", which looked interesting because
it allowed apps to allocate memory from "pre-existing heaps" and map it into user
space; but instead of returning "zeroed pages", the returned pages contained leftover
data from prior use, a potential information-leakage vulnerability. This was sign 1.

By itself, this is not enough for a research project.

But on digging deeper, they found that though the bug *type* is not new, it was
more deeply rooted than initially apparent. (sign 2)
Specifically, the introduction of the "/dev/ion" interface unexpectedly exposed
memory returned by *internal APIs used by OS kernels*.
Unlike APIs exposed to user space (where zeroing of returned memory is a
requirement), such 'internal' APIs do not zero newly allocated memory, for
performance reasons.
Even worse, different Android smartphones *customize the implementation* of
/dev/ion, which opens up a bigger space for investigation. (sign 3)

paper link: http://www.cs.ucr.edu/~zhiyunq/pub/ccs16_ion.pdf
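The bug class behind the /dev/ion example can be illustrated with a toy allocator. This is a deliberately simplified model, not the ION allocator or any kernel API: `ToyPool` and its `zero_on_alloc` flag are hypothetical, and Python bytearrays stand in for pages. It only shows why recycling freed memory without zeroing it hands one user's leftover data to the next.

```python
class ToyPool:
    """Toy allocator that recycles freed buffers (hypothetical, for illustration).

    zero_on_alloc=True mimics allocators that must scrub memory before handing
    it out; zero_on_alloc=False mimics internal allocators that skip zeroing
    for performance.
    """

    def __init__(self, zero_on_alloc: bool):
        self.zero_on_alloc = zero_on_alloc
        self.free_list: list[bytearray] = []

    def alloc(self, size: int) -> bytearray:
        # Reuse a previously freed buffer of the same size if one exists.
        for i, buf in enumerate(self.free_list):
            if len(buf) == size:
                buf = self.free_list.pop(i)
                if self.zero_on_alloc:
                    buf[:] = bytes(size)  # scrub leftover contents
                return buf
        return bytearray(size)  # brand-new memory starts zeroed

    def free(self, buf: bytearray) -> None:
        # Freed buffers go back to the pool without being cleared.
        self.free_list.append(buf)


def leaked_bytes(zero_on_alloc: bool) -> bytes:
    pool = ToyPool(zero_on_alloc)
    victim = pool.alloc(8)
    victim[:] = b"secret!!"   # a "victim" writes sensitive data
    pool.free(victim)
    attacker = pool.alloc(8)  # an "attacker" receives the recycled buffer
    return bytes(attacker)


print(leaked_bytes(zero_on_alloc=True))   # b'\x00\x00\x00\x00\x00\x00\x00\x00'
print(leaked_bytes(zero_on_alloc=False))  # b'secret!!'
```

The only difference between the safe and the leaky run is whether the allocator scrubs recycled memory, which is the performance-versus-safety trade-off described for the internal kernel APIs.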

Pattern 5: Reproduction Of Prior Work

From the previous pattern, a question: how do I make these small observations or
discoveries in the first place?
One way is to try to reproduce the results of a published paper.

(KEY) often, what is reported in a paper is *not* exactly what you observe when you
try to replicate it.
Possible Reasons:
1. inadvertent mistakes made by authors of the paper
2. some results may not be 100% reproducible (say you don't have the exact
dataset that the authors used)
3. biased benchmarks, or datasets that favor the central thesis/method proposed
in the paper.
If you identify *important* discrepancies/shortcomings, that often means there is
room for improvement.

Even when you reproduce the work 100%, there are often other insights and side
discoveries.
(INSIGHT: this can work with reproducing existing *codebases* too, e.g. femtolisp or
the Lua interpreter)

e.g.: Z's students were asked to reproduce paper X. The results were much more
negative than reported in the paper, which led to the development of a new method to
bypass the limitations of the original paper.
(INSIGHT: being *unable* to reproduce results can also be useful)
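When reproducing a paper, it helps to be explicit about what counts as a discrepancy. A minimal sketch, assuming the paper's headline numbers and your reproduced numbers are available as dictionaries; the metric names, the values, and the 5% relative tolerance are all hypothetical choices.

```python
def discrepancies(reported: dict, reproduced: dict, rel_tol: float = 0.05) -> dict:
    """Flag metrics whose reproduced value is more than rel_tol away from the
    reported value (relative to the reported value), or that are missing."""
    flagged = {}
    for metric, claimed in reported.items():
        if metric not in reproduced:
            flagged[metric] = "not reproduced"
            continue
        got = reproduced[metric]
        # Relative gap; fall back to the absolute gap if the paper reported 0.
        gap = abs(got - claimed) / abs(claimed) if claimed else abs(got)
        if gap > rel_tol:
            flagged[metric] = f"reported {claimed}, got {got} ({gap:.0%} off)"
    return flagged


# Hypothetical numbers: a paper's claims vs. what a reproduction measured.
reported = {"detection_rate": 0.95, "false_positive_rate": 0.01, "overhead": 0.03}
reproduced = {"detection_rate": 0.62, "false_positive_rate": 0.0102}

for metric, note in discrepancies(reported, reproduced).items():
    print(f"{metric}: {note}")
```

A fixed relative tolerance is a simplification; a sensible threshold depends on the metric and on run-to-run variance, so repeating runs before calling a gap *important* is worthwhile.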

Pattern 6: External Sources: Industry, News Feed etc

If you work in a practical area, much is going on in industry, and outside academia
in general.
Take opportunities to connect with industry (_ generalize: outside your own area)
and learn their pressing needs and pain points. These are great sources of research
ideas.

e.g.: Z, by talking to industry folks, identified a 'patch problem':

When a patch is committed to the Linux kernel, downstream maintainers (say of
Ubuntu LTS, or Android) have to figure out if this patch is relevant to them, and
if so, backport it.

This is a tedious and error-prone process. (<-- PROBLEM) Often, important security
patches are delayed or missed. (PROBLEM)
Even worse (PROBLEM!), it is difficult for owners of the downstream kernel branches
to audit and look for missing patches.
In the Android world, for example, most vendors do not provide a history of their
kernel source (a git repository). (PROBLEM)

Therefore, Z's group developed (and published on) a tool to check for the presence
of a patch in a given binary kernel (!)
This investigation led to further project ideas.
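The core question of the patch problem, "is this upstream fix present in that downstream kernel?", can be illustrated with a naive source-level heuristic. This is *not* the binary-level technique Z's group published; it assumes you have the downstream source, which (as noted above) many Android vendors don't provide. The function `patch_status`, the file path, and the patch itself are hypothetical.

```python
def patch_status(diff_text: str, target_source: str) -> str:
    """Naively decide whether a unified diff appears applied to target_source.

    Heuristic (hypothetical, for illustration): the patch looks applied if
    every added line is present and no removed line is; it looks not applied
    if every removed line is present and no added line is.
    """
    added, removed = [], []
    for line in diff_text.splitlines():
        if line.startswith(("+++", "---")):
            continue  # file header lines, not patch content
        if line.startswith("+") and line[1:].strip():
            added.append(line[1:].strip())
        elif line.startswith("-") and line[1:].strip():
            removed.append(line[1:].strip())
        # context lines (" ...") and hunk headers ("@@ ...") are ignored

    # Compare whitespace-stripped lines so indentation differences don't matter.
    present = {src.strip() for src in target_source.splitlines()}

    if all(a in present for a in added) and not any(r in present for r in removed):
        return "applied"
    if all(r in present for r in removed) and not any(a in present for a in added):
        return "not applied"
    return "unclear"


# Hypothetical patch tightening a bounds check in a made-up driver.
PATCH = "\n".join([
    "--- a/drivers/foo.c",
    "+++ b/drivers/foo.c",
    "@@ -1,4 +1,4 @@",
    " int foo_read(struct foo *f, size_t len)",
    " {",
    "-\tif (len > f->size + 1)",
    "+\tif (len > f->size)",
    " \t\treturn -EINVAL;",
])

OLD_SRC = "int foo_read(struct foo *f, size_t len)\n{\n\tif (len > f->size + 1)\n\t\treturn -EINVAL;\n"
NEW_SRC = OLD_SRC.replace("f->size + 1", "f->size")

print(patch_status(PATCH, OLD_SRC))  # not applied
print(patch_status(PATCH, NEW_SRC))  # applied
```

The line-matching heuristic is easily fooled: an added line that already occurs elsewhere in the file looks "present", and context lines are ignored entirely. Working on binary kernels, as in the original problem, requires a different approach altogether.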

e.g.: (paying attention to a news feed) the professor's Twitter as a source:

"Back then, it was fashionable to “root” Android phones so that users can customize
the OS and unlock new features that were not possible otherwise. A number of “one-
click root apps” were developed that can automatically root many phone models with
literally a single click of a button. After realizing these apps are basically
launching an attack against the OS kernel, I started to think “how many different
exploits do they have that can target so many phones? Do they have any proprietary
exploits that you can’t find on the Internet? Is it possible for an attacker to
steal these exploits and repurpose them (e.g., to build ransomware)?” It turns out
some of these apps are developed by top hackers employed in the industry and
contain over a hundred exploits, and with some work (of reverse engineering), one
can own them all (yes, pretty scary, huh)."

(!!!) -- concept --> extract 'secrets' of top people and aggregate. (so design
sensibility of Norvig + algorithmic tightness of Carmack's code etc)

Patterns specific to cybersecurity research

1. Adversarial Research:
cybersecurity is inherently adversarial; a recurring theme is attacks and
defenses. You can attempt to break an existing defense, or build a defense against
an existing attack. Often, novel attack papers follow novel defense papers and vice
versa.

2. Automating A Process:
Systems security analysis involves reverse engineering, bug discovery, bug
triage, exploitation (of a vulnerability), and checking if a patch has been applied.
All this originally required manual effort. Even partially automating such a
process, applying techniques like program analysis, can have significant value.

Note: both of these can be applied to other domains. (e.g. GANs, automated tools)

Tip 3: Develop a good *habit* of *thinking* about research ideas.

All these tips sound good, but you need to *practice*, otherwise they remain empty
words. To practice, *make* it a habit.

1. (realize that) generating and formulating an *idea* is *different from*
executing a project.
do not focus completely on your currently executing project and shut out
the outside world.
- read papers periodically, e.g. when a conference releases a batch of
papers. (KEY) Isolate the research *ideas* in these papers (not just the
concrete technical details)

2. take paper reviews seriously. Identify not only which papers are excellent, but
also why papers get rejected.

3. read widely. Many CS fields have matured, so breakthroughs come from
crossover ideas (from other fields). Develop some interests outside your
field/direction within a field.

4. attend group meetings, ask questions.

5. talk to your labmates often (__ actively build a community):
start a casual conversation on a research topic,
debate about a paper,
give genuine feedback.
