Prophecy.io
Prophecy.io is building a Cloud Native Data Engineering product on Spark and Kubernetes.
We’re building a replacement for ETL products (Informatica, Ab Initio, DataStage) that form a
$12B-a-year market. We sell to Fortune 500 enterprises, where we charge $200K to
$2M per year. We raised $2M in July, and are raising $2M in March and $15M in September.
As enterprises move to the cloud (or to Spark), we replace their existing ETL products. We
also provide automatic conversion of their tens of thousands of workflows.
● Raj product-managed Apache Hive at Hortonworks through its $1B IPO and was on
the founding team for CUDA compilers at NVIDIA (used for all deep learning on
GPUs). He worked at Microsoft on compiler optimizations, and has developed a
language for insurance contracts.
● Rohit product-managed Apache Hadoop at Hortonworks and Kafka at Confluent.
● Maciej was the founder and CEO of a startup using ML to summarize research for the
pharma industry.
● Vikas was GM of financial services for the western US, running $100M+ in annual sales,
and is a veteran.
● We have one team across SF (5), Gurgaon (4), and Bangalore (5). Most engineers are from
top schools (IITs, BITS, etc.) and are excellent at their jobs. We have the following components:
● Visual=Code designer - a unique designer where a user can switch on the fly between a
visual drag-and-drop designer and Spark code, and edit either. We have a business layer on
top of Spark to express commonly used patterns.
● Metadata - we manage datasets, workflows, etc., providing a collaborative development
environment. We subsume the Hive metastore. For governance, we provide column-level
lineage across workflows.
● Execution - we provide the best execution on Apache Spark, with extensions to Spark. We
also run interpreters and provide step-by-step debugging on Spark.
● Deployment - we are writing our own Kubernetes operator to deploy our application
on-premise and in the public cloud.
● Transpilers - we convert legacy products using cross compilers that parse legacy formats
and programming languages and convert them to Spark.
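The Visual=Code idea above, one underlying representation rendered either as a visual graph or as Spark code, could be sketched roughly as follows. All the names here (Step, Pipeline, the rendering methods) are illustrative assumptions, not Prophecy's actual API:

```scala
// Hypothetical sketch: a single in-memory pipeline representation with two
// equivalent views. The visual designer would show the step labels; the code
// view renders a Spark-style chain. Names are invented for illustration.
sealed trait Step { def label: String }
case class Source(table: String)     extends Step { val label = s"Source($table)" }
case class Filter(condition: String) extends Step { val label = s"Filter($condition)" }
case class Target(table: String)     extends Step { val label = s"Target($table)" }

case class Pipeline(steps: List[Step]) {
  // What the drag-and-drop canvas would display.
  def visualView: List[String] = steps.map(_.label)

  // What the code editor would display, generated from the same steps.
  def codeView: String = steps.map {
    case Source(t) => s"""spark.table("$t")"""
    case Filter(c) => s""".filter("$c")"""
    case Target(t) => s""".write.saveAsTable("$t")"""
  }.mkString("\n  ")
}

val p = Pipeline(List(Source("orders"), Filter("amount > 100"), Target("big_orders")))
```

Because both views render from the same step list, an edit to either one only has to update that list to stay in sync with the other.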
● Language: 90% of our code is in Scala and functional; we do have some engineers who
came in with Java or C++ experience and picked up Scala. We use Go for Kubernetes
operators. The front-end is in React and Redux.
● Area of strength: we prefer that engineers come in with strength in one area -
databases, Spark/Scala, compilers, distributed systems.
Spark Internals
Work on our core in-memory representation of Spark programs. Add core features to our
product, such as a data quality library. Add Apache Spark extensions to produce more
informed execution. Connect the code layer to the physical execution layer.
Crawl Hive/Spark git code and convert it to our Spark representation. Enhance
our computation of lineage. Build a multi-dimensional lineage serving system.
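Column-level lineage of the kind described above can be modeled as a graph from each output column to the input columns it derives from; upstream lineage is then a transitive walk over that graph. A minimal sketch, with invented table and column names, assuming an acyclic lineage graph:

```scala
// Hypothetical sketch: lineage as edges from a derived column to the columns
// it was computed from. The tables and columns below are invented examples.
type Column = String

val derivedFrom: Map[Column, Set[Column]] = Map(
  "report.revenue" -> Set("orders.amount", "fx.rate"),
  "orders.amount"  -> Set("raw_orders.amt")
)

// All columns a given column transitively depends on.
// Assumes the graph is acyclic (a cycle would recurse forever).
def upstream(c: Column): Set[Column] = {
  val direct = derivedFrom.getOrElse(c, Set.empty[Column])
  direct ++ direct.flatMap(upstream)
}
```

A serving system would precompute or cache these closures per workflow rather than walking the graph on every query, but the core relation is this map.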
Spark Execution
Run Spark reliably on different clusters. Run Spark reliably in interactive mode. Add
high-performance parallel loads/stores to database systems. Run edge nodes and related
services reliably. Make sure we’re getting the right statistics out.
Must have: top engineer, able to figure out multiple technologies, who has worked inside
storage or execution systems.
Transpilers
We have cross compilers that parse existing ETL abstractions, do static analysis and
optimizations, and convert them to our Spark representation. Figure out equivalent
performant abstractions on Spark.
Must have: top engineer with knowledge of ETL, OR previous work in compiler
analysis/optimizations.
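A cross compiler of this shape (parse a legacy expression, run a static optimization pass, emit a Spark equivalent) might look like the following toy sketch. The AST, the constant-folding pass, and the emitted SQL are illustrative assumptions, not our actual transpiler:

```scala
// Hypothetical sketch: a toy AST for a legacy ETL expression language,
// a constant-folding optimization, and emission of a Spark SQL fragment.
sealed trait Expr
case class Col(name: String)     extends Expr
case class Lit(value: Int)       extends Expr
case class Add(l: Expr, r: Expr) extends Expr

// Static optimization: fold constant sub-expressions before emitting code.
def fold(e: Expr): Expr = e match {
  case Add(l, r) => (fold(l), fold(r)) match {
    case (Lit(a), Lit(b)) => Lit(a + b)
    case (fl, fr)         => Add(fl, fr)
  }
  case other => other
}

// Emit an equivalent Spark SQL expression string.
def toSparkSql(e: Expr): String = e match {
  case Col(n)    => n
  case Lit(v)    => v.toString
  case Add(l, r) => s"(${toSparkSql(l)} + ${toSparkSql(r)})"
}

// A legacy expression like base_fee + 2 + 3, parsed into the toy AST.
val legacy = Add(Col("base_fee"), Add(Lit(2), Lit(3)))
```

The real transpilers parse full legacy formats and workflows rather than single expressions, but the pipeline, parse to an AST, analyze and optimize, then emit the Spark equivalent, has this shape.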
Must have: knowledge of systems work on the cloud (systems programming, not application programming).
Bottom Line