Prophecy.io


Company Summary

Prophecy.io is building a Cloud Native Data Engineering product on Spark and Kubernetes.
We’re building a replacement for ETL products (Informatica, Ab Initio, DataStage) that form a
$12B-a-year market. We sell to Fortune 500 enterprises, where we charge $200K -
$2M per year. We raised $2M in July, and are raising $2M in March and $15M in September.

As enterprises move to the cloud (or to Spark), we replace their existing ETL products. We
also provide automatic transformation of their tens of thousands of workflows.

We have a strong founding team:

● Raj product-managed Apache Hive at Hortonworks through its $1B IPO and was on the founding team for CUDA compilers at NVIDIA (used for all deep learning on GPUs). He worked at Microsoft on compiler optimizations, and has developed a language for insurance contracts.
● Rohit product-managed Apache Hadoop at Hortonworks and Kafka at Confluent.
● Maciej was founder-CEO of a startup using ML to summarize research for the pharma industry, with the top 3 players as customers. Top engineer.
● Vikas was GM of financial services for the western US - running $100M+ in annual sales - and is a veteran of enterprise sales and SI ecosystems.

We have a strong distributed team:

● We are one team across SF (5), Gurgaon (4), and Bangalore (5). Most engineers are from top schools (IITs, BITS, etc.) and are excellent at their jobs.

We have the following components:

● Visual=Code designer - a unique designer where a user can switch between a visual drag-and-drop designer and Spark code on the fly, and edit either. We have a business layer on top of Spark to express commonly used patterns.
● Metadata - we manage datasets, workflows, etc., providing a collaborative development environment. We subsume the Hive metastore. For governance, we provide column-level lineage across workflows.
● Execution - we provide the best execution on Apache Spark, with extensions to Spark. We also run interpreters and provide step-by-step debugging on Spark.
● Deployment - we are writing our own Kubernetes operator to deploy our application on-premise and in the public cloud.
● Transpilers - we convert legacy products using cross compilers that parse legacy formats and programming languages and convert them to Spark.

We need engineers with the following skills:

● Language: 90% of our code is in Scala and functional; some engineers came in with Java
or C++ experience and picked up Scala. We use Go for Kubernetes
operators. The front-end is in React and Redux.

● Area of strength: We prefer that engineers come in with strength in one area -
databases, Spark/Scala, compilers, distributed systems.
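As an illustration of the Visual=Code idea - one representation rendered both as editable code and as a visual canvas - here is a minimal sketch in Scala. All names (`Step`, `VisualCode`, the three node types) are hypothetical, not the product's actual API.

```scala
// Hypothetical sketch: a tiny workflow IR that can render the same pipeline
// as Spark-style code and as (from, to) edges for a visual canvas.
sealed trait Step { def id: String }
case class Source(id: String, table: String) extends Step
case class Filter(id: String, input: Step, predicate: String) extends Step
case class Select(id: String, input: Step, columns: List[String]) extends Step

object VisualCode {
  // "Code view": emit Spark-style Scala for the pipeline.
  def toCode(step: Step): String = step match {
    case Source(_, t)      => s"""spark.table("$t")"""
    case Filter(_, in, p)  => s"""${toCode(in)}.filter("$p")"""
    case Select(_, in, cs) =>
      toCode(in) + ".select(" + cs.map("\"" + _ + "\"").mkString(", ") + ")"
  }

  // "Visual view": flatten the same IR into edges for drawing a graph.
  def toEdges(step: Step): List[(String, String)] = step match {
    case Source(_, _)      => Nil
    case Filter(id, in, _) => toEdges(in) :+ (in.id -> id)
    case Select(id, in, _) => toEdges(in) :+ (in.id -> id)
  }
}
```

Because both views derive from one IR, an edit to either view is an edit to the same underlying program - which is the property the designer needs.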

Database Architect / Engineer

We need Architects to spend 60% of their time coding.

Spark Internals

Work on our core in-memory representation of Spark programs. Add core features to our
product such as data quality library. Add Apache Spark extensions to produce more
informed execution. Connect the code layer to the physical execution layer.
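As a flavor of what a data quality library over such a representation could look like - a hypothetical sketch, not our actual API:

```scala
// Hypothetical sketch: declarative data quality rules evaluated per column,
// returning the names of rules that any row violates. Shapes are illustrative.
case class Rule(name: String, column: String, check: String => Boolean)

object DataQuality {
  type Row = Map[String, String]

  // A rule fails if at least one row is missing the column or fails the check.
  def failures(rows: Seq[Row], rules: Seq[Rule]): Seq[String] =
    rules.filter { r =>
      rows.exists(row => !row.get(r.column).exists(r.check))
    }.map(_.name)
}
```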

Must Have: Previous experience with database internals

Spark/Hive Lineage

Crawl Hive/Spark git code and convert it to our Spark representation. Enhance
our computation of lineage. Build a multi-dimensional lineage-serving system.

Must Have: Previous experience with query parsers, or experience with cube structures
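Column-level lineage, as described above, amounts to propagating column dependencies backwards through a workflow. A toy version (names hypothetical) where each stage maps an output column to the input columns it derives from:

```scala
// Hypothetical sketch of column-level lineage computation.
object Lineage {
  // One stage of a workflow: output column -> input columns it derives from.
  type Stage = Map[String, Set[String]]

  // Trace a final column back through the stages (listed first to last) to
  // the source columns it depends on. Columns a stage does not mention are
  // assumed to pass through unchanged.
  def sourceColumns(stages: List[Stage], column: String): Set[String] =
    stages.foldRight(Set(column)) { (stage, cols) =>
      cols.flatMap(c => stage.getOrElse(c, Set(c)))
    }
}
```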

Execution Architect / Engineer

We need Architects to spend 60% of their time coding.

Spark Execution

Run Spark reliably on different types of clusters. Run Spark reliably in interactive mode. Add
high-performance parallel loads/stores to database systems. Run edge nodes and related
services reliably. Make sure we’re getting the right statistics out.

Must Have: Top engineer, able to figure out multiple technologies; has worked inside
storage or execution systems

Scala / Compiler Engineers


ETL to Spark Conversion

We have cross compilers that parse existing ETL abstractions, do static analysis and
optimizations, and convert them to our Spark representation. Figure out equivalent performant
abstractions on Spark.

Must Have: Knowledge of ETL and top engineer OR previous work in compiler
analysis/optimizations
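The parse → analyze → emit shape of such a cross compiler might look like the following toy. The real legacy formats (Informatica, Ab Initio, DataStage) are vastly richer; `EtlToSpark` and its three ops are invented purely for illustration.

```scala
// Hypothetical sketch of a cross compiler: parse a toy line-oriented legacy
// ETL script and emit equivalent Spark code.
object EtlToSpark {
  def compile(script: String): String =
    script.linesIterator.map(_.trim).filter(_.nonEmpty).map { line =>
      // Split each line into an opcode and its argument.
      val (op, arg) = line.span(!_.isWhitespace)
      op match {
        case "SOURCE" => s"""spark.table("${arg.trim}")"""
        case "FILTER" => s""".filter("${arg.trim}")"""
        case "SELECT" =>
          ".select(" + arg.trim.split(",\\s*").map("\"" + _ + "\"").mkString(", ") + ")"
        case other    => sys.error(s"unsupported op: $other")
      }
    }.mkString
}
```

A real transpiler would build a typed AST and optimize it before emitting; the point here is only the overall pipeline shape.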

Cloud Execution Engineers


Spark Execution
Run our code reliably in various cloud environments. Add abstractions for various cloud
vendors.

Must Have: Knowledge of systems work on cloud (system = not application programming)
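One plausible shape for such vendor abstractions - a common storage interface with per-vendor implementations, chosen from configuration so execution code never hard-codes a vendor. Names and URI schemes here are illustrative assumptions, not our actual interfaces.

```scala
// Hypothetical sketch of a cloud vendor abstraction for storage paths.
trait CloudStorage {
  def scheme: String
  def uri(bucket: String, key: String): String = s"$scheme://$bucket/$key"
}

object S3Storage  extends CloudStorage { val scheme = "s3a" }
object GcsStorage extends CloudStorage { val scheme = "gs" }

object Cloud {
  // Select the implementation at runtime from configuration.
  def forVendor(vendor: String): CloudStorage = vendor match {
    case "aws" => S3Storage
    case "gcp" => GcsStorage
    case other => sys.error(s"unsupported vendor: $other")
  }
}
```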

Spark Execution: Cloudera, Qubole, StreamSets, StreamAnalytix

Bottom Line

If engineers know Spark or Scala, we want to talk.


If engineers have built database or ETL products (core execution), we want to talk.
If engineers have worked deep in systems - Ceph, etc. - we want to talk.
