SRE Job Description

Site Reliability Engineering
What does the SRE team do?
SRE is a key part of a closely coupled, autonomous engineering team that works together to solve business
problems. There are multiple SRE teams catering to our business in the Ad-tech space. A few of our Ad-tech
products include domain monetization, contextual monetization, Ad quality, Ad exchange, Programmatic
Advertising, etc. One of the teams is responsible for maintaining the brains behind our contextual advertising
systems. Similarly, another team works on building a real-time Ad Quality system, which provides various real-time
bidding metrics and real-time ad-display decisions.

Each SRE team gets to work on a breadth of applications, from low-latency, high-throughput web serving to
large-scale data systems.

The SRE team has carefully crafted an internal platform that allows controlled but speedy innovation while
crunching a quarter billion messages a day and leveraging multiple state-of-the-art algorithms to serve the most
relevant ads on some of the leading news and information websites on the planet.

To support this platform, our infrastructure strives to set an industry standard: maintaining high availability and
reliability while allowing our distributed applications to serve tens of terabytes of information every hour. We use
open-source technologies on commodity hardware, hosted in the public cloud and in co-located datacentres, and scale
them beyond the scope of typical enterprise solutions.
Role & Responsibilities
Your team is focused on improving and promoting the availability, stability and performance of our
infrastructure, systems and applications. An SRE's responsibilities will include:
• Shaping the scope and expertise for SRE practices across the team.
• Building reliability and resiliency into our infrastructure, tools, services and processes in collaboration with
our development team, and establishing practices for supporting and running them so that services stay
highly available to our clients, easily supportable by our developers, and operable for the company.
• Driving the design, implementation, and support of large-scale infrastructure. You and your team will
participate in the design and implementation phases for new and existing products.
• Developing policies and procedures that improve overall platform stability, and participating in a
shared on-call schedule.
Who should apply for this role?
• B.Tech/M.Tech or equivalent in Computer Science, Information Technology, or a related field.
• 1-4 years of experience running services in a large-scale distributed system.
• Deep understanding of the network stack (e.g., TCP/IP, routing, network topologies and hardware, SDN, etc.).
• Deep understanding of modern software architectures, including load balancing, queueing, caching, the general
failure modes of distributed systems, microservices and big data technologies.
• Excellent programming (Python, Go, Ruby, or another preferred scripting language) and automation skills.
• Expertise in one or more of the following tools/skills:
  • Container orchestration technologies like Kubernetes and Mesos
  • Virtualization platforms, either on-prem or cloud-based (we use OpenStack and AWS)
  • Infrastructure as code (we use Puppet, Ansible and Terraform) and containerization tool sets (we use Docker)
  • Data-intensive applications and platforms like Kafka, Hadoop, Spark, ZooKeeper, Cassandra, PostgreSQL OLAP, Druid
  • Relational databases like MySQL, Oracle, PostgreSQL, etc.
  • NoSQL databases like Redis, MongoDB, Cassandra, CouchDB, etc.
  • One or more CI tools like Jenkins, TeamCity
  • Centralized logging, metrics, and tooling frameworks such as ELK, Prometheus, and Grafana
  • Web and application servers like Apache, Nginx, Tomcat
  • Version control tools such as Git
• Ability to work independently and own problem statements end-to-end.
• Great communication, interpersonal and teamwork skills.
• Adaptability to work in a fast-paced environment and alter priorities as per business needs.