Job Details



Information Technology - Data Sciences & Analytics Engineer (Data Engineering Track)

Job Description

We have multiple junior and senior data engineer positions available on the Data Engineering team. The data engineer is a software developer with strong software engineering skills who is responsible for building custom, open-source-based data ingestion and MLOps platforms. The ideal candidate has a deep appreciation of the complexity of the data engineering process: the challenges of ingesting large or near-real-time datasets, the maintenance of high data quality, and the importance of automation for increasing pipeline robustness and reducing the need for human intervention.

Responsibilities:

  • Be an effective distributed-system implementer in the following core activities:
    • Design and develop data engineering services and their ecosystem using distributed databases (relational, columnar, graph, in-memory); orchestration (Apache Airflow); and distributed stream/batch data processing (Kafka, Kinesis, Spark).
    • Design and develop MLOps production pipelines; provide technical support to data scientists/ML engineers by deploying their ML/DL models at scale, within SLAs, on both cloud and on-premises GPU and CPU instances.
    • Design data models for mission-critical, high-volume, near-real-time/batch data; build idempotent/atomic production data pipelines to make data ingestion more fault-tolerant.
    • Design and develop intuitive, highly automated, self-service data platform functions for business users.
  • Explore, evaluate, and champion the introduction of next-generation technologies in the data-ingestion workflow. Participate in project planning and provide technical guidance on cloud architecture for data projects.
  • Any other ad-hoc duties.
  • This is an individual contributor role.

Requirements

  • A BS in Computer Science or a related discipline is required. An advanced degree in Computer Science (PhD, MS) is highly desirable.
  • 3+ years of relevant industry experience in some or most of the following technical areas:
    • Advanced programming skills in Python. Conversant with data structures and algorithm design.
    • Experience in building data pipelines (including data collection, warehousing, processing, analysis, monitoring, and governance) using open-source data ingestion platforms.
    • Intermediate-level knowledge of and experience with AWS cloud components and best practices. Good understanding of deploying data stores such as S3, Redshift, ElastiCache, and PostgreSQL.
    • Prior experience in modern software development is required (such as web frontend UI, backend API microservices, and an understanding of CI/CD and Scrum/Kanban agile development). Strong grasp of object-oriented or functional programming (using Python, Java, Scala, or C#).