Role description
Job Summary
As a Product Engineer - Big Data, you will design, build, and optimize large-scale data processing pipelines using modern Big Data technologies. You will collaborate with data scientists, analysts, and product managers to ensure data accessibility, security, and reliability. Your work will focus on delivering scalable, high-quality data solutions while driving continuous improvements across the data lifecycle.
Key Responsibilities
1. ETL Pipeline Development & Optimization
Design and implement complex, end-to-end ETL pipelines for large-scale data ingestion and processing.
Optimize performance, scalability, and resilience of data pipelines.
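The extract-transform-load shape these pipelines follow can be sketched in miniature with a stdlib-only Python example. This is an illustrative toy, not any specific production stack; the field names and functions are made up:

```python
import csv
import io

# Extract: read raw rows from a CSV source (here, an in-memory string).
def extract(raw_csv):
    return list(csv.DictReader(io.StringIO(raw_csv)))

# Transform: drop malformed rows and normalize types and casing.
def transform(rows):
    cleaned = []
    for row in rows:
        try:
            cleaned.append({"user": row["user"].strip().lower(),
                            "amount": float(row["amount"])})
        except (KeyError, ValueError):
            continue  # skip rows that fail validation
    return cleaned

# Load: aggregate per user into the "destination" (a dict stand-in).
def load(rows):
    totals = {}
    for row in rows:
        totals[row["user"]] = totals.get(row["user"], 0.0) + row["amount"]
    return totals

raw = "user,amount\nAlice,10.5\nbob,2\nAlice,4.5\nbad_row,\n"
print(load(transform(extract(raw))))  # {'alice': 15.0, 'bob': 2.0}
```

In a real pipeline each stage would be a distributed Spark or Glue job rather than a function call, but the concerns are the same: validation at the transform boundary and resilience to malformed input.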
2. Big Data Processing
Develop and optimize real-time and batch data workflows using Apache Spark, Scala/PySpark, and Apache Kafka.
Ensure fault-tolerant, high-performance data processing.
Knowledge of Java and NoSQL is a plus.
3. Cloud Infrastructure Development
Build scalable, cost-efficient cloud-based data infrastructure leveraging AWS services.
Ensure pipelines are resilient to variations in data volume, velocity, and variety.
4. Data Analysis & Insights
Work with business teams and data scientists to deliver high-quality datasets aligned with business needs.
Perform data analysis to uncover trends, anomalies, and actionable insights.
Present findings clearly to technical and non-technical stakeholders.
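As a toy illustration of the anomaly-detection side of this work, a z-score check over a metric series might look like the following. The data and threshold are made up for the example:

```python
import statistics

def find_anomalies(values, z_threshold=2.0):
    """Flag points deviating from the mean by more than z_threshold
    standard deviations. The threshold here is arbitrary; real pipelines
    tune it per metric."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # a constant series has no outliers
    return [v for v in values if abs(v - mean) / stdev > z_threshold]

daily_counts = [100, 98, 103, 101, 99, 500, 102]  # one obvious spike
print(find_anomalies(daily_counts))  # [500]
```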
5. Real-time & Batch Data Integration
Integrate streaming and batch data sources into consistent, unified datasets for downstream consumers.
6. CI/CD & Automation
Use Jenkins (or similar tools) to implement CI/CD pipelines.
Automate testing, deployment, and monitoring of data solutions.
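Automated checks of this kind are often just assertions that the CI tool runs after each pipeline stage, failing the build on any False result. A minimal, tool-agnostic sketch (function names are illustrative):

```python
def row_count_check(source_count, destination_count, tolerance=0.0):
    """Pass only if the destination lost no more rows than tolerance allows."""
    lost = source_count - destination_count
    return lost <= source_count * tolerance

# A Jenkins (or similar) job would evaluate checks like these per run
# and fail the deployment on any False result.
checks = {
    "exact_count": row_count_check(1000, 1000),
    "count_within_tolerance": row_count_check(1000, 990, tolerance=0.02),
}
print(all(checks.values()))  # True
```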
7. Data Security & Compliance
Ensure pipelines comply with relevant data governance and regulatory frameworks (e.g., GDPR, HIPAA).
Implement controls for data integrity, security, and traceability.
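One common building block for integrity and traceability controls is a deterministic record fingerprint that can be recomputed downstream to detect silent corruption. A stdlib sketch, with illustrative field choices:

```python
import hashlib
import json

def record_fingerprint(record):
    """Stable SHA-256 over a record's canonical JSON form.
    Sorted keys and fixed separators make the hash order-independent."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

rec = {"id": 42, "event": "purchase", "amount": 19.99}
fp = record_fingerprint(rec)
# The same record always yields the same fingerprint...
assert fp == record_fingerprint(dict(rec))
# ...and any change to a field changes it.
assert fp != record_fingerprint({**rec, "amount": 20.00})
```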
8. Collaboration & Cross-Functional Work
Partner with engineers, product managers, and data teams in an Agile environment.
Contribute to sprint planning, architectural discussions, and solution design.
9. Troubleshooting & Performance Tuning
Identify and resolve bottlenecks in data pipelines.
Conduct performance tuning and adopt best practices for ingestion, processing, and storage.
Required Experience
4-8 years of hands-on experience in Big Data engineering, cloud data platforms, and large-scale data processing.
Proven experience delivering scalable data solutions in production environments.
Mandatory Skills
AWS Expertise: Hands-on experience with EMR, Managed Apache Airflow, Glue, S3, DMS, MSK, EC2, and cloud-native data architectures.
Big Data Technologies: Proficiency in PySpark/Scala Spark and SQL. Experience with Apache Spark, Kafka, and large-scale data processing.
Data Frameworks: Strong knowledge of Spark DataFrames and Datasets.
Database Modeling & Data Warehousing: Experience designing scalable OLAP/OLTP data models and warehouse solutions.
ETL Pipeline Development: Proven ability to build robust real-time & batch pipelines across various platforms.
Data Analysis & Insights: Strong analytical skills with the ability to extract meaningful insights and support business decisions.
CI/CD & Automation: Practical experience with Jenkins or similar tools for automating deployment and monitoring.
Good-to-Have Skills
Familiarity with data governance frameworks and compliance standards.
Experience with monitoring tools such as AWS CloudWatch, Splunk, or Dynatrace.
Working knowledge of Java or NoSQL databases.
Exposure to cost optimization strategies in cloud environments.
Skills
Apache Spark, Scala, AWS, Big Data