Data Engineer
About the Candidate
- Expertise in data engineering, including building and managing large-scale ETL pipelines using Apache Spark, Hive, and AWS tools.
- Strong experience in managing big data infrastructure, including Cloudera and AWS S3, with a focus on scalability and performance optimization.
- Proficient in building data lakes, setting up raw and transformed data layers, and developing data aggregation strategies.
- Hands-on experience with data orchestration using Apache Airflow and with Amazon Redshift as a data warehouse solution.
- Skilled in implementing and maintaining data quality frameworks, using tools such as Great Expectations and Deequ for data validation.
- Experience in the design and deployment of complex, enterprise-level data pipelines for the telecom and healthcare industries.
- Adept at data migration and system upgrades, such as moving from HDP 2.7 to HDP 3.0.
- Knowledgeable in integrating data from multiple sources and vendors in various formats (CSV, Parquet, XLSX).
- Proficient in using Apache Kafka, Ignite, and Cassandra for data streaming and for triggering marketing campaigns.
- Experience in building and maintaining documentation for data processes and deployment workflows.
- Skilled in handling massive datasets, with expertise in loading and managing operational workloads in Hadoop and Teradata environments.
- Competent in user management and access control using tools such as Kerberos and Apache Ranger.
- Expertise in SQL and query optimization for large-scale data analysis and reporting needs.
- Proficient in leveraging cloud-based tools (AWS Lambda, S3, Redshift) to automate data workflows and reporting.
- Solid understanding of data security and privacy practices, including data encryption and compliance with industry standards.
- Experience in deriving actionable insights from network and performance data, using automation to streamline reporting and analysis.
- Hands-on experience with BI tools like Apache Superset, Looker, and IBM Cognos for internal and customer-facing analytics.
- Proven track record of leading data engineering projects from the ground up and successfully implementing data solutions in real-time environments.