Job Code : 202217Bangalore
CANDIDATE PROFILE & CRITICAL EXPOSURE / EXPERIENCE DEEMED ESSENTIAL
Create and maintain an optimal ETL data architecture.
Assemble large, complex data sets that meet functional / non-functional business requirements.
Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and AWS cloud-based serverless technologies.
JOB - ROLE & RESPONSIBILITIES
Primary responsibility is the development of ETL processes:
- Technologies required: SSIS, MS SQL Server, Python, PL/SQL
- Experience with advanced SSIS components (transformation objects in the data flow) is required.
- 2 years of advanced SSIS experience.
- Python for data cleansing and file processing
- Plan, coordinate, develop and support ETL processes including architecting table structure, building ETL process, documentation, and long-term preparedness.
- Develop cross validation rules to ensure mapping accuracy.
- Provide support of ETL processes created.
- Document steps needed to migrate database objects between various environments including Development, Test and Production.
- Communicate issues, risks, and concerns proactively to management.
- Document ETL process thoroughly to allow peers to assist with support as needed.
- Self-sufficient, with strong communication skills (must be able to work with external groups/departments).
- Data Warehousing experience is preferred.
3+ years of IT experience designing, developing, and maintaining Big Data technologies such as the Hadoop and Spark ecosystems, using Python, Java, or Scala.
Working knowledge of AWS Big Data ecosystem services such as EMR, S3, Redshift, and DynamoDB.
Experience with IT operations and support of data-intensive solutions in a complex global environment.
Work experience with Spark (using Python, Java, or Scala) on a cluster installed on top of Hadoop for computational and advanced analytics, and with using Spark alongside Hive and SQL/Oracle.
Experience with an enterprise Hadoop distribution such as Cloudera or Hortonworks, and good knowledge of the MapR distribution.
Good working experience in importing data from various backend database applications and performing transformations using Hive, Pig, and Spark.
Experience with data warehouse systems and ETL tools such as Kafka, Informatica, SnapLogic, or Talend.
At least one year of programming experience, with proficiency in a language such as Python, Java, or Scala.
SQL Server developer skills, including writing stored procedures, triggers, views, and queries.
Good working knowledge of scripting languages such as shell script or PowerShell.
Big Data processing on any cloud platform (AWS/Azure/GCP).
Any one programming language: Python, Java, or Scala.
EMR (Spark), Redshift, DynamoDB, S3, SQL, Kafka, shell script/PowerShell.