We’re Salesforce, the Customer Company, inspiring the future of business with AI + Data + CRM. Leading with our core values, we help companies across every industry blaze new trails and connect with customers in a whole new way. And we empower you to be a Trailblazer, too: driving your performance and career growth, charting new paths, and improving the state of the world. If you believe in business as the greatest platform for change and in companies doing well and doing good, you’ve come to the right place.

The Data and Analytics Organization (DnA) is Salesforce's cornerstone for fostering growth and margins through unparalleled data insights. From robust governance to strategic execution, we support data pioneers with an unbiased approach. Our Enterprise Data Strategy builds a solid data foundation, fostering a culture of data-driven decisions. We ensure end-to-end quality through a cohesive data supply chain. By deploying and integrating platform tools, we enable seamless data access and automated data management, driving efficiency and growth with actionable insights.

Your Impact:

Own the technical solution design and lead the technical architecture and implementation of data acquisition and integration projects, both batch and real-time
Define the overall solution architecture needed to implement a layered data stack that ensures a high level of data quality and timely insights
Communicate with product owners and analysts to clarify requirements
Craft technical solutions and assemble design artifacts (functional design documents, data flow diagrams, data models, etc.)
Build data pipelines and data processing tools using open source and proprietary technologies
Serve the team as a domain expert & mentor for ETL design, and other related big data and programming technologies
Identify incomplete data, improve quality of data, and integrate data from several data sources
Proactively identify performance & data quality problems and drive the team to remediate them. Advocate architectural and code improvements to the team to improve execution speed and reliability
Design and develop tailored data structures
Reinvent prototypes to create production-ready data flows
Support Data Science research by designing, developing, and maintaining all parts of the Big Data pipeline for reporting, statistical analysis, machine learning, and computational requirements
Perform data profiling, sophisticated sampling, statistical testing, and testing of reliability on data
Clearly articulate pros and cons of various technologies and platforms in open source and proprietary products
Implement proofs of concept on new technologies and tools to help the organization pick the best tools and solutions
Strong SQL optimization and performance tuning experience in a high volume data environment that uses parallel processing
Teams are using the following: SQL, Python, Airflow, AWS, Spark, Tableau, AWS EMR, Snowflake
Participate in the team’s on-call rotation to address sophisticated problems in real-time and keep services operational and highly available

Required Skills:

4-12 years of experience in data engineering
Build programmatic ETL pipelines with SQL based technologies and platforms
Solid understanding of databases, and working with sophisticated datasets
Data governance, verification, and data documentation using current and future adopted tools and platforms
Work with different technologies (Python, shell scripts) and translate logic into well-performing SQL
Perform tasks such as writing scripts, web scraping, getting data from APIs etc.
Automate data pipelines using scheduling tools like Airflow
Experience with CI/CD technologies and tools like Jenkins, Ant or Gradle, GitHub
Be prepared for changes in business direction and understand when to adjust designs
Experience writing production level SQL code and good understanding of Data Engineering pipelines
Experience with Hadoop ecosystem and similar frameworks
Previous projects should display technical leadership with an emphasis on data lake, data warehouse solutions, business intelligence, big data analytics, enterprise-scale custom data products
Knowledge of data modeling techniques and high-volume ETL/ELT design
Experience with version control systems (GitHub, Subversion) and deployment tools (e.g. continuous integration) required
Experience working with Public Cloud platforms like GCP, AWS, or Snowflake
Ability to work effectively in an unstructured and fast-paced environment both independently and in a team setting, with a high degree of self-management with clear communication and commitment to delivery timelines
A related technical degree required

Accommodations

If you require assistance due to a disability when applying for open positions, please submit a request via this Accommodations Request Form.

Posting Statement

At Salesforce we believe that the business of business is to improve the state of our world. Each of us has a responsibility to drive Equality in our communities and workplaces. We are committed to creating a workforce that reflects society through inclusive programs and initiatives such as equal pay, employee resource groups, inclusive benefits, and more. Learn more about Equality at www.equality.com and explore our company benefits at www.salesforcebenefits.com.

Salesforce is an Equal Employment Opportunity and Affirmative Action Employer. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender perception or identity, national origin, age, marital status, protected veteran status, or disability status. Salesforce does not accept unsolicited headhunter and agency resumes. Salesforce will not pay any third-party agency or company that does not have a signed agreement with Salesforce.

Salesforce welcomes all.

This job is no longer accepting applications
