Engineering education enables data scientists to better interface with engineering and ensures higher data quality.
Overview
The article discusses the importance of engineering education for data scientists at Airbnb, emphasizing how it enhances their ability to interface with engineering teams and improves data quality. It outlines the Engineering Empowered Data Science (EEDS) program designed to equip data scientists with essential engineering knowledge.
What You'll Learn
1. How to improve data quality through better logging practices
2. Why understanding the data system is crucial for effective collaboration with engineers
3. How to leverage modern logging infrastructure using SQL queries
Prerequisites & Requirements
- Basic understanding of data science concepts
- Familiarity with data analysis tools and practices (optional)
Key Questions Answered
How does the EEDS program enhance data scientists' skills?
The EEDS program enhances data scientists' skills by providing them with essential engineering knowledge specific to data system design, data quality improvement, and productivity. This training helps them understand the entire data system, enabling better collaboration with engineers and improving the quality of their work.
What challenges does the EEDS program address?
The EEDS program addresses several challenges, including compartmentalized knowledge of the Airbnb data platform, insufficient understanding of upstream logging issues, and limited documentation for internal tools. By tackling these issues, the program aims to improve data processing efficiency and collaboration between data scientists and engineers.
What are the key learning objectives of the EEDS program?
The key learning objectives of the EEDS program include empowering data scientists with a deeper understanding of the data system, equipping them to leverage modern logging infrastructure, and disseminating best practices in automation and machine learning. These objectives aim to enhance their productivity and code quality.
What feedback has been received about the EEDS program?
Feedback from participants indicates that over 90% found the EEDS program to be a highly impactful use of their time, having learned something new and helpful for their day-to-day work. This positive reception highlights the program's effectiveness in enhancing data science skills.
Key Statistics & Figures
Number of courses taught: over 400
These courses have been delivered to thousands of participants by 55 volunteer faculty members.
Percentage of students who found the training impactful: over 90%
This statistic reflects the positive feedback received from participants of the EEDS program.
Number of data scientists who participated in the training: over 50
This number illustrates the reach and engagement of the EEDS program within the Airbnb data science team.
Technologies & Tools
Tools
- Airpy: A Python toolkit for accessing, extracting, manipulating, and plotting data from Airbnb data sources.
- Rbnb: A collection of R functions and R packages essential for practicing data science at Airbnb.
Key Actionable Insights
1. Implementing best practices in logging can significantly improve data quality and reduce downstream issues. By ensuring that data is logged correctly from the start, data scientists can minimize the need for extensive troubleshooting later, leading to more reliable analyses and models.
2. Encouraging data scientists to understand the entire data system fosters better collaboration with engineering teams. When data scientists are well-versed in the engineering aspects of data systems, they can communicate more effectively with engineers, leading to improved project outcomes and innovation.
3. Offering team-specific training can address unique challenges faced by different business units. Tailoring educational programs to the specific needs of various teams ensures that the training is relevant and directly applicable to their work, enhancing overall productivity.
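To make the first insight concrete, here is a minimal sketch of what "logging correctly from the start" can look like: checking an event against an expected schema before it is emitted. This is an illustration only, not Airbnb's logging infrastructure. The schema, the field names, and the `validate_event` helper are all hypothetical.

```python
# Hypothetical schema for a search event: field name -> expected Python type.
# These names are assumptions for illustration, not Airbnb's actual schema.
SEARCH_EVENT_SCHEMA = {
    "event_name": str,
    "user_id": int,
    "timestamp_ms": int,
    "query": str,
}


def validate_event(event: dict, schema: dict) -> list[str]:
    """Return a list of problems found in the event; an empty list means valid."""
    problems = []
    for field, expected_type in schema.items():
        if field not in event:
            problems.append(f"missing field: {field}")
        # Exact type match, so a bool is not silently accepted as an int.
        elif type(event[field]) is not expected_type:
            actual = type(event[field]).__name__
            problems.append(
                f"wrong type for {field}: expected {expected_type.__name__}, got {actual}"
            )
    # Flag fields the schema does not know about (sorted for stable output).
    for field in sorted(event.keys() - schema.keys()):
        problems.append(f"unexpected field: {field}")
    return problems


if __name__ == "__main__":
    event = {"event_name": "search", "user_id": "42", "timestamp_ms": 1700000000000}
    for problem in validate_event(event, SEARCH_EVENT_SCHEMA):
        print(problem)
```

Rejecting or flagging a malformed event at the point of logging is usually cheaper than discovering the problem later in a downstream table or model.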
Common Pitfalls
1. Data scientists often lack sufficient knowledge of the engineering systems, leading to issues in logging and data processing. This gap in knowledge can result in inefficient data handling and analysis, making it crucial for data scientists to receive training that bridges this divide.
2. Limited documentation and tutorials for internal tools can hinder effective use. Without comprehensive resources, data scientists may struggle to utilize tools effectively, leading to wasted time and suboptimal results.
Related Concepts
Data Quality Improvement Strategies
Collaboration Between Data Science And Engineering Teams
Best Practices In Data Logging And Experimentation