Finding the right needle in the kernel haystack

Noa Resare
6 min read · intermediate

Overview

The article discusses Spotify's efforts to automate server provisioning and the challenges faced with RAID hardware configuration, particularly with the sas2ircu tool on the Dell PowerEdge C5220. It details the investigation into kernel driver differences between Debian and Red Hat Enterprise Linux to resolve a segmentation fault issue.

What You'll Learn

1. How to troubleshoot RAID driver issues in Linux environments
2. Why kernel driver differences can affect hardware compatibility
3. How to compile and test modified kernel drivers safely

Prerequisites & Requirements

  • Understanding of Linux kernel and RAID configurations
  • Familiarity with kernel compilation tools and processes (optional)

Key Questions Answered

What caused the segmentation fault with the sas2ircu tool?
The segmentation fault was caused by differences in the mpt2sas driver between the Debian kernel and the Red Hat Enterprise Linux kernel. The Debian version was based on an older upstream release, while the RHEL version included significant updates and backported changes that resolved the issue.
How did Spotify resolve the RAID configuration issue?
Spotify resolved the RAID configuration issue by identifying a specific change in the mpt2sas driver from the RHEL kernel that fixed the segmentation fault. They compiled a modified driver based on this change and successfully tested it in their environment.
What are the risks of modifying RAID controller drivers?
Modifying RAID controller drivers can lead to data corruption if not done carefully. The article warns that changes to the driver can silently corrupt data, emphasizing the need for caution when experimenting with RAID configurations.

Key Statistics & Figures

Lines of code in the mpt2sas driver
20,000 lines
The driver is complex and underwent significant changes between Debian and RHEL versions.
Difference in lines of code between RHEL and Debian mpt2sas drivers
over 10,000 lines
This highlights the extent of modifications and backports in the RHEL version compared to Debian.
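
To give a feel for how a figure like the line difference above could be measured, here is a minimal Python sketch that counts added and removed lines between two versions of a source file. It is an illustration only, not the method used in the article: the function name `count_changed_lines` is hypothetical, and a real comparison would cover every file in both driver directories (for example with `diff -ru`).

```python
import difflib


def count_changed_lines(old_text: str, new_text: str) -> int:
    """Return the number of removed plus added lines between two texts.

    Hypothetical helper for illustration; not taken from the article.
    """
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    # autojunk=False keeps the matcher from ignoring frequently repeated lines,
    # which are common in C source (blank lines, lone braces).
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    changed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changed += (i2 - i1)  # lines removed from the old version
            changed += (j2 - j1)  # lines added in the new version
    return changed


if __name__ == "__main__":
    old = "a\nb\nc\n"
    new = "a\nx\nc\nd\n"
    print(count_changed_lines(old, new))
```

A replaced line counts twice here (one removal, one addition), which is also how a unified diff would report it.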

Technologies & Tools

Operating System
Debian GNU/Linux
Used as the primary OS for Spotify's server provisioning.
Operating System
Red Hat Enterprise Linux 6
Served as a comparison point for resolving the RAID driver issue.
Driver
mpt2sas
The specific driver being modified to resolve RAID configuration issues.

Key Actionable Insights

1. When facing hardware compatibility issues, consider investigating kernel driver differences between distributions.
   This approach can uncover critical changes that may resolve issues, as seen with the mpt2sas driver differences between Debian and RHEL.
2. Always back up data before modifying RAID controller drivers to prevent potential data loss.
   Given the risks associated with driver modifications, having a reliable backup ensures that data can be restored in case of corruption.
3. Utilize version control for kernel modifications to track changes and facilitate troubleshooting.
   This practice allows for easier identification of which changes may have introduced issues, making it simpler to revert to stable versions.
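
Once changes are tracked as an ordered series, finding the one that matters becomes a binary search, which is the same idea `git bisect` applies to commit history. The Python sketch below shows that search under stated assumptions: the patches apply cumulatively, and the behavior flips exactly once (broken before some patch, fixed from it onward). The names `first_fixing_patch` and `is_fixed` are hypothetical; in practice `is_fixed(n)` would stand for building the driver with the first `n` patches and running the failing tool. This is not presented as the procedure the article's author followed.

```python
from typing import Callable, Optional


def first_fixing_patch(
    num_patches: int, is_fixed: Callable[[int], bool]
) -> Optional[int]:
    """Return the smallest n in 1..num_patches for which is_fixed(n) is True.

    is_fixed(n) is a hypothetical callback: it reports whether a build with
    the first n patches applied no longer shows the problem. The search
    assumes the result is monotonic (False up to some n, True afterwards).
    Returns None if even the full patch series does not fix the problem.
    """
    if num_patches < 1 or not is_fixed(num_patches):
        return None
    lo, hi = 1, num_patches  # invariant: is_fixed(hi) is True
    while lo < hi:
        mid = (lo + hi) // 2
        if is_fixed(mid):
            hi = mid      # the fix is at mid or earlier
        else:
            lo = mid + 1  # the fix comes after mid
    return lo


if __name__ == "__main__":
    # Simulated series of 100 patches where patch 37 carries the fix.
    print(first_fixing_patch(100, lambda n: n >= 37))
```

With this approach, a series of 100 patches needs about seven build-and-test cycles after the initial check, rather than up to 100.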

Common Pitfalls

1. Modifying RAID controller drivers without proper testing can lead to data corruption.
   This occurs because changes may introduce unforeseen bugs or incompatibilities that affect data integrity.
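
One basic check for silent corruption is to write known data, read it back, and compare checksums. The sketch below is a minimal Python illustration of that idea, assuming a scratch file on the volume under test; the function name `write_and_verify` and its parameters are hypothetical. A read that is served from the page cache may never reach the disk, so a check like this can only catch some failures and does not replace backups or testing on a non-production machine.

```python
import hashlib
import os


def write_and_verify(path: str, size_bytes: int = 1 << 20, chunk: int = 1 << 16) -> bool:
    """Write random data to path, read it back, and compare SHA-256 digests.

    Hypothetical helper for illustration. Returns True when the data read
    back matches what was written.
    """
    expected = hashlib.sha256()
    with open(path, "wb") as f:
        remaining = size_bytes
        while remaining > 0:
            block = os.urandom(min(chunk, remaining))
            expected.update(block)
            f.write(block)
            remaining -= len(block)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to the device

    actual = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in fixed-size chunks until read() returns an empty bytes object.
        for block in iter(lambda: f.read(chunk), b""):
            actual.update(block)
    return actual.hexdigest() == expected.hexdigest()
```

Pointing `path` at a file on the array being tested, and repeating the check with large sizes, exercises more of the controller path than a single small write would.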

Related Concepts

Kernel Driver Development
RAID Configuration Best Practices
Linux Operating System Internals