Building LinkedIn University Pages

Josh Clemm
8 min read · intermediate

Overview

The article discusses the engineering behind LinkedIn's University Pages, detailing the architecture, data processing, and technologies used to create a platform that enhances interactions between alumni and students. It highlights how LinkedIn leveraged its unique data and infrastructure to standardize educational information and provide a compelling user experience.

What You'll Learn

  1. How to build a data standardization system using Hadoop and Pig scripts
  2. Why modeling data as a graph improves querying efficiency
  3. How to implement RESTful APIs using the Rest.li framework
  4. When to use server-side rendering for SEO optimization

Prerequisites & Requirements

  • Understanding of data modeling and RESTful API concepts
  • Familiarity with Hadoop and related data processing tools (optional)

Key Questions Answered

What technologies were used to build LinkedIn University Pages?
LinkedIn University Pages were built using a combination of technologies including Hadoop for data processing, Espresso DB for storage, Bobo and Zoie for search services, and Rest.li for RESTful APIs. The frontend utilizes dust.js for rendering templates, ensuring a seamless user experience.
How does LinkedIn ensure that University Pages are indexed by search engines?
To ensure University Pages are indexed, LinkedIn uses Fizzy to render dust.js templates server-side, allowing pages to be delivered as HTML without relying on JavaScript. This approach enhances SEO by making content accessible to search engine crawlers.
What is the purpose of the conversation wall in University Pages?
The conversation wall in University Pages allows users to engage with content through social gestures like sharing, commenting, and liking. It is built on a reusable activity feed infrastructure developed in collaboration with LinkedIn's USCP team.
What are the key features of LinkedIn University Pages?
Key features of LinkedIn University Pages include Career Outcomes, Targeted Status Updates, Notable Alumni, and a media Gallery. These features leverage LinkedIn's extensive data to provide valuable insights and enhance user engagement.

Key Statistics & Figures

Number of schools standardized
23,000
LinkedIn's Higher Education team created a standardized list of over 23,000 institutions worldwide.
Number of LinkedIn members
238 million
The data utilized for building University Pages was derived from the career data and network graph of over 238 million LinkedIn members.
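
To make the idea of a standardized school list concrete, here is a minimal single-machine sketch in Python. It is not LinkedIn's pipeline (the article describes Hadoop and Pig for that work); the canonical list, the aliases, and the helper names `normalize`, `build_lookup`, and `standardize` are all hypothetical and exist only for illustration.

```python
import re

# Hypothetical canonical list: id -> (official name, known aliases).
# A real list would hold tens of thousands of curated entries.
CANONICAL_SCHOOLS = {
    1: ("Stanford University", ["stanford", "stanford univ"]),
    2: ("Massachusetts Institute of Technology", ["mit", "m.i.t."]),
}


def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    name = name.lower()
    name = re.sub(r"[^a-z0-9\s]", "", name)
    return re.sub(r"\s+", " ", name).strip()


def build_lookup(schools: dict) -> dict:
    """Map every normalized official name and alias to its canonical id."""
    lookup = {}
    for school_id, (official, aliases) in schools.items():
        for variant in [official, *aliases]:
            lookup[normalize(variant)] = school_id
    return lookup


LOOKUP = build_lookup(CANONICAL_SCHOOLS)


def standardize(raw: str):
    """Return the canonical id for a free-text school name, or None."""
    return LOOKUP.get(normalize(raw))
```

Exact matching after normalization is the simplest possible strategy; free-text profile data would likely also need fuzzy matching and human review, which this sketch leaves out.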

Technologies & Tools

Data Processing
Hadoop
Used for data standardization and processing to create insights like Similar Schools and Notable Alumni.
Database
Espresso DB
Used to model and store school data as a graph.
Search
Bobo
Built search indices and services on top of the educational data.
Search
Zoie
Used alongside Bobo for search functionalities.
API Framework
Rest.li
Exposed data through RESTful APIs for various LinkedIn services.
Frontend
Dust.js
Used for client-side rendering of templates.
Server-side Rendering
Fizzy
Renders dust.js templates server-side for SEO optimization.

Key Actionable Insights

  1. Implementing a graph-based data model can significantly enhance query performance, especially for complex relationships. By modeling educational institutions and their connections, you can simplify data retrieval and improve user experience.
     This approach is particularly beneficial in applications where relationships between entities are critical, such as social networks or educational platforms.
  2. Utilizing server-side rendering for critical pages can improve SEO and ensure that your content is indexed by search engines. This method allows you to serve fully rendered HTML to crawlers, enhancing visibility.
     When launching new features or products, consider implementing server-side rendering to capture search traffic effectively.
  3. Leveraging existing frameworks like Rest.li can save development time and provide scalability for your APIs. This framework is designed to handle high query loads efficiently.
     When building APIs for high-traffic applications, using established frameworks can reduce the complexity of scaling and maintaining your services.
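
The first insight above, modeling entities and their relationships as a graph, can be sketched with a small in-memory adjacency structure. This is an illustration only: the `EntityGraph` class, the edge types `alumnus` and `works_at`, and the sample identifiers are assumptions, and the sketch says nothing about how Espresso DB actually stores the data.

```python
from collections import defaultdict


class EntityGraph:
    """Tiny in-memory graph: source node -> edge type -> set of targets."""

    def __init__(self):
        self._edges = defaultdict(lambda: defaultdict(set))

    def add_edge(self, source: str, edge_type: str, target: str) -> None:
        self._edges[source][edge_type].add(target)

    def neighbors(self, source: str, edge_type: str) -> set:
        # .get() avoids creating empty entries for unknown nodes.
        return set(self._edges.get(source, {}).get(edge_type, set()))


def employers_of_alumni(graph: EntityGraph, school: str) -> set:
    """Two-hop traversal: school -> alumni -> employers."""
    employers = set()
    for member in graph.neighbors(school, "alumnus"):
        employers |= graph.neighbors(member, "works_at")
    return employers


# Hypothetical sample data.
graph = EntityGraph()
graph.add_edge("school:1", "alumnus", "member:alice")
graph.add_edge("school:1", "alumnus", "member:bob")
graph.add_edge("member:alice", "works_at", "company:acme")
graph.add_edge("member:bob", "works_at", "company:globex")
```

A question such as "where do this school's alumni work?" becomes a two-hop walk over typed edges rather than a multi-table join, which is the kind of simplification the insight refers to.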

Common Pitfalls

  1. Failing to consider SEO during the development of web applications can lead to poor visibility in search engines.
     This often happens when developers rely solely on client-side rendering, which can prevent search engine crawlers from accessing content. Implementing server-side rendering or pre-rendering strategies can mitigate this issue.
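
As a rough illustration of the mitigation described above, the sketch below renders a page to complete HTML on the server, so a crawler receives the content without executing any JavaScript. It uses Python's standard `string.Template` as a stand-in; it is not Fizzy or dust.js, and the template, the `render_school_page` helper, and the field names are hypothetical.

```python
from html import escape
from string import Template

# Hypothetical page template; a real system would share templates
# between the client-side and server-side renderers.
PAGE_TEMPLATE = Template(
    "<html><head><title>$name</title></head>"
    "<body><h1>$name</h1><p>$summary</p></body></html>"
)


def render_school_page(school: dict) -> str:
    """Return fully rendered HTML for a school, escaping user-supplied text."""
    return PAGE_TEMPLATE.substitute(
        name=escape(school["name"]),
        summary=escape(school["summary"]),
    )


# Hypothetical sample data.
page_html = render_school_page(
    {"name": "Example University", "summary": "Founded <1900>"}
)
```

The response body already contains the heading and summary text, so indexing does not depend on the crawler running scripts.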

Related Concepts

Data Standardization Techniques
Graph Data Modeling
RESTful API Design
SEO Best Practices