Definition: Creating a plan for how different parts of software will work together, like the blueprint of a building.

![[System Design Scorecard.jpeg]]

System design topics
![[System Design Topics.jpg]]
### Key documents
[[System-design-interview by Alex Xu.pdf]]
[[ByteByteGo-System-Design-2023.pdf]]
[[ByteByteGo_System_Design_Archive_2023.pdf]]

[Important white papers - Gaurav Sen](https://interviewready.io/blog/white-papers-worth-reading-for-software-engineers)
Documents from **[Educative.io](http://Educative.io) - System Design course**
Check iCloud Vault>Career>System Design Course folder
1. [[SD-CAP Theorem.pdf]]
2. [[SD-SQL.pdf]]
3. [[SD-Redundancy.pdf]]
4. [[SD-Indexes.pdf]]
5. [[SD-Data Partitioning and Sharding.pdf]]
6. [[SD-Caching.pdf]]
7. [[SD-Load Balancing.pdf]]
8. [[SD-Characteristics.pdf]]
9. [[SD-Ticketmaster.pdf]]
10. [[SD-Yelp.pdf]]
11. [[SD-Facebook Newsfeed.pdf]]
12. [[SD-Web Crawler.pdf]]
13. [[SD-Uber.pdf]]
14. [[SD-Uber Backend.pdf]]
15. [[SD-Twitter Search.pdf]]
16. [[SD-API Rate Limiter.pdf]]
17. [[SD-Typehead.pdf]]
18. [[SD-YouTube Netflix.pdf]]
19. [[SD-Twitter.pdf]]
20. [[SD-Messenger.pdf]]
21. [[SD-Dropbox.pdf]]
22. [[SD-Instagram.pdf]]
23. [[SD ML - Feature Selection and Feature Engineering - Machine Learning System Design.pdf]]
24. [[SD-ML Training Pipeline - Machine Learning System Design.pdf]]

Another paid course option: [What do we offer? | How do I use this course? | System Design Simplified | InterviewReady](https://interviewready.io/learn/system-design-course/how-do-i-use-this-course/what-do-we-offer)


### Approach in interviews: 
[System Design Interview in 2023 (educative.io)](https://www.educative.io/blog/complete-guide-system-design-interview)

- Understand problem scope, product requirements, constraints and assumptions 
- Clarify the scale the system must support by identifying user attributes:
	- How many users are active?
	- What actions do users take?
	- What input/output data is exchanged?
	- How much data is expected to be handled? (may need a rough calculation)
- Technical and design tradeoffs to be discussed
	- Network, storage aspects
	- Understand the expected read to write ratio 
	- Talk about basic data structure
- Scalability and performance
	- Distributed architecture
	- Performance tradeoffs
- Reliability and fault tolerance
	- Identify single points of failure in both external and internal systems
- Explain, as an example, which API endpoints would need to be designed
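
The "rough calculation" step above can be sketched as a small back-of-the-envelope estimator. This is only an illustration: the `Workload` fields, the `estimate` function, the peak factor of 2, and the sample numbers are all assumptions made up for this sketch, not figures from the linked guide.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 24 * 60 * 60


@dataclass
class Workload:
    # All fields are hypothetical inputs you would clarify with the interviewer.
    daily_active_users: int
    reads_per_user_per_day: int
    writes_per_user_per_day: int
    bytes_per_write: int


def estimate(w: Workload, peak_factor: float = 2.0) -> dict:
    """Return rough traffic and storage numbers for a workload."""
    reads_per_day = w.daily_active_users * w.reads_per_user_per_day
    writes_per_day = w.daily_active_users * w.writes_per_user_per_day
    avg_read_qps = reads_per_day / SECONDS_PER_DAY
    avg_write_qps = writes_per_day / SECONDS_PER_DAY
    return {
        "avg_read_qps": avg_read_qps,
        "avg_write_qps": avg_write_qps,
        # Peak traffic is assumed to be a fixed multiple of the average.
        "peak_read_qps": avg_read_qps * peak_factor,
        "read_write_ratio": (
            reads_per_day / writes_per_day if writes_per_day else float("inf")
        ),
        # Storage growth counts only newly written data, in decimal gigabytes.
        "storage_gb_per_year": writes_per_day * w.bytes_per_write * 365 / 1e9,
    }


if __name__ == "__main__":
    sample = Workload(
        daily_active_users=10_000_000,
        reads_per_user_per_day=20,
        writes_per_user_per_day=2,
        bytes_per_write=1_000,
    )
    for name, value in estimate(sample).items():
        print(f"{name}: {value:,.1f}")
```

A read-heavy ratio like the one in this sample is the kind of result that would motivate caching and read replicas later in the discussion.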


### Key concepts
[ByteByteGo | Technical Interview Prep](https://bytebytego.com/)
[System Design in 2023 (educative.io)](https://www.educative.io/blog/complete-guide-to-system-design)
##### Principles of System Design
1. Modularization: Dividing the system into smaller, manageable modules helps reduce complexity, improve maintainability, and increase reusability.
2. Abstraction: Hiding the implementation details and showing only the essential features helps simplify complex systems and promote modularity.
3. Layering: Organizing the system into layers, each providing a specific set of functionalities, promotes separation of concerns and enhances maintainability.
4. Scalability: Design systems to handle increased load by adding more servers (horizontal scaling) or by adding more capacity to existing servers (vertical scaling).
5. Performance: Optimizing the system’s response time, throughput, and resource utilization is crucial for a successful design.
6. Security: Ensure the system’s confidentiality, integrity, and availability by implementing proper security measures and practices.
7. Fault Tolerance and Resilience: Design systems to withstand failures and recover gracefully from errors, ensuring reliability and availability.

##### Industry examples
[[High level architecture.png]]
[[Slack - notification algorithm.jpeg]]
25 real-world architectures to crush system design:
###### Big tech backend systems
0. How Uber Drives 40M Reads/Sec with Integrated Cache: https://lnkd.in/dfR3wxQm
1. Why Netflix Integrated a Service Mesh in Their Backend: https://lnkd.in/d3tMQ6Vu
2. Scaling Stripe APIs with Rate Limiting: https://lnkd.in/dmeSZkK6
3. How Uber Computes ETA at Scale: https://lnkd.in/dH4nHqG4
4. How Zoom Supports 300 Million Video Calls/Day: https://lnkd.in/d7i8jmiU
5. How Meta Built Threads: https://lnkd.in/dUcqHx9A
6. How Pinterest Scaled to 11 Million Users with Only 6 Engineers: https://lnkd.in/dsXbW8YV
7. How Quora Scaled MySQL to 100k+ Queries per Second: https://lnkd.in/dhMEeXdS
8. How Canva Supports 135 Million Monthly Simultaneous Users: https://lnkd.in/duSPasJX
9. How LinkedIn Scales to 5 Million Profiles Reads / s: https://lnkd.in/dfiFkt_z
10. How Uber Finds Nearby Drivers at 1M Requests per Second: https://lnkd.in/dvnsUbbk
11. How Instagram Scaled to 14M Users with Only 3 Engineers: https://lnkd.in/dbfHWn9Z
12. Prime Video: Amazon's Secret to Streaming Video at Scale: https://lnkd.in/dm49aVhP

###### Frontend engineering
13. Re-Architecturing Airbnb's Frontend: https://lnkd.in/dh9JxbJE
14. Making Instagram Faster (3 parts series): https://lnkd.in/dB7HV6aR
15. The Best App to Slice Through Front-end Interviews: https://lnkd.in/dV7PHkVc
16. How to Design Facebook's News Feed: https://lnkd.in/dGRG3Mp2
17. How YouTube Improved Video Performance: https://lnkd.in/d2ezaACw
18. Shopping for Speed on eBay: https://lnkd.in/d46w6xfa
19. How to Design an Autocomplete System: https://lnkd.in/d2tS83he
20. How Twitter Used Redux: https://lnkd.in/d7Uj33Yq

###### Machine learning/AI
21. How LinkedIn Detects Spam Content: https://lnkd.in/dqHqjmE6
22. How Spotify Generates Ad Content at Scale: https://lnkd.in/dCk5Wn27
23. How OpenAI Trained ChatGPT: https://lnkd.in/d4kFupaP
24. How Airbnb Discovers What Users Like: https://lnkd.in/d8-upnAv
25. How Microsoft diagnoses prod issues with LLMs: https://lnkd.in/d5wk6yXa

##### Communication protocols
- TCP - Transmission Control Protocol
- HTTP - HyperText Transfer Protocol
- gRPC - gRPC Remote Procedure Call
	- [Introduction to gRPC | gRPC](https://grpc.io/docs/what-is-grpc/introduction/)
	- [FAQ | gRPC](https://grpc.io/docs/what-is-grpc/faq/)
	- The main usage scenarios:
		- Low-latency, highly scalable, distributed systems.
		- Developing mobile clients that communicate with a cloud server.
		- Designing a new protocol that needs to be accurate, efficient, and language independent.
		- Layered design to enable extensions, e.g. authentication, load balancing, logging, and monitoring.
- WebSockets
##### Server
###### Horizontal vs vertical scaling
Both are scaling techniques for handling workload that grows with business requirements.
- Horizontal: Add more servers of the same type to handle increased workload
- Vertical: Add more capability to the same server to handle increased workload

Real-world applications usually combine both:
- The goal is a resilient, scalable solution with consistent data and fast inter-process communication.

Distributed systems:
- A system whose components are located on different networked computers and interact with each other to achieve a common goal

| |Horizontal|Vertical|
|---|---|---|
|Load balancing|Required|Not applicable|
|Fault tolerance|Resilient|Has a single point of failure|
|Communication|Network calls, since multiple servers are involved|Inter-process communication (faster than network calls)|
|Data management|Complex: data can become inconsistent across servers|Consistent, since data is managed within a single server/system|
|Scalability|Can scale as users are added. Extensible.|Hardware has a limit up to which you can upgrade|
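
As the table notes, horizontal scaling needs a load balancer in front of the servers. A minimal sketch of one common strategy, round-robin with health tracking, is shown here. The `RoundRobinBalancer` class and the server names are hypothetical, and a production balancer would also need health checks, timeouts, and connection handling.

```python
import itertools


class RoundRobinBalancer:
    """Hand out servers in rotation, skipping any marked as down."""

    def __init__(self, servers):
        self._servers = list(servers)
        self._healthy = set(self._servers)
        self._cycle = itertools.cycle(self._servers)

    def mark_down(self, server):
        # Stop routing to a failed server; the others absorb its share.
        self._healthy.discard(server)

    def mark_up(self, server):
        # Only servers that were registered at construction can come back.
        if server in self._servers:
            self._healthy.add(server)

    def next_server(self):
        if not self._healthy:
            raise RuntimeError("no healthy servers available")
        # The cycle is infinite, but at least one server is healthy,
        # so this loop returns within one full rotation.
        for server in self._cycle:
            if server in self._healthy:
                return server


if __name__ == "__main__":
    lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
    print([lb.next_server() for _ in range(4)])
    lb.mark_down("app-2")
    print([lb.next_server() for _ in range(3)])
```

Because requests keep flowing to the remaining servers after `mark_down`, this also illustrates why the horizontal column is labelled resilient while a single vertically scaled server is a single point of failure.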

##### Database
Content delivery network: [[CDN]]
**SQL vs NoSQL databases**

##### Cache
- When to use cache: Consider using cache when data is read frequently but modified infrequently
- Cache policy
	- Expiration: Cached data is removed from the cache once it expires. Choose the expiration time per use case: too short causes frequent reloads from the data store, too long risks serving stale data.
	- What to cache: Decide depending on usage of cached data
	- Eviction policy: Which data is removed when the cache is full? It depends on the use case.
		- Least Recently Used (LRU) is the most popular cache eviction policy.
		- Least Frequently Used (LFU)
		- First In, First Out (FIFO)
- Consistency: Cached data needs to stay consistent with the main database
	- Inconsistency can happen because data-modifying operations on the data store and cache are not in a single transaction. When scaling across multiple regions, maintaining consistency between the data store and cache is challenging.
- Availability: 
	- Implement multiple cache servers to avoid single point of failure (SPOF)
	- 
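
The expiration and LRU eviction policies above can be sketched together in a small in-memory cache. This is a minimal single-process illustration, assuming a hypothetical `LRUCache` class with a fixed capacity (at least 1) and one TTL for all entries. The injectable `clock` parameter is an assumption added to make the behaviour easy to test. It does not address cross-region consistency or multiple cache servers.

```python
import time
from collections import OrderedDict


class LRUCache:
    """In-memory cache with LRU eviction and per-entry expiration."""

    def __init__(self, capacity, ttl_seconds, clock=time.monotonic):
        self.capacity = capacity
        self.ttl = ttl_seconds
        self._clock = clock
        # Maps key -> (value, expires_at); ordered from least to most recently used.
        self._items = OrderedDict()

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None  # cache miss
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Expired entries are removed and treated as a miss.
            del self._items[key]
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value):
        self._items[key] = (value, self._clock() + self.ttl)
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            # Evict the least recently used entry (front of the ordering).
            self._items.popitem(last=False)


if __name__ == "__main__":
    cache = LRUCache(capacity=2, ttl_seconds=60)
    cache.put("user:1", {"name": "Ada"})
    cache.put("user:2", {"name": "Lin"})
    cache.get("user:1")                    # touch user:1 so user:2 becomes the LRU entry
    cache.put("user:3", {"name": "Sam"})   # cache is full, so user:2 is evicted
    print(cache.get("user:2"))
    print(cache.get("user:1"))
```

On a miss the caller would load the value from the main database and `put` it back, which is where the consistency concern above comes in: the database write and the cache update are separate steps.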

##### Load balancer
- 

##### Site speed
The speed at which a browser can fully load the functional elements of a webpage.
Faster site speed leads to higher revenue, better user experience, higher search engine ranking, higher conversion, and lower drop-off rates.
1. Assess with Google’s PageSpeed Insights tool
2. Compress images
3. Use CDN - content delivery network
4. Set up edge services - geo partition services
5. Use a cache for frequently used resources (cacheable redirects let resources be pulled in faster through parallel calls instead of one giant HTML document)
6. Sequence the most important things to load first based on what matters to users: images and product videos come first, chatbots come later.
7. Use a ‘load when clicked’ type of implementation, so that widgets and 3rd-party add-ons load only when interacted with
8. Avoid pop-ups; use them only when necessary
9. Lite-embed videos so that only the YouTube thumbnail is loaded first. The entire video is streamed when the user clicks on it.

### Examples
[[Tiktok system design]]
[[Dictation service on iOS]]
[Gaurav Sen - Search Systems](https://www.linkedin.com/posts/gkcs_systemdesign-searchengines-activity-7213130792846163969-rkh1)