🔶 Amazon EMR (Elastic MapReduce)
📅 Apr 15, 2026
AWS BIG DATA: MANAGED PROCESSING
Amazon EMR is AWS's managed big data processing service, an easy way to run tools such as Hadoop and Spark on AWS.
🔤 STEP 1: EMR Full Form and Meaning
🧠 STEP 2: What Is Big Data? (You cannot understand EMR without understanding this)
🔧 STEP 3: What Are Hadoop and Spark?
Processing big data requires tools, and Hadoop and Spark are those tools.
🏗 STEP 4: The 3 Node Types of an EMR Cluster (this comes up on the exam 100% of the time!)
A cluster is a group of many servers working together on one job. An EMR cluster has 3 types of nodes: master, core, and task.
🏟 Nepal Example: it works like a cricket team, with the Captain (Master) + the players (Core) + the Substitutes (Task).
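The three node roles can be sketched as a cluster request. This is a minimal sketch, not a definitive setup: the dictionary follows the shape of the keyword arguments accepted by boto3's EMR `run_job_flow` call, but the cluster name, instance type, node counts, release label, and the choice of Spot pricing for task nodes are all illustrative assumptions. The code only builds the request and never contacts AWS.

```python
from typing import Any


def build_cluster_request(core_count: int = 2, task_count: int = 2) -> dict[str, Any]:
    """Build a request shaped like boto3's EMR run_job_flow arguments (illustrative values)."""

    def group(name: str, role: str, count: int, market: str = "ON_DEMAND") -> dict[str, Any]:
        # One instance group per node role; the instance type is an assumption.
        return {
            "Name": name,
            "InstanceRole": role,
            "InstanceType": "m5.xlarge",
            "InstanceCount": count,
            "Market": market,
        }

    return {
        "Name": "call-log-analysis",      # hypothetical cluster name
        "ReleaseLabel": "emr-6.15.0",     # assumed EMR release
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                # Master: coordinates the cluster (the captain).
                group("Master (captain)", "MASTER", 1),
                # Core: stores HDFS data and runs tasks (the players).
                group("Core (players)", "CORE", core_count),
                # Task: runs tasks only, holds no HDFS data (the substitutes),
                # so Spot capacity is assumed to be acceptable here.
                group("Task (substitutes)", "TASK", task_count, market="SPOT"),
            ],
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


request = build_cluster_request()
for g in request["Instances"]["InstanceGroups"]:
    print(g["InstanceRole"], g["InstanceCount"], g["Market"])
```

With AWS credentials configured, a request like this could be passed to `boto3.client("emr").run_job_flow(**request)`. Because task nodes hold no HDFS data, losing one costs compute time rather than stored data, which is why they are the usual candidates for cheaper interruptible capacity.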
🗺 STEP 5: Architecture Flow (Nepal Telecom Example)
Scenario: Nepal Telecom wants to analyze one month of call log data (10 TB) to find out which area has the most calls.
✔ 10 TB data → S3 → EMR (Spark) → result showing that Kathmandu had the most calls → Dashboard!
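The processing step in this flow is a count of calls per area. The sketch below is a local, pure-Python stand-in for that aggregation, written in map, shuffle, and reduce phases to mirror how Hadoop-style frameworks split the work. It is not Spark code, and the sample records and phone numbers are hypothetical; a real job on EMR would read the 10 TB from S3 and spread these phases across the cluster.

```python
from collections import defaultdict

# Hypothetical sample records: (caller_id, area).
call_logs = [
    ("9841000001", "Kathmandu"),
    ("9841000002", "Pokhara"),
    ("9841000003", "Kathmandu"),
    ("9841000004", "Biratnagar"),
    ("9841000005", "Kathmandu"),
    ("9841000006", "Pokhara"),
]


def map_phase(records):
    """Emit an (area, 1) pair for every call record."""
    return [(area, 1) for _caller, area in records]


def shuffle_phase(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values for each area to get the call count per area."""
    return {area: sum(values) for area, values in grouped.items()}


calls_per_area = reduce_phase(shuffle_phase(map_phase(call_logs)))
busiest_area = max(calls_per_area, key=calls_per_area.get)
print(busiest_area, calls_per_area[busiest_area])  # prints: Kathmandu 3
```

On a cluster, the map and reduce phases run in parallel on many nodes, which is what makes the same logic workable at terabyte scale.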
💾 STEP 6: EMR + Storage (S3 vs HDFS)
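The key difference between the two stores is what happens when the cluster goes away. The toy model below is only a sketch of that behaviour, assuming plain dictionaries as stand-ins for S3 and HDFS; the class name, paths, and data are hypothetical, and nothing here talks to AWS.

```python
class ToyCluster:
    """Toy model of an EMR cluster: HDFS lives inside it, S3 lives outside it."""

    def __init__(self, s3_bucket: dict):
        self.s3 = s3_bucket   # external store that outlives the cluster
        self.hdfs = {}        # cluster-local store on the core nodes
        self.running = True

    def write_hdfs(self, path: str, data: str) -> None:
        self.hdfs[path] = data

    def write_s3(self, key: str, data: str) -> None:
        self.s3[key] = data

    def terminate(self) -> None:
        # Terminating the cluster releases its instances, and HDFS goes with them.
        self.hdfs.clear()
        self.running = False


bucket = {}
cluster = ToyCluster(bucket)
cluster.write_hdfs("/tmp/intermediate", "shuffle data")
cluster.write_s3("results/calls_per_area.csv", "Kathmandu,3")
cluster.terminate()

print(len(cluster.hdfs), sorted(bucket))  # prints: 0 ['results/calls_per_area.csv']
```

This is why input data and final results usually belong in S3, while HDFS is best treated as scratch space for the lifetime of one cluster.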
🔥 EMR vs Redshift vs RDS: Confusion Buster
| Service | What it does | Nepal Example | Keyword |
|---|---|---|---|
| 🔶 EMR | Big data processing: analyzes TB/PB of data | NTC's 10 TB call log processed with Spark | Hadoop/Spark/TB/PB |
| 🔵 Redshift | Data warehouse: business reports from SQL queries | Daraz's monthly sales report produced with SQL | Data warehouse/SQL analytics |
| 🟣 RDS | Normal relational DB: daily app transactions | Stores Esewa's user accounts and transactions | Relational/MySQL/PostgreSQL |
❓ MCQ Practice: Exam Style
| Question | Answer |
|---|---|
| Q1. A company wants to process 10 TB of log data using Apache Spark without managing clusters. Which service should it use? A) RDS B) S3 C) EMR D) EC2 only | C ✔ EMR |
| Q2. Which node type stores data in an EMR cluster? A) Task Node B) Core Node C) Master Node D) Lambda | B ✔ Core Node |
| Q3. Which service is mainly used to store large datasets for EMR? A) DynamoDB B) S3 C) CloudFront D) Route53 | B ✔ S3 |
| Q4. EMR runs on top of which AWS service? A) Lambda B) EC2 C) S3 D) RDS | B ✔ EC2 |
| Q5. ⭐ Tricky: A company wants a structured SQL data warehouse. Which service? A) EMR B) Redshift C) EC2 D) S3 | B ✔ Redshift (not EMR!) |
⚡ FINAL EXAM CHEATSHEET: this keyword = this answer (comes up 100% of the time)
| Keyword you see | Answer | Why? (one line) |
|---|---|---|
| Hadoop / Spark / Big data cluster | 🔶 Amazon EMR | EMR = managed service for Hadoop/Spark |
| Petabytes / Terabytes processing / Massive log analysis | 🔶 Amazon EMR | TB/PB = EMR, GB = RDS |
| Node that stores data / HDFS data | 💾 Core Node | Core = both store and process |
| EMR cluster terminated: what happens to the data? | Safe in S3, gone from HDFS! | S3 = persistent, HDFS = temporary |
| SQL data warehouse / Business analytics report | 🔵 Redshift (not EMR!) | SQL on structured data = Redshift, unstructured processing = EMR |
| Machine learning training / ML on big data | 🔶 Amazon EMR | Spark includes an ML library (MLlib) |