🌐 AWS practitioner Certification

🔶 Amazon EMR (Elastic MapReduce)

📅 Apr 15, 2026

AWS BIG DATA — MANAGED PROCESSING


AWS's managed big data processing service: an easy way to run tools such as Hadoop and Spark on AWS.

🔤 STEP 1: EMR Full Form and Meaning

EMR

Elastic MapReduce

Elastic = servers can be added or removed as needed (auto-scaling)

Map = split large data into small pieces and process each piece independently

Reduce = combine the results from those pieces into one final result
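
The Map and Reduce ideas above can be sketched on a single machine. This is only a minimal illustration in plain Python, not how Hadoop or EMR is implemented; the function names and the sample event list are made up for the example.

```python
from collections import Counter
from functools import reduce


def split_into_chunks(words, chunk_size):
    """Cut the full list into smaller pieces that could go to different servers."""
    return [words[i:i + chunk_size] for i in range(0, len(words), chunk_size)]


def map_chunk(chunk):
    """Map step: count the events inside one chunk, independently of the others."""
    return Counter(chunk)


def reduce_counts(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right


def event_count(words, chunk_size=3):
    chunks = split_into_chunks(words, chunk_size)
    # In a real cluster each chunk would be mapped on a different node.
    partial_counts = [map_chunk(chunk) for chunk in chunks]
    return reduce(reduce_counts, partial_counts, Counter())


if __name__ == "__main__":
    # Hypothetical app events, one word per event.
    events = "open close open share open close share open".split()
    print(event_count(events))
```

Each chunk is counted without looking at any other chunk, which is what lets a framework spread the Map step over many servers before the Reduce step merges the partial results.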

🧠 STEP 2: What Is Big Data? (EMR makes no sense without this)

🇳🇵 Nepal Example — Hamro Patro

The Hamro Patro app has 1 crore+ (10 million+) users.

Each user opens it 10 times a day.

→ That generates 10 crore (100 million) events per day!

An ordinary database struggles to analyze data at this volume → you need EMR!
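
The arithmetic behind that number, as a small sketch. The user count and opens per day come from the example above; the per-second figure is just a daily average and assumes events are spread evenly, which real traffic is not.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def events_per_day(users, opens_per_user_per_day):
    """Total app-open events generated in one day."""
    return users * opens_per_user_per_day


def average_events_per_second(daily_events):
    """Daily total spread evenly over 24 hours (a simplifying assumption)."""
    return daily_events / SECONDS_PER_DAY


if __name__ == "__main__":
    daily = events_per_day(users=10_000_000, opens_per_user_per_day=10)
    print(f"{daily:,} events per day")
    print(f"about {average_events_per_second(daily):,.0f} events per second on average")
```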

📏 Compare Data Sizes

▸ Normal file = MB (Excel, Word)

▸ Normal DB = GB (MySQL, PostgreSQL)

▸ Big Data = TB / PB (1 PB = 1,000,000 GB!)

▸ A single ordinary server cannot process TB/PB of data in a reasonable time!

→ Many servers have to work together to process it = EMR!
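
A rough back-of-the-envelope sketch of why one server is not enough. The 200 MB/s read speed per server is an assumed figure for illustration, and the calculation assumes perfect linear scaling across servers, which real clusters only approximate.

```python
BYTES_PER_GB = 10**9
BYTES_PER_TB = 10**12
BYTES_PER_PB = 10**15


def petabytes_to_gb(petabytes):
    """Convert PB to GB using decimal units (1 PB = 1,000,000 GB)."""
    return petabytes * BYTES_PER_PB // BYTES_PER_GB


def hours_to_scan(total_bytes, servers, mb_per_second_per_server=200):
    """Hours needed just to read the data once, assuming ideal parallelism."""
    bytes_per_second = servers * mb_per_second_per_server * 10**6
    return total_bytes / bytes_per_second / 3600


if __name__ == "__main__":
    ten_tb = 10 * BYTES_PER_TB
    print(f"1 PB = {petabytes_to_gb(1):,} GB")
    print(f"10 TB on 1 server:    {hours_to_scan(ten_tb, servers=1):.1f} hours")
    print(f"10 TB on 100 servers: {hours_to_scan(ten_tb, servers=100) * 60:.1f} minutes")
```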

🔧 STEP 3: What Are Hadoop and Spark?

Processing big data requires tools. Hadoop and Spark are those tools.

🐘 What is Hadoop?

Real example: You need to peel 1,000 kg of potatoes. One person takes 10 hours. 100 people working together take 6 minutes!

Hadoop does exactly this: it splits large data across many servers and processes the pieces in parallel.

▸ HDFS = Hadoop's own storage system
▸ Processes data by reading and writing it on disk (slower)

⚡ What is Spark?

Real example: Hadoop is like an old-fashioned calculator. Spark is like a new, fast computer.

Spark keeps data in RAM while processing, which can make it up to 100x faster than Hadoop MapReduce on some in-memory workloads!

▸ Supports near-real-time (streaming) processing
▸ Also used for ML training

👉 Installing and managing Hadoop/Spark yourself is a hassle → Amazon EMR provisions and manages the cluster for you!

🏗 STEP 4: The 3 Node Types of an EMR Cluster (high-priority exam topic!)

A cluster is a group of servers working together on one job. An EMR cluster has 3 types of nodes.

🏟 Nepal Example: like a cricket team. Captain (Master) + players (Core) + substitutes (Task)

| Node Type | What does it do? | Nepal Analogy | Exam Key |
|---|---|---|---|
| 👑 Master Node (called the primary node in current AWS documentation) | Manages and coordinates the cluster. Assigns jobs. The brain of the cluster. | Cricket captain: decides who plays where | Usually 1 per cluster (3 in a high-availability setup); manages the cluster |
| 💾 Core Node | Both stores and processes data. Keeps data in HDFS. | Main players: they play and they stay on the field | ⭐ STORES data: this is what the exam asks |
| ⚡ Task Node | Only processes, stores no data. Optional, can be removed. | Substitute player: brought on only when needed | Process ONLY, no storage, optional |
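
To make the three node types concrete, here is a hedged sketch that builds a request dictionary shaped like the arguments of boto3's EMR `run_job_flow` call. The cluster name, instance types, counts, release label, and the choice of Spot for task nodes are illustrative assumptions, not recommendations. The function only builds the dictionary, so nothing is launched.

```python
def build_cluster_request(name, core_count=2, task_count=0):
    """Return a dict shaped like the keyword arguments of boto3 EMR run_job_flow."""
    instance_groups = [
        # Master (primary) node: coordinates the cluster.
        {"Name": "Primary", "InstanceRole": "MASTER",
         "InstanceType": "m5.xlarge", "InstanceCount": 1},
        # Core nodes: store HDFS data and run tasks.
        {"Name": "Core", "InstanceRole": "CORE",
         "InstanceType": "m5.xlarge", "InstanceCount": core_count},
    ]
    if task_count > 0:
        # Task nodes are optional and hold no HDFS data, so losing one loses no data.
        instance_groups.append(
            {"Name": "Task", "InstanceRole": "TASK", "Market": "SPOT",
             "InstanceType": "m5.xlarge", "InstanceCount": task_count}
        )
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",           # assumed release, pick a current one
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": instance_groups,
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",    # default role names, may differ per account
        "ServiceRole": "EMR_DefaultRole",
    }


if __name__ == "__main__":
    request = build_cluster_request("demo-cluster", core_count=2, task_count=3)
    for group in request["Instances"]["InstanceGroups"]:
        print(group["InstanceRole"], group["InstanceCount"])
    # With AWS credentials configured, the request could be sent with:
    #   boto3.client("emr").run_job_flow(**request)
```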

🗺 STEP 5 — Architecture Flow (Nepal Telecom Example)

Scenario: Nepal Telecom wants to analyze 10TB of call log data from 1 month. Which area has the most calls?

📞 Call Logs (10TB raw data) → 🗄 S3 (Data store) → 🔶 EMR Cluster (Spark Job runs) → 📊 Result (S3 / Redshift)

✔ 10TB data → S3 → EMR (Spark) → result showing that Kathmandu had the most calls → Dashboard!
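
The analysis step in this flow is a "count calls per area" aggregation. The sketch below runs the same logic locally on a tiny made-up sample. It is a stand-in for the Spark job, not Spark code, and the column names and rows are hypothetical.

```python
import csv
import io
from collections import Counter

# Hypothetical sample; the real input would be the 10TB of logs stored in S3.
SAMPLE_CALL_LOG = """caller_area,duration_seconds
Kathmandu,120
Pokhara,45
Kathmandu,300
Biratnagar,60
Kathmandu,15
Pokhara,200
"""


def calls_per_area(csv_text):
    """Count how many call records belong to each area."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["caller_area"] for row in reader)


def busiest_area(csv_text):
    """Return the area with the highest number of calls."""
    area, _count = calls_per_area(csv_text).most_common(1)[0]
    return area


if __name__ == "__main__":
    print(calls_per_area(SAMPLE_CALL_LOG))
    print("Busiest area:", busiest_area(SAMPLE_CALL_LOG))
```

On EMR the same grouping would run in parallel over many files, with each node counting its own share before the totals are merged.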

💾 STEP 6 — EMR + Storage — S3 vs HDFS

| Storage | What is it? | Exam Key |
|---|---|---|
| 🗄 S3 | EMR's primary storage. Data remains after the cluster is shut down. Persistent. | ⭐ With EMR, data is kept in S3 (key exam point) |
| 📀 HDFS | Hadoop's own storage. Temporary storage inside the cluster. Cluster terminated = data lost! | Temporary: cluster terminate = data gone! |
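
The difference can be pictured with a toy model. This is purely illustrative: the class and method names are invented, a plain dictionary stands in for an S3 bucket, and no real AWS behaviour is simulated beyond "HDFS lives and dies with the cluster".

```python
class ToyCluster:
    """Toy model: HDFS belongs to the cluster, the S3 bucket lives outside it."""

    def __init__(self, s3_bucket):
        self.s3_bucket = s3_bucket  # external dict standing in for an S3 bucket
        self.hdfs = {}              # storage on the cluster's own core nodes
        self.running = True

    def write_hdfs(self, key, value):
        self.hdfs[key] = value

    def write_s3(self, key, value):
        self.s3_bucket[key] = value

    def terminate(self):
        # Terminating the cluster releases its nodes, and HDFS goes with them.
        self.hdfs.clear()
        self.running = False


if __name__ == "__main__":
    bucket = {}
    cluster = ToyCluster(bucket)
    cluster.write_hdfs("tmp/shuffle", "intermediate data")
    cluster.write_s3("results/top-areas.csv", "Kathmandu,3")
    cluster.terminate()
    print("HDFS after terminate:", cluster.hdfs)
    print("S3 after terminate:  ", bucket)
```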

🔥 EMR vs Redshift vs RDS — Confusion Buster

| Service | What does it do? | Nepal Example | Keyword |
|---|---|---|---|
| 🔶 EMR | Big data processing: analyze TB/PB of data | NTC's 10TB call log processed with Spark | Hadoop/Spark/TB/PB |
| 🔵 Redshift | Data warehouse: business reports through SQL queries | Daraz's monthly sales report via SQL | Data warehouse/SQL analytics |
| 🟣 RDS | Normal relational DB: daily app transactions | Storing Esewa's user accounts and transactions | Relational/MySQL/PostgreSQL |

❓ MCQ Practice — Exam Style

Q1. Company wants to process 10TB log data using Apache Spark without managing clusters.
A) RDS   B) S3   C) EMR   D) EC2 only
Answer: C ✔ EMR

Q2. Which node type stores data in an EMR cluster?
A) Task Node   B) Core Node   C) Master Node   D) Lambda
Answer: B ✔ Core Node

Q3. Which service is mainly used to store large datasets for EMR?
A) DynamoDB   B) S3   C) CloudFront   D) Route53
Answer: B ✔ S3

Q4. EMR runs on top of which AWS service?
A) Lambda   B) EC2   C) S3   D) RDS
Answer: B ✔ EC2

Q5. ⭐ Tricky: Company wants a structured SQL data warehouse. Which service?
A) EMR   B) Redshift   C) EC2   D) S3
Answer: B ✔ Redshift (not EMR!)

⚡ FINAL EXAM CHEATSHEET: this keyword = this answer (very likely to appear)

| Keyword you see | Answer | Why? (one line) |
|---|---|---|
| Hadoop / Spark / Big data cluster | 🔶 Amazon EMR | EMR = managed service for Hadoop/Spark |
| Petabytes / Terabytes processing / Massive log analysis | 🔶 Amazon EMR | Rule of thumb: TB/PB-scale processing = EMR, everyday GB-scale app data = RDS |
| Node that stores data / HDFS data | 💾 Core Node | Core = both Store + Process |
| EMR cluster terminated, what about the data? | Safe in S3, HDFS gone! | S3 = persistent, HDFS = temporary |
| SQL data warehouse / Business analytics report | 🔵 Redshift (not EMR!) | Structured SQL = Redshift, unstructured processing = EMR |
| Machine learning training / ML on big data | 🔶 Amazon EMR | Spark includes an ML library (MLlib) |
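
For self-testing, the keyword rules above can be turned into a small lookup. This is only a study aid built on this cheatsheet's own keywords; the keyword lists and the matching order (Redshift checked before EMR, mirroring the "not EMR!" warning) are assumptions of the sketch, and real exam questions need to be read in full.

```python
# Checked in order: data-warehouse wording wins over generic "big data" wording.
CHEATSHEET = [
    (("data warehouse", "sql analytics", "business report"), "Amazon Redshift"),
    (("relational", "mysql", "postgresql", "transaction"), "Amazon RDS"),
    (("hadoop", "spark", "petabyte", "terabyte", "log analysis"), "Amazon EMR"),
]


def suggest_service(question):
    """Return the first service whose keywords appear in the question, else None."""
    text = question.lower()
    for keywords, service in CHEATSHEET:
        if any(keyword in text for keyword in keywords):
            return service
    return None


if __name__ == "__main__":
    print(suggest_service("Process 10TB log data using Apache Spark without managing clusters"))
    print(suggest_service("Company wants structured SQL data warehouse"))
```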
