uni_bi_multi variate analysis
📅 May 08, 2026
Jupyter Notebook
📊 Univariate, Bivariate, and Multivariate Analysis¶
Why do we need this?¶
The first step in data analysis is to understand what the data looks like.
The name of the analysis depends on how many variables are examined at once.
1 variable → Univariate Analysis
2 variables → Bivariate Analysis
3+ variables → Multivariate Analysis
1️⃣ Univariate Analysis¶
Definition¶
Examine a single variable (column) on its own and describe its distribution, pattern, and summary statistics.
It is not compared with any other column; it is studied in isolation.
Example¶
Student heights (cm):
[150, 155, 160, 160, 165, 170, 170, 175, 180]
What to look at:
├── Mean → 165 cm (average)
├── Median → 165 cm (middle value)
├── Mode → 160, 170 (most frequent values)
├── Min/Max → 150 / 180
└── Spread → how widely is the data dispersed?
What do we measure?¶
| Measure | Meaning | Example |
|---|---|---|
| Mean | Average of all values | (150+155+...+180) / 9 |
| Median | Middle value | the middle value after sorting |
| Mode | Most frequent value | 160 and 170 |
| Variance | How spread out the data is | larger = more scatter |
| Std Dev | Typical distance from the mean | square root of the variance |
| Skewness | Which side the data leans toward | left or right |
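The measures in the table can be computed directly with pandas. The following is a minimal sketch that uses the nine heights from the example; the names `heights` and `summary` are introduced here for illustration and are not part of the original notebook.

```python
import pandas as pd

# Heights (cm) from the example above
heights = pd.Series([150, 155, 160, 160, 165, 170, 170, 175, 180])

summary = {
    "mean": heights.mean(),
    "median": heights.median(),
    "mode": heights.mode().tolist(),   # a Series can have several modes
    "variance": heights.var(),         # sample variance (ddof=1 by default)
    "std": heights.std(),              # square root of the sample variance
    "skewness": heights.skew(),        # close to 0 for symmetric data
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Note that pandas reports the sample variance (dividing by n - 1); pass `ddof=0` to `var()` or `std()` if the population version is wanted.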
Tools and Charts¶
Histogram → see the distribution of the data
Box Plot → see outliers and spread
Bar Chart → see category counts
Pie Chart → see percentages
Code¶
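import pandas as pd
import matplotlib.pyplot as plt
# Sample data from the example above; the column 'उचाइ' means height (cm)
df = pd.DataFrame({'उचाइ': [150, 155, 160, 160, 165, 170, 170, 175, 180]})
print(df['उचाइ'].describe()) # count, mean, std, min, quartiles, max in one call
df['उचाइ'].hist() # histogram
plt.show()
df['उचाइ'].plot(kind='box') # box plot
plt.show()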
Use Case¶
- Check whether the data contains outliers
- Check whether the data follows a normal distribution
- Understand the extent of missing values
- Understand the overall shape of the data
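Two of these checks, missing values and outliers, can be sketched in a few lines. The data below is hypothetical: it is the example heights with one missing entry and one extreme value appended for illustration. The outlier test is the common 1.5 × IQR rule, which is an assumption made here rather than a method the notes prescribe.

```python
import numpy as np
import pandas as pd

# Hypothetical heights (cm): the example values plus one missing entry and one extreme value
heights = pd.Series([150, 155, 160, 160, 165, 170, 170, 175, 180, np.nan, 250])

# Count the missing entries
n_missing = int(heights.isna().sum())

# 1.5 * IQR rule: values far outside the middle 50% of the data are flagged
q1, q3 = heights.quantile(0.25), heights.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = heights[(heights < lower) | (heights > upper)]

print("missing values:", n_missing)
print("outliers:", outliers.tolist())
```

A box plot draws its whiskers with the same 1.5 × IQR convention by default, so the flagged values are the ones that would appear as isolated points on the plot.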
2️⃣ Bivariate Analysis¶
Definition¶
Examine the relationship between two variables.
The goal is to find out what happens to one when the other increases.
Example¶
Height (cm) → Weight (kg)
─────────────────────────
150 → 50
160 → 60 ← as height increases, weight increases too
170 → 70
180 → 80
Height and weight are related here; studying that relationship is Bivariate Analysis.
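The strength of this relationship can be expressed as a single number, the Pearson correlation coefficient. A minimal sketch using the four pairs from the example; the frame name `df_hw` and the English column names are assumptions made for illustration.

```python
import pandas as pd

# Height (cm) and weight (kg) pairs from the example above
df_hw = pd.DataFrame({"height": [150, 160, 170, 180],
                      "weight": [50, 60, 70, 80]})

# Pearson correlation coefficient between the two columns
r = df_hw["height"].corr(df_hw["weight"])
print(round(r, 4))
```

Because every weight in the example equals the height minus 100, the points lie exactly on a straight line and the coefficient is 1 (up to floating-point rounding). Real measurements would give a smaller value.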
Types of Relationship¶
Positive Correlation → as one increases, the other increases too
Height ↑ Weight ↑
Negative Correlation → as one increases, the other decreases
Exercise ↑ Weight ↓
No Correlation → no relationship
Shoe size ↑ IQ: no relationship
Meaning of the Correlation Value¶
| Value | Meaning |
|---|---|
| +1.0 | Perfect positive (the strongest possible relationship) |
| +0.7 | Strong positive |
| +0.3 | Weak positive |
| 0.0 | No linear relationship |
| -0.3 | Weak negative |
| -0.7 | Strong negative |
| -1.0 | Perfect negative |
Analysis by Variable Type¶
| Variable 1 | Variable 2 | Chart |
|---|---|---|
| Number | Number | Scatter Plot, Correlation |
| Category | Number | Box Plot, Bar Chart |
| Category | Category | Cross Tab, Grouped Bar |
Code¶
# Assumes df has the columns 'उचाइ' (height), 'तौल' (weight), 'marks', and 'gender'
# Compute the correlation
df[['उचाइ', 'तौल']].corr()
# Scatter plot
df.plot(x='उचाइ', y='तौल', kind='scatter')
# Box plot (category vs number)
df.boxplot(column='marks', by='gender')
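The category vs category case from the table can be handled with a cross tab. The sketch below uses a small hypothetical data set; the frame `df_cat` and its `gender` and `result` values are invented for illustration only.

```python
import pandas as pd

# Hypothetical data: two categorical columns
df_cat = pd.DataFrame({
    "gender": ["F", "M", "F", "M", "F", "M"],
    "result": ["pass", "pass", "fail", "pass", "pass", "fail"],
})

# Cross tab: number of rows for every (gender, result) combination
counts = pd.crosstab(df_cat["gender"], df_cat["result"])
print(counts)

# Row-wise proportions make groups of different sizes comparable
shares = pd.crosstab(df_cat["gender"], df_cat["result"], normalize="index")
print(shares.round(2))
```

Calling `counts.plot(kind="bar")` on the result would give the grouped bar chart listed in the table.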
Use Case¶
- Check whether features are related to the target
- Detect multicollinearity (do two features carry the same information?)
- Decide which features to keep in the model
- See how strongly Sales and Advertisement are related
3️⃣ Multivariate Analysis¶
Definition¶
Examine the relationships among three or more variables at the same time.
The goal is to understand, in one pass, how strongly each feature is related to the others.
Example¶
height + weight + age + exercise_hours + marks
↓
Questions: How are all of these related?
Which feature is most strongly related to marks?
Which two features carry the same information?
Tools and Charts¶
Heatmap → shows the correlations of all features at once
Pair Plot → scatter plots for every pair of features at once
PCA → reduce many features to a few
3D Scatter → plot three features together
How to Read a Heatmap¶
height weight age marks
height [ 1.0 0.8 0.3 0.5 ]
weight [ 0.8 1.0 0.2 0.4 ]
age [ 0.3 0.2 1.0 0.6 ]
marks [ 0.5 0.4 0.6 1.0 ]
→ height and weight: 0.8 = strong relationship
→ age and marks: 0.6 = moderate relationship
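Reading the matrix by eye works for four features but not for forty. One way to rank the pairs programmatically is sketched below, using the illustrative matrix above with English labels; the names `corr`, `pairs`, and `strongest_pair` are assumptions made for this example.

```python
import numpy as np
import pandas as pd

labels = ["height", "weight", "age", "marks"]
# Correlation matrix copied from the illustration above
corr = pd.DataFrame(
    [[1.0, 0.8, 0.3, 0.5],
     [0.8, 1.0, 0.2, 0.4],
     [0.3, 0.2, 1.0, 0.6],
     [0.5, 0.4, 0.6, 1.0]],
    index=labels, columns=labels,
)

# Keep only the upper triangle (k=1 skips the diagonal) so each pair appears once
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack().dropna().sort_values(ascending=False)
print(pairs)

strongest_pair = pairs.index[0]
print("strongest pair:", strongest_pair)
```

The matrix is symmetric and its diagonal is always 1.0, so only the six values above the diagonal carry information.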
Code¶
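import seaborn as sns
# Heatmap: all pairwise correlations at once (numeric columns only)
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='coolwarm')
# Pair plot: scatter plots for every pair of features
sns.pairplot(df)
# PCA: reduce the number of features
from sklearn.decomposition import PCA
X = df.select_dtypes('number') # numeric feature matrix; PCA cannot use text columns
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)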
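The PCA step can be tried end to end on a small data set. The sketch below assumes scikit-learn's bundled iris data and adds a standardization step, which is not in the snippet above; PCA is sensitive to the scale of each feature, so scaling first is a common choice rather than a requirement.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = load_iris().data                      # 150 samples x 4 numeric features

# Standardize each feature to zero mean and unit variance
X_scaled = StandardScaler().fit_transform(X)

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_scaled)

print(X_reduced.shape)                    # (150, 2)
print(pca.explained_variance_ratio_)      # share of variance kept by each component
```

`explained_variance_ratio_` shows how much of the original variation survives the reduction, which helps when choosing `n_components`.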
Use Case¶
- Feature selection: decide which features to keep and which to drop
- Detect multicollinearity: do several features carry the same information?
- Find patterns: understand hidden structure in the data
- Dimensionality reduction: reduce many features to a few (PCA)
⚖️ Comparing the Three¶
| | Univariate | Bivariate | Multivariate |
|---|---|---|---|
| Variables | 1 | 2 | 3+ |
| Goal | Understand a variable | Examine a relationship | Find patterns |
| Chart | Histogram, Box | Scatter, Bar | Heatmap, Pairplot |
| Question | What does the data look like? | Is there a relationship? | How do all the variables relate? |
| Example | height only | height and weight | height + weight + age + marks |
🎯 Summary¶
Order of data analysis:
Step 1 → Univariate : understand each column
Step 2 → Bivariate : examine relationships between pairs of columns
Step 3 → Multivariate : combine everything and look for patterns
↓
Ready to build a model!
💡 Rule: Start with univariate analysis; it shows whether the data is clean.
Then do bivariate analysis; it shows which features matter.
Finish with multivariate analysis; it reveals deeper patterns across all features together.
import pandas as pd
from sklearn.datasets import load_iris
iris = load_iris()
table = pd.DataFrame(iris.data,columns=iris.feature_names)
table.head()
| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) |
|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 |
table['target'] = iris.target
table.head()
| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | target |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0 |
table.shape
(150, 5)
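The `target` column stores only integer codes, so it is worth checking which species each code stands for before filtering on it. A minimal sketch, assuming scikit-learn's bundled iris data; the names `iris_df` and `species_counts` are introduced here for illustration.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)

# target_names is ordered by code, so position 0, 1, 2 gives the species for each code
print(list(iris.target_names))

# Look up the species name for every row from its integer code
iris_df["species"] = iris.target_names[iris.target]
species_counts = iris_df["species"].value_counts()
print(species_counts)
```

In this data set code 0 is setosa, code 1 is versicolor, and code 2 is virginica, with 50 rows each.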
# Setosa rows only (50 samples); target 0 is setosa
setosa=table.loc[table['target']==0]
setosa
| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | target |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0 |
| 5 | 5.4 | 3.9 | 1.7 | 0.4 | 0 |
| 6 | 4.6 | 3.4 | 1.4 | 0.3 | 0 |
| 7 | 5.0 | 3.4 | 1.5 | 0.2 | 0 |
| 8 | 4.4 | 2.9 | 1.4 | 0.2 | 0 |
| 9 | 4.9 | 3.1 | 1.5 | 0.1 | 0 |
| 10 | 5.4 | 3.7 | 1.5 | 0.2 | 0 |
| 11 | 4.8 | 3.4 | 1.6 | 0.2 | 0 |
| 12 | 4.8 | 3.0 | 1.4 | 0.1 | 0 |
| 13 | 4.3 | 3.0 | 1.1 | 0.1 | 0 |
| 14 | 5.8 | 4.0 | 1.2 | 0.2 | 0 |
| 15 | 5.7 | 4.4 | 1.5 | 0.4 | 0 |
| 16 | 5.4 | 3.9 | 1.3 | 0.4 | 0 |
| 17 | 5.1 | 3.5 | 1.4 | 0.3 | 0 |
| 18 | 5.7 | 3.8 | 1.7 | 0.3 | 0 |
| 19 | 5.1 | 3.8 | 1.5 | 0.3 | 0 |
| 20 | 5.4 | 3.4 | 1.7 | 0.2 | 0 |
| 21 | 5.1 | 3.7 | 1.5 | 0.4 | 0 |
| 22 | 4.6 | 3.6 | 1.0 | 0.2 | 0 |
| 23 | 5.1 | 3.3 | 1.7 | 0.5 | 0 |
| 24 | 4.8 | 3.4 | 1.9 | 0.2 | 0 |
| 25 | 5.0 | 3.0 | 1.6 | 0.2 | 0 |
| 26 | 5.0 | 3.4 | 1.6 | 0.4 | 0 |
| 27 | 5.2 | 3.5 | 1.5 | 0.2 | 0 |
| 28 | 5.2 | 3.4 | 1.4 | 0.2 | 0 |
| 29 | 4.7 | 3.2 | 1.6 | 0.2 | 0 |
| 30 | 4.8 | 3.1 | 1.6 | 0.2 | 0 |
| 31 | 5.4 | 3.4 | 1.5 | 0.4 | 0 |
| 32 | 5.2 | 4.1 | 1.5 | 0.1 | 0 |
| 33 | 5.5 | 4.2 | 1.4 | 0.2 | 0 |
| 34 | 4.9 | 3.1 | 1.5 | 0.2 | 0 |
| 35 | 5.0 | 3.2 | 1.2 | 0.2 | 0 |
| 36 | 5.5 | 3.5 | 1.3 | 0.2 | 0 |
| 37 | 4.9 | 3.6 | 1.4 | 0.1 | 0 |
| 38 | 4.4 | 3.0 | 1.3 | 0.2 | 0 |
| 39 | 5.1 | 3.4 | 1.5 | 0.2 | 0 |
| 40 | 5.0 | 3.5 | 1.3 | 0.3 | 0 |
| 41 | 4.5 | 2.3 | 1.3 | 0.3 | 0 |
| 42 | 4.4 | 3.2 | 1.3 | 0.2 | 0 |
| 43 | 5.0 | 3.5 | 1.6 | 0.6 | 0 |
| 44 | 5.1 | 3.8 | 1.9 | 0.4 | 0 |
| 45 | 4.8 | 3.0 | 1.4 | 0.3 | 0 |
| 46 | 5.1 | 3.8 | 1.6 | 0.2 | 0 |
| 47 | 4.6 | 3.2 | 1.4 | 0.2 | 0 |
| 48 | 5.3 | 3.7 | 1.5 | 0.2 | 0 |
| 49 | 5.0 | 3.3 | 1.4 | 0.2 | 0 |
# Versicolor rows only (50 samples); in scikit-learn's iris data, target 1 is versicolor
versicolor=table.loc[table['target']==1]
versicolor
| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | target |
|---|---|---|---|---|---|
| 50 | 7.0 | 3.2 | 4.7 | 1.4 | 1 |
| 51 | 6.4 | 3.2 | 4.5 | 1.5 | 1 |
| 52 | 6.9 | 3.1 | 4.9 | 1.5 | 1 |
| 53 | 5.5 | 2.3 | 4.0 | 1.3 | 1 |
| 54 | 6.5 | 2.8 | 4.6 | 1.5 | 1 |
| 55 | 5.7 | 2.8 | 4.5 | 1.3 | 1 |
| 56 | 6.3 | 3.3 | 4.7 | 1.6 | 1 |
| 57 | 4.9 | 2.4 | 3.3 | 1.0 | 1 |
| 58 | 6.6 | 2.9 | 4.6 | 1.3 | 1 |
| 59 | 5.2 | 2.7 | 3.9 | 1.4 | 1 |
| 60 | 5.0 | 2.0 | 3.5 | 1.0 | 1 |
| 61 | 5.9 | 3.0 | 4.2 | 1.5 | 1 |
| 62 | 6.0 | 2.2 | 4.0 | 1.0 | 1 |
| 63 | 6.1 | 2.9 | 4.7 | 1.4 | 1 |
| 64 | 5.6 | 2.9 | 3.6 | 1.3 | 1 |
| 65 | 6.7 | 3.1 | 4.4 | 1.4 | 1 |
| 66 | 5.6 | 3.0 | 4.5 | 1.5 | 1 |
| 67 | 5.8 | 2.7 | 4.1 | 1.0 | 1 |
| 68 | 6.2 | 2.2 | 4.5 | 1.5 | 1 |
| 69 | 5.6 | 2.5 | 3.9 | 1.1 | 1 |
| 70 | 5.9 | 3.2 | 4.8 | 1.8 | 1 |
| 71 | 6.1 | 2.8 | 4.0 | 1.3 | 1 |
| 72 | 6.3 | 2.5 | 4.9 | 1.5 | 1 |
| 73 | 6.1 | 2.8 | 4.7 | 1.2 | 1 |
| 74 | 6.4 | 2.9 | 4.3 | 1.3 | 1 |
| 75 | 6.6 | 3.0 | 4.4 | 1.4 | 1 |
| 76 | 6.8 | 2.8 | 4.8 | 1.4 | 1 |
| 77 | 6.7 | 3.0 | 5.0 | 1.7 | 1 |
| 78 | 6.0 | 2.9 | 4.5 | 1.5 | 1 |
| 79 | 5.7 | 2.6 | 3.5 | 1.0 | 1 |
| 80 | 5.5 | 2.4 | 3.8 | 1.1 | 1 |
| 81 | 5.5 | 2.4 | 3.7 | 1.0 | 1 |
| 82 | 5.8 | 2.7 | 3.9 | 1.2 | 1 |
| 83 | 6.0 | 2.7 | 5.1 | 1.6 | 1 |
| 84 | 5.4 | 3.0 | 4.5 | 1.5 | 1 |
| 85 | 6.0 | 3.4 | 4.5 | 1.6 | 1 |
| 86 | 6.7 | 3.1 | 4.7 | 1.5 | 1 |
| 87 | 6.3 | 2.3 | 4.4 | 1.3 | 1 |
| 88 | 5.6 | 3.0 | 4.1 | 1.3 | 1 |
| 89 | 5.5 | 2.5 | 4.0 | 1.3 | 1 |
| 90 | 5.5 | 2.6 | 4.4 | 1.2 | 1 |
| 91 | 6.1 | 3.0 | 4.6 | 1.4 | 1 |
| 92 | 5.8 | 2.6 | 4.0 | 1.2 | 1 |
| 93 | 5.0 | 2.3 | 3.3 | 1.0 | 1 |
| 94 | 5.6 | 2.7 | 4.2 | 1.3 | 1 |
| 95 | 5.7 | 3.0 | 4.2 | 1.2 | 1 |
| 96 | 5.7 | 2.9 | 4.2 | 1.3 | 1 |
| 97 | 6.2 | 2.9 | 4.3 | 1.3 | 1 |
| 98 | 5.1 | 2.5 | 3.0 | 1.1 | 1 |
| 99 | 5.7 | 2.8 | 4.1 | 1.3 | 1 |
# Virginica rows only (50 samples); target 2 is virginica
virginica=table.loc[table['target']==2]
virginica
| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | target |
|---|---|---|---|---|---|
| 100 | 6.3 | 3.3 | 6.0 | 2.5 | 2 |
| 101 | 5.8 | 2.7 | 5.1 | 1.9 | 2 |
| 102 | 7.1 | 3.0 | 5.9 | 2.1 | 2 |
| 103 | 6.3 | 2.9 | 5.6 | 1.8 | 2 |
| 104 | 6.5 | 3.0 | 5.8 | 2.2 | 2 |
| 105 | 7.6 | 3.0 | 6.6 | 2.1 | 2 |
| 106 | 4.9 | 2.5 | 4.5 | 1.7 | 2 |
| 107 | 7.3 | 2.9 | 6.3 | 1.8 | 2 |
| 108 | 6.7 | 2.5 | 5.8 | 1.8 | 2 |
| 109 | 7.2 | 3.6 | 6.1 | 2.5 | 2 |
| 110 | 6.5 | 3.2 | 5.1 | 2.0 | 2 |
| 111 | 6.4 | 2.7 | 5.3 | 1.9 | 2 |
| 112 | 6.8 | 3.0 | 5.5 | 2.1 | 2 |
| 113 | 5.7 | 2.5 | 5.0 | 2.0 | 2 |
| 114 | 5.8 | 2.8 | 5.1 | 2.4 | 2 |
| 115 | 6.4 | 3.2 | 5.3 | 2.3 | 2 |
| 116 | 6.5 | 3.0 | 5.5 | 1.8 | 2 |
| 117 | 7.7 | 3.8 | 6.7 | 2.2 | 2 |
| 118 | 7.7 | 2.6 | 6.9 | 2.3 | 2 |
| 119 | 6.0 | 2.2 | 5.0 | 1.5 | 2 |
| 120 | 6.9 | 3.2 | 5.7 | 2.3 | 2 |
| 121 | 5.6 | 2.8 | 4.9 | 2.0 | 2 |
| 122 | 7.7 | 2.8 | 6.7 | 2.0 | 2 |
| 123 | 6.3 | 2.7 | 4.9 | 1.8 | 2 |
| 124 | 6.7 | 3.3 | 5.7 | 2.1 | 2 |
| 125 | 7.2 | 3.2 | 6.0 | 1.8 | 2 |
| 126 | 6.2 | 2.8 | 4.8 | 1.8 | 2 |
| 127 | 6.1 | 3.0 | 4.9 | 1.8 | 2 |
| 128 | 6.4 | 2.8 | 5.6 | 2.1 | 2 |
| 129 | 7.2 | 3.0 | 5.8 | 1.6 | 2 |
| 130 | 7.4 | 2.8 | 6.1 | 1.9 | 2 |
| 131 | 7.9 | 3.8 | 6.4 | 2.0 | 2 |
| 132 | 6.4 | 2.8 | 5.6 | 2.2 | 2 |
| 133 | 6.3 | 2.8 | 5.1 | 1.5 | 2 |
| 134 | 6.1 | 2.6 | 5.6 | 1.4 | 2 |
| 135 | 7.7 | 3.0 | 6.1 | 2.3 | 2 |
| 136 | 6.3 | 3.4 | 5.6 | 2.4 | 2 |
| 137 | 6.4 | 3.1 | 5.5 | 1.8 | 2 |
| 138 | 6.0 | 3.0 | 4.8 | 1.8 | 2 |
| 139 | 6.9 | 3.1 | 5.4 | 2.1 | 2 |
| 140 | 6.7 | 3.1 | 5.6 | 2.4 | 2 |
| 141 | 6.9 | 3.1 | 5.1 | 2.3 | 2 |
| 142 | 5.8 | 2.7 | 5.1 | 1.9 | 2 |
| 143 | 6.8 | 3.2 | 5.9 | 2.3 | 2 |
| 144 | 6.7 | 3.3 | 5.7 | 2.5 | 2 |
| 145 | 6.7 | 3.0 | 5.2 | 2.3 | 2 |
| 146 | 6.3 | 2.5 | 5.0 | 1.9 | 2 |
| 147 | 6.5 | 3.0 | 5.2 | 2.0 | 2 |
| 148 | 6.2 | 3.4 | 5.4 | 2.3 | 2 |
| 149 | 5.9 | 3.0 | 5.1 | 1.8 | 2 |
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
# Univariate view: sepal length of each species on a single axis (y is fixed at 0)
plt.plot(setosa['sepal length (cm)'], np.zeros_like(setosa['sepal length (cm)']), 's')
plt.plot(virginica['sepal length (cm)'], np.zeros_like(virginica['sepal length (cm)']), 'o')
plt.plot(versicolor['sepal length (cm)'], np.zeros_like(versicolor['sepal length (cm)']), 'd')
plt.xlabel('sepal length')
Text(0.5, 0, 'sepal length')
np.zeros_like(setosa['sepal length (cm)'])
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
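Because every point in the 1-D plot sits at y = 0, points with the same sepal length hide one another. A numeric summary per species is one way to read the same information; the sketch below assumes scikit-learn's bundled iris data, and the names `sepal_length`, `species`, and `ranges` are introduced here for illustration.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
# Column 0 of iris.data is 'sepal length (cm)'
sepal_length = pd.Series(iris.data[:, 0], name="sepal length (cm)")
species = pd.Series(iris.target_names[iris.target], name="species")

# Minimum, maximum and mean sepal length per species
ranges = sepal_length.groupby(species).agg(["min", "max", "mean"])
print(ranges)
```

The ranges overlap: the longest setosa sepal (5.8 cm) exceeds the shortest versicolor and virginica sepals (4.9 cm), so sepal length alone cannot separate the species. That is the motivation for looking at two features together.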
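# Bivariate view: sepal length vs petal length, colored by species code
plt.scatter(table['sepal length (cm)'], table['petal length (cm)'], c=table['target'])
plt.xlabel('sepal length (cm)')
plt.ylabel('petal length (cm)')
plt.show()
# Multivariate view: every pair of features, one marker per species code
sns.pairplot(table, hue="target", markers=["o", "s", "D"])
plt.show()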
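The pair plot can be complemented by the correlation matrix of the four measurements, which puts a number on each panel. A minimal sketch, assuming scikit-learn's bundled iris data; the names `features` and `corr` are introduced here for illustration.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
features = pd.DataFrame(iris.data, columns=iris.feature_names)

# Pearson correlation between every pair of the four measurements
corr = features.corr()
print(corr.round(2))
```

Petal length and petal width are the most strongly correlated pair (roughly 0.96), while sepal length and sepal width are weakly negatively correlated across the whole data set. Passing `corr` to `sns.heatmap(corr, annot=True)` would draw the same numbers as a heatmap.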
--- end -----