---
aliases:
  - Applied ML
  - Applied Machine Learning
---

Additional resources:

- [[Python Data Science Handbook.pdf]]
- [Train Test Split: What it Means and How to Use It | Built In](https://builtin.com/data-science/train-test-split)

1. Based on course: [Machine Learning, Data Science and Deep Learning with Python | Udemy](https://www.udemy.com/course/data-science-and-machine-learning-with-python-hands-on/learn/lecture/15090136#overview)
2. Install Python and download the [Course Materials: Machine Learning, Data Science, and Deep Learning with Python - Sundog Education with Frank Kane (sundog-education.com)](https://www.sundog-education.com/machine-learning/)
3. Create an environment: open the Anaconda shell, change to the course directory, and start Jupyter Notebook.
    ```shell
    cd c:\MLCourse
    jupyter notebook
    ```
4. Load Python packages and libraries.
    ```python
    import numpy                     # sophisticated mathematical functions
    from pylab import *              # bulk-imports pyplot and numpy
    from scipy import stats          # scientific and statistical functions (used as stats.linregress below)
    import matplotlib.pyplot as plt  # plots and charts
    import pandas                    # data analysis, manipulation of numeric tables, checking for missing data
    ```
5. Load your train/test data from a file on disk.
    ```python
    dataframe = pandas.read_csv(r"C:\ML_Projects\LinearReg\test.csv")
    dataframe.head()  # check the top 5 rows of data and the headers
    ```

- Linear regression (also see [[Common AI ML terms]])
	- y = mx + b
	- Uses the least-squares method to minimize the squared error between each point and the line, i.e. it minimizes variance.
	- Measure: error with r-squared = the fraction of total variation captured by the model (0 to 1; 1 = perfect fit, 0 = none of the variance is captured).

1. Calculate the slope and intercept.
    ```python
    slope, intercept, r_value, p_value, std_err = stats.linregress(X, Y)
    r_value ** 2  # r-squared
    ```
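The linregress step above uses `X` and `Y` arrays that must already exist. A minimal end-to-end sketch with synthetic data (the data, coefficients, and variable names here are illustrative, not from the course materials):

```python
import numpy as np
from scipy import stats

# Synthetic roughly-linear data: Y = 50 - 2*X plus Gaussian noise
np.random.seed(2)
X = np.random.normal(3.0, 1.0, 1000)
Y = 50.0 - 2.0 * X + np.random.normal(0.0, 0.5, 1000)

slope, intercept, r_value, p_value, std_err = stats.linregress(X, Y)
r_squared = r_value ** 2  # close to 1 here because the data is nearly linear
```

With low noise, `slope` and `intercept` recover the generating values (about -2 and 50), and r-squared lands near 1, matching the "perfect fit" end of the 0-to-1 scale described above.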
2. Plot the regression line.
    ```python
    def predict(X):
        return slope * X + intercept

    fitLine = predict(X)
    plt.scatter(X, Y)
    plt.plot(X, fitLine, c='r')
    plt.show()
    ```

- Polynomial regression
	- Fits the data to a curve expressed by a polynomial equation.
	- Second order: y = ax^2 + bx + c
	- Using a higher order may cause overfitting.
	- Measure: error with r-squared, though it might not be accurate if the model overfits the training data.
- Multiple regression
	- E.g. predict car price based on body style, brand, and mileage:
	  price of car = a + b1 × "mileage" + b2 × "no. of cylinders" + b3 × "no. of doors"
	  Note: we avoid ordinal data like "brand" and "model" here, which won't work when mixed with numerical data in regressions.
	- Features (mileage, cylinders, doors) **need to be normalized first**. Eliminate features that don't matter (those whose coefficients b1, b2, … approach zero).
	- Measure: fit with r-squared.
	- Assumption: features are assumed to be independent of each other.

*Regressions don't work well with ordinal values, unless you can convert them into a numerical order that makes sense somehow.*
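A polynomial fit can be sketched with numpy's `polyfit`/`poly1d` and r-squared computed by hand as 1 − SS_res/SS_tot. The quadratic data below is synthetic and illustrative, not from the course:

```python
import numpy as np

# Synthetic quadratic data: y = 4 + 2x - 0.5x^2 plus noise
np.random.seed(2)
x = np.random.normal(3.0, 1.0, 1000)
y = 4.0 + 2.0 * x - 0.5 * x**2 + np.random.normal(0.0, 0.3, 1000)

# Second-order fit: y = ax^2 + bx + c; a higher degree would start
# chasing the noise (overfitting)
p2 = np.poly1d(np.polyfit(x, y, 2))

# r-squared = fraction of total variation captured by the model
ss_res = np.sum((y - p2(x)) ** 2)
ss_tot = np.sum((y - np.mean(y)) ** 2)
r2 = 1.0 - ss_res / ss_tot
```

Here `p2.c` recovers coefficients near (-0.5, 2, 4) and `r2` is high; refitting with a much higher degree would raise r-squared on this training data while generalizing worse, which is the overfitting caveat noted above.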
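The multiple-regression recipe (normalize features, fit, drop near-zero coefficients) can be sketched with a plain least-squares solve. The car data below is fabricated for illustration; the feature names and coefficients are assumptions, not course data:

```python
import numpy as np

# Hypothetical numeric car features and a price driven by them
np.random.seed(0)
n = 500
mileage = np.random.uniform(5_000, 150_000, n)
cylinders = np.random.choice([4, 6, 8], n).astype(float)
doors = np.random.choice([2, 4], n).astype(float)
price = (30_000 - 0.1 * mileage + 1_500 * cylinders + 300 * doors
         + np.random.normal(0.0, 1_000, n))

# Normalize each feature to zero mean, unit variance before fitting
features = np.column_stack([mileage, cylinders, doors])
features = (features - features.mean(axis=0)) / features.std(axis=0)

# Least-squares fit of: price = a + b1*mileage' + b2*cylinders' + b3*doors'
design = np.column_stack([np.ones(n), features])
coeffs, *_ = np.linalg.lstsq(design, price, rcond=None)
a, b1, b2, b3 = coeffs
# A coefficient near zero (relative to the others) suggests that
# feature contributes little and can be eliminated
```

Because the features are standardized, the coefficients are directly comparable: here b1 comes out negative (mileage lowers price) and b2 positive, matching the generating model.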