Data Science with Python- Feb 06 - March 21, 2021: Billy

Hello everyone,
A small query:
I attended yesterday's (7th Feb.) class, but it still shows that I missed it.
Is anyone else facing the same issue?
 
Excellent question:
First, each algorithm is different.
You need to understand both the algorithm itself and its signature
to see how the hyper-parameters available to that algorithm affect it

Specific example: K-means clustering
This clusters like observations around centroids (the center of the cluster)
This algorithm starts with randomly selected centroids

Each iteration then calculates the distance from every observation to every centroid
How these iterative calculations are done will impact complexity
The iterations stop when either:
The centroid position change does not exceed a predefined threshold, or
The algorithm has surpassed a maximum number of iterations

Because each iteration computes the distance between each
observation and each of the K centroids (not all other observations),
the complexity is not O(n²) but


O(n * K * I * d)
n = number of observations
K = number of clusters
I = max iterations
d = number of features in an observation
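
To make the terms in that formula concrete, here is a minimal sketch of the K-means loop in plain Python. It is an illustration only, not any library's implementation: the function name kmeans, the counter distance_count, the sample data, and the way "tol" is compared against the largest centroid shift are all assumptions made for this example.

```python
import math
import random


def kmeans(points, k, max_iter=100, tol=1e-4, seed=0):
    """Lloyd-style k-means sketch; returns (centroids, iterations, distance_count)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen observations
    distance_count = 0
    for iteration in range(1, max_iter + 1):
        # Assignment step: n * K distance computations, each over d features.
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [math.dist(p, c) for c in centroids]
            distance_count += k
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c, members in zip(centroids, clusters):
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(c)  # an empty cluster keeps its centroid
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:  # stopping rule 1: centroids barely moved
            break
    # Stopping rule 2: the loop ends by itself after max_iter iterations.
    return centroids, iteration, distance_count


# Synthetic data (an assumption): three 2-D blobs of 50 observations each.
data_rng = random.Random(42)
points = [
    (data_rng.gauss(cx, 0.5), data_rng.gauss(cy, 0.5))
    for cx, cy in [(0, 0), (5, 5), (0, 5)]
    for _ in range(50)
]
centroids, iterations, distance_count = kmeans(points, k=3)
print(iterations, distance_count)
```

The counter grows by n * K per iteration, so the total is n * K * I distance computations, each costing d operations. Nothing in the loop compares an observation with every other observation.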

Choosing K with the silhouette score is more expensive: it needs the
distance between each observation and all other observations, which is O(n²)

You can avoid that cost by using (as one example)
the elbow method, which measures inertia (within-cluster sum-of-squares) instead
Pros: not as expensive as calculating the silhouette score
Inertia is computed by the algorithm itself as it runs
Separately, the hyper-parameter "tol" sets the threshold on centroid position change
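
As a rough illustration of the two ways to judge K, the sketch below assumes scikit-learn (the thread does not name a library, so that is an assumption) and uses synthetic data from make_blobs. Inertia comes for free from the fitted model, while the silhouette score is a separate call that works from pairwise distances.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data (an assumption): 300 observations around 3 centers.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

results = []
for k in range(2, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertia = model.inertia_  # within-cluster sum-of-squares, already computed by fit
    silhouette = silhouette_score(X, model.labels_)  # extra pass over pairwise distances
    results.append((k, inertia, silhouette))
    print(f"K={k}  inertia={inertia:.1f}  silhouette={silhouette:.3f}")
```

For the elbow method you would look for the K where the drop in inertia flattens out; for the silhouette score you would look for the highest value.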
 
Arsh:

The above was an incomplete response. I lost power halfway through editing it,
and the result is choppy. Here is a more complete and hopefully
more comprehensive reply.

Excellent question:
First, each algorithm is different.
You need to understand both the algorithm’s implementation and its signature
(the available hyper-parameters that alter the algorithm's default behavior)

Specific example: K-means clustering
This clusters like observations around centroids (the center of a cluster)
The algorithm starts with randomly chosen positions for the centroids

Each iteration calculates the distance from every observation to every centroid,
not to all other observations, so the complexity is not O(n²) but

O(n * K * I * d)
n = number of observations
K = number of clusters
I = max iterations
d = number of features in an observation

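However, by increasing the "tol" (tolerance) threshold
the algorithm declares convergence sooner and runs fewer iterations,
which is not as expensive as the default but may impact accuracy
The elbow method is a separate tool: it compares inertia across several values of K
to choose the number of clusters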
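
A small experiment can show the effect of "tol" on the number of iterations. This sketch assumes scikit-learn's KMeans, where tol, max_iter, n_iter_ and inertia_ exist under those names; the data and the three tolerance values are made up for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic, overlapping blobs (an assumption) so convergence takes several iterations.
X, _ = make_blobs(n_samples=2000, centers=5, cluster_std=3.0, random_state=0)

runs = {}
for tol in (1e-8, 1e-4, 1e-1):
    model = KMeans(
        n_clusters=5,
        init="random",   # same random start for every run via random_state
        n_init=1,
        max_iter=300,
        tol=tol,
        random_state=0,
    ).fit(X)
    runs[tol] = (model.n_iter_, model.inertia_)
    print(f"tol={tol:g}  iterations={model.n_iter_}  inertia={model.inertia_:.1f}")
```

Because every run starts from the same centroids, a larger tol can only stop at the same iteration or earlier. Comparing the printed inertia values shows what, if anything, the early stop costs in fit quality.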

Summary:
Understand the algorithm's implementation
Understand alternate algorithms for the problem
Understand how hyper-parameters change complexity, and weigh that against the impact
their values may have on model performance

Hope that helps.
Sincerely,
Billy
 
Thank you very much for all your feedback. It has been very constructive and I will
refer to it diligently and respectfully.
Sincerely,
Billy
 
Gobinda,

If you are using Anaconda, choose Jupyter Lab instead of Jupyter Notebook.
Jupyter Lab is self-defined as being in beta without stability guarantees.

There are extensions for the stable (non-Lab) Jupyter release,
which can be installed with either pip or conda:
pip install jupyter_contrib_nbextensions
conda install -c conda-forge jupyter_contrib_nbextensions

While these extensions do give you a left sidebar for a table of contents (TOC),
I have not come across a left sidebar for viewing a directory.

Of course, clicking File > Open will open a directory view,
but that was not your question.

Try Jupyter Lab for what you want. Save often :)

Sincerely,
Billy
 