I have completed the MovieLens project up to
3) Determine the features affecting the ratings of a particular movie, as shown in the screenshot below.
I am stuck on question 3, as in the screenshot below. Please help me with the next step, and also with 4) Develop an appropriate model to predict the movie ratings. Please help me with the code.

Aug 24 - Sep 28 Batch - Assignment 1
Please correct this so that the output is arranged properly.

i = 4
while i > 0:
    print("#")
    j = i
    while j > 0:
        print("1")
        j -= 1
    print("\n")
    i -= 1
    print(""*i)

o/p:

#
1
1
1
1

#
1
1
1

#
1
1

#
1

Hello Ma'am,

Can you please help me understand how a[-1:-2:-1, :] works?
a:
array([[ 0.        ,  1.11111111,  2.22222222],
       [ 3.33333333,  4.44444444,  5.55555556],
       [10.        ,  7.77777778,  8.88888889]])
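
A minimal sketch of how that slice is evaluated, assuming a is the 3x3 array shown above (rebuilt by hand here). The row slice -1:-2:-1 starts at the last row (index -1), steps backwards by 1, and stops before index -2, so it picks up only the last row. The column slice ":" keeps every column. Because both parts are slices, the result stays two-dimensional.

```python
import numpy as np

# Rebuild the array from the post (values as printed there).
a = np.array([
    [0.0, 1.11111111, 2.22222222],
    [3.33333333, 4.44444444, 5.55555556],
    [10.0, 7.77777778, 8.88888889],
])

# Rows: start at -1 (last row), step -1 (backwards), stop before -2.
# Columns: ":" keeps all of them.
last_row = a[-1:-2:-1, :]

print(last_row)        # the last row, still as a 2-D array
print(last_row.shape)  # (1, 3)

# Moving the stop further back includes more rows, in reverse order:
# rows -1 and -2 are included, row -3 is excluded.
last_two_reversed = a[-1:-3:-1, :]
print(last_two_reversed.shape)  # (2, 3)
```

Compare with a[-1, :], which uses an integer instead of a slice for the rows and therefore returns a one-dimensional array of shape (3,).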

Hi,

Since this is a multi-class classification problem, you need to use the chi-square test of independence. All the files are updated in the Google Drive. Please open the statistics folder there to look for the chi-square method of feature selection.

Regards,
Samridhi
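
A rough sketch of the chi-square test of independence mentioned above, not the version from the course files. The toy data and the column names Gender and Rating are made up for illustration. The statistic is computed by hand from a contingency table so the steps are visible; scipy.stats.chi2_contingency performs the same calculation and also returns a p-value.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one categorical feature and the rating class.
df = pd.DataFrame({
    "Gender": ["F", "M", "M", "F", "M", "F", "M", "F"],
    "Rating": [5, 3, 3, 4, 3, 5, 4, 5],
})

# Observed counts for every (Gender, Rating) combination.
observed = pd.crosstab(df["Gender"], df["Rating"])
obs = observed.to_numpy()

# Expected counts if Gender and Rating were independent:
# (row total * column total) / grand total.
row_totals = obs.sum(axis=1)
col_totals = obs.sum(axis=0)
expected = np.outer(row_totals, col_totals) / obs.sum()

# Chi-square statistic and degrees of freedom.
chi2 = ((obs - expected) ** 2 / expected).sum()
dof = (obs.shape[0] - 1) * (obs.shape[1] - 1)

print(observed)
print("chi-square:", chi2, "degrees of freedom:", dof)
```

A larger statistic (relative to the degrees of freedom) suggests the feature and the rating are not independent, which is the sense in which a feature "affects" the rating here.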

Hi Samridhi,

My question is: I do not understand what "features affecting" means. Which features?
I am also really stuck and confused about how to solve these 2 questions, so please assist me.

How do I slice rows when loc and isin are also used?
brics.loc[1:3,brics.columns.isin(['capital','area'])]
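
A small sketch of what that expression selects, assuming brics has a default integer index (0, 1, 2, ...). The table contents below are illustrative. With .loc the row part 1:3 is label-based and includes both ends, and the column part is a boolean mask built by isin.

```python
import pandas as pd

# Illustrative stand-in for the brics DataFrame (default integer index assumed).
brics = pd.DataFrame({
    "country": ["Brazil", "Russia", "India", "China", "South Africa"],
    "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
    "area": [8.516, 17.10, 3.286, 9.597, 1.221],
})

# One True/False per column: True only for 'capital' and 'area'.
column_mask = brics.columns.isin(["capital", "area"])
print(column_mask)

# Rows with labels 1, 2 and 3 (the end label is included with .loc),
# restricted to the columns where the mask is True.
subset = brics.loc[1:3, column_mask]
print(subset)

# The positional equivalent with .iloc excludes the end position,
# so 1:4 is needed to get the same three rows.
same_rows = brics.iloc[1:4][["capital", "area"]]
```

If the index holds string labels instead (for example country codes), the row slice has to use those labels rather than 1:3.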

Hi Samridhi,

users.dat has following data format

1::F::1::10::48067
2::M::56::16::70072
3::M::25::15::55117
4::M::45::7::02460
5::M::25::20::55455
6::F::50::9::55117

From the above data set I can understand that the first column is User_id, the next column is gender, and the next is age. I am not able to understand the last two columns. Can you explain those fields?

Similarly ratings.dat has following format

1::1193::5::978300760
1::661::3::978302109
1::914::3::978301968
1::3408::4::978300275
1::2355::5::978824291

From the above data set I can't understand any of the fields. Can you explain them?

Regards,
Pranaya
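
For reference, the README shipped with the MovieLens 1M data set describes users.dat as UserID::Gender::Age::Occupation::Zip-code and ratings.dat as UserID::MovieID::Rating::Timestamp. Age and Occupation are codes rather than raw values (for example Age 1 means "Under 18" and 56 means "56+"), and Timestamp is in seconds since the Unix epoch. A minimal loading sketch, using the sample rows from the post in place of the real files:

```python
import io
import pandas as pd

# Sample rows copied from the post; with the real files, pass the file
# path to read_csv instead of io.StringIO(...).
users_text = """1::F::1::10::48067
2::M::56::16::70072
3::M::25::15::55117
4::M::45::7::02460
5::M::25::20::55455
6::F::50::9::55117"""

ratings_text = """1::1193::5::978300760
1::661::3::978302109
1::914::3::978301968
1::3408::4::978300275
1::2355::5::978824291"""

# A multi-character separator such as "::" needs the python parsing engine.
users = pd.read_csv(
    io.StringIO(users_text), sep="::", engine="python", header=None,
    names=["UserID", "Gender", "Age", "Occupation", "Zip-code"],
    dtype={"Zip-code": str},  # keep leading zeros such as 02460
)

ratings = pd.read_csv(
    io.StringIO(ratings_text), sep="::", engine="python", header=None,
    names=["UserID", "MovieID", "Rating", "Timestamp"],
)

# Convert the epoch seconds into readable datetimes.
ratings["Timestamp"] = pd.to_datetime(ratings["Timestamp"], unit="s")

print(users)
print(ratings)
```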

PRASHANT NAMDEO SHELARE

Hello Ma'am,

I was doing the MovieLens project and did not get the expected result after using
pd.concat(). The output shape shows
(19266, 28)
I had actually taken 10000 records for analysis, i.e. (10000, 10), and done one-hot encoding on unique_genres which also of
I am sending my full code in PDF format. Download it and rename the extension to .ipynb

Hello,

I have some doubts related to building the user-based recommendation model for the Amazon project.

There are many NaN values in each movie column. How do I replace the NaN values?

I tried it this way:

Reduced_df=AMTV_Ratind_df.loc[AMTV_Ratind_df.columns.notnull(),AMTV_Ratind_df.columns]
for i in range(0,206):
    Reduced_df.loc[Reduced_df[Reduced_df.columns].isnull(),Reduced_df.columns]=0

But if I replace the NaN values with 0, the mean and median change. How can I solve this?

Hi Prashant,

pd.concat is not working here because the indices of the two DataFrames are not the same. You need to reset the index so that both DataFrames share common indices, and then do the column-wise concatenation.

Regards,
Samridhi
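
A minimal sketch of the index problem described above, using two tiny made-up DataFrames (the names ratings and genre_dummies are hypothetical). With axis=1, pd.concat aligns rows by index label, so labels that do not match produce extra rows filled with NaN, which would explain a row count larger than the 10000 expected.

```python
import pandas as pd

# Hypothetical data: a sampled frame keeps its original row labels,
# while the one-hot encoded frame was rebuilt with labels 0, 1, 2.
ratings = pd.DataFrame({"Rating": [5, 3, 4]}, index=[10, 11, 12])
genre_dummies = pd.DataFrame({"Action": [1, 0, 1]}, index=[0, 1, 2])

# Labels do not overlap, so the result has 6 rows, half of them NaN.
misaligned = pd.concat([ratings, genre_dummies], axis=1)
print(misaligned.shape)  # (6, 2)

# Resetting both indices gives common labels 0..n-1, and rows line up.
aligned = pd.concat(
    [ratings.reset_index(drop=True), genre_dummies.reset_index(drop=True)],
    axis=1,
)
print(aligned.shape)  # (3, 2)
```

This only gives a meaningful result when both frames hold the same records in the same order.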

Hi Guru,

For this, don't consider the movies rated NaN, while computing the average rating.

Regards,
Samridhi
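
To illustrate the point above with made-up numbers (the column names are hypothetical): pandas skips NaN by default when computing a mean, so unrated movies can simply be left as NaN. Filling them with 0 treats "not rated" as a rating of zero and drags the average down.

```python
import numpy as np
import pandas as pd

# Hypothetical user-by-movie rating table; NaN means "not rated".
ratings = pd.DataFrame({
    "Movie1": [5, np.nan, 4, np.nan],
    "Movie2": [np.nan, np.nan, 2, 3],
})

# Default behaviour (skipna=True): NaN entries are ignored.
mean_ignoring_nan = ratings.mean()

# Replacing NaN with 0 changes the averages.
mean_with_zeros = ratings.fillna(0).mean()

print(mean_ignoring_nan)  # Movie1 4.5, Movie2 2.5
print(mean_with_zeros)    # Movie1 2.25, Movie2 1.25
```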

Hi Samridhi,

Can you please let me know the following?

a) How do I set up a working directory? I downloaded and installed Anaconda, then launched Jupyter Notebook from Anaconda Navigator. How do I set the working directory from there?

b) What do Files, Running, and Clusters mean inside Jupyter Notebook? How do I create a new directory and set it as the default working directory?

c) What are Terminals and Notebooks? How do I open them? How many terminals and notebooks can be open at a time?

d) How many Python windows or instances can be open at a time? What is the technical name of a Python window?

Thanks
Shyam

Hi Samridhi,

Webex is not allowing me to join today's (11th Oct) class. Later it showed that the session had not started. Is there a class today? I have sent an email to the support team asking for access. Can you please share today's class recording in Google Drive?

Thanks
Shyam

Hi Samridhi
Couple of questions on the movielens project.
1. User Age Distribution
a: Once we get the age distribution with the value_counts method, how do we plot the same in a histogram/bar graph?
b: How do I convert the value_counts output into a pandas DataFrame?

Features affecting the ratings, is this an interpretation from the Data or do we need to code something here?

Model building, what needs to be done here?

Is there a cheat sheet on brackets, i.e. which brackets to use and when? It's one of the parts I find quite confusing.

Hello ma'am
How can I find the internet-related problems/network issues in the column 'Customer Complaint',
which holds a string of words? Please also tell me how I can access records by month from the column 'Date_month_year' and by day from the date column.

1. User Age Distribution
a: Once we get the age distribution with the value_counts method, how do we plot the same in a histogram/bar graph?
- use the matplotlib library as discussed in the data visualization class
b: How do I convert the value_counts output into a pandas DataFrame?
- use the pd.DataFrame function
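
One possible way to do both steps, sketched with the Age values from the users.dat sample earlier in the thread. The column names Age and Count are my own choice, and the Agg backend line is only there so the script runs without a display; in a notebook you would normally drop it and call plt.show().

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; not needed in Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Age codes taken from the users.dat sample rows in this thread.
ages = pd.Series([1, 56, 25, 45, 25, 50], name="Age")

# value_counts returns a Series: index = age code, values = number of users.
age_counts = ages.value_counts().sort_index()

# Turn the Series into a two-column DataFrame.
age_df = age_counts.rename_axis("Age").reset_index(name="Count")
print(age_df)

# Bar chart of the distribution.
fig, ax = plt.subplots()
ax.bar(age_df["Age"].astype(str), age_df["Count"])
ax.set_xlabel("Age group code")
ax.set_ylabel("Number of users")
ax.set_title("User age distribution")
```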

Features affecting the ratings, is this an interpretation from the Data or do we need to code something here?
- we need to use feature selection techniques like ANOVA / chi-square / linear regression / logistic regression

Model building, what needs to be done here?
- create prediction models like linear regression / logistic regression / KNN

Is there a cheat sheet on brackets, i.e. which brackets to use and when? It's one of the parts I find quite confusing.

Regarding brackets:
[] - represents lists
()- tuples
{} - dictionaries / sets

[] - also used for subsetting / slicing data structures
() - also used to enclose function arguments
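
A short illustration of the bracket rules above; all variable names are made up for the example.

```python
# [] builds a list
fruits = ["apple", "banana", "cherry"]

# () builds a tuple
point = (3, 4)

# {} builds a dictionary (key: value pairs) or a set (values only)
prices = {"apple": 1.5, "banana": 0.5}
unique_ids = {1, 2, 2, 3}  # duplicates are dropped

# [] is also used for subsetting / slicing
first_two = fruits[:2]           # slice of a list
banana_price = prices["banana"]  # lookup by key
x_coordinate = point[0]          # element of a tuple

# () is also used to enclose function arguments
number_of_fruits = len(fruits)

print(first_two, banana_price, x_coordinate, number_of_fruits, unique_ids)
```

Note that empty braces {} create an empty dictionary; an empty set is written set().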


Hi Aakarsh,

Please use regular expressions to search for the search string. Please refer to pandas df session. There we have discussed how to extract some search string using regular expressions.
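
A rough sketch of the regular expression approach, not the exact code from the session. The column names 'Customer Complaint' and 'Date_month_year' come from the question; the sample rows, the search words and the date format are assumptions.

```python
import pandas as pd

# Made-up sample rows standing in for the complaints data.
complaints = pd.DataFrame({
    "Customer Complaint": [
        "Comcast Internet speed slow",
        "Billing issue",
        "network outage again",
        "Poor customer service",
    ],
    "Date_month_year": ["22-Apr-15", "04-Aug-15", "18-Apr-15", "05-Jul-15"],
})

# Regular expression: match either word, ignoring upper/lower case.
pattern = r"internet|network"
mask = complaints["Customer Complaint"].str.contains(
    pattern, case=False, regex=True, na=False
)
internet_issues = complaints[mask]
print(internet_issues)

# Parse the dates (format assumed to be day-month abbreviation-year),
# then count complaints per month and per day.
complaints["Date_month_year"] = pd.to_datetime(
    complaints["Date_month_year"], format="%d-%b-%y"
)
monthly_counts = complaints.groupby(complaints["Date_month_year"].dt.month).size()
daily_counts = complaints.groupby(complaints["Date_month_year"].dt.date).size()
print(monthly_counts)
```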

Hi Samridhi
I have submitted the project. I have zipped the ipynb file and attached it in the submit section. Is there anything else I need to do? Please let me know. I have attached it here as well for reference.

In the MovieLens project, for the feature engineering step 'Find out all the unique genres (Hint: split the data in column genre making a list and then process the data to find out only the unique categories of genres)', I am scripting as below and get an error.
Can you please let me know what I am doing wrong?

MovieLensData.Genres = MovieLensData.Genres.str.split("|") # where MovieLensData is my master data

Error
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-119-c3c47e3a006b> in <module>
----> 1 MovieLensData.Genres = MovieLensData.Genres.str.split("|")

~\Anaconda3\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
5061 if (name in self._internal_names_set or name in self._metadata or
5062 name in self._accessors):
-> 5063 return object.__getattribute__(self, name)
5064 else:
5065 if self._info_axis._can_hold_identifiers_and_holds_name(name):

~\Anaconda3\lib\site-packages\pandas\core\accessor.py in __get__(self, obj, cls)
169 # we're accessing the attribute of the class, i.e., Dataset.geo
170 return self._accessor
--> 171 accessor_obj = self._accessor(obj)
172 # Replace the property with the accessor object. Inspired by:
173 # http://www.pydanny.com/cached-property.html

~\Anaconda3\lib\site-packages\pandas\core\strings.py in __init__(self, data)
1794
1795 def __init__(self, data):
-> 1796 self._validate(data)
1797 self._is_categorical = is_categorical_dtype(data)
1798

~\Anaconda3\lib\site-packages\pandas\core\strings.py in _validate(data)
1816 # (instead of test for object dtype), but that isn't practical for
1817 # performance reasons until we have a str dtype (GH 9343)
-> 1818 raise AttributeError("Can only use .str accessor with string "
1819 "values, which use np.object_ dtype in "
1820 "pandas")

AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas
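
The message says the Genres column does not hold strings at the point where .str is called. One possible cause, which I cannot confirm without the notebook, is that the column came out of the merge as all NaN (a numeric dtype), so checking MovieLensData.Genres.dtype is a reasonable first step. A sketch with made-up rows that reproduces the error and shows the split working on a string column:

```python
import numpy as np
import pandas as pd

# A numeric column (here all NaN) has no string methods: .str raises.
not_strings = pd.Series([np.nan, np.nan])
error_raised = False
try:
    not_strings.str.split("|")
except AttributeError as exc:
    error_raised = True
    print("Error:", exc)

# Hypothetical master data with a proper string Genres column.
movies = pd.DataFrame({
    "Title": ["Toy Story (1995)", "Jumanji (1995)", "Grumpier Old Men (1995)"],
    "Genres": ["Animation|Children's|Comedy",
               "Adventure|Children's|Fantasy",
               "Comedy|Romance"],
})
print(movies["Genres"].dtype)  # check the dtype before using .str

# Split each entry into a list, then collect the unique genre names.
genre_lists = movies["Genres"].str.split("|")
unique_genres = sorted({genre for genres in genre_lists for genre in genres})
print(unique_genres)
```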

1. As requested by trainer Samridhi, here is the problem description: print the series 1, 11, 21, 1211, 111221, 312211, ...
1 ------------- Take 1 as input [count the 1s from the left -> one 1 -> so print 11]
11 ----------- Take 11 as input now [count the 1s from the left -> two 1s -> so print 21]
21 ----------- Take 21 as input now [count the 2s in first place from the left -> one 2, then count the 1s after it -> one 1 -> so print 1211]
1211 ------- Take 1211 as input now and repeat as above

Please find attached the Java code for the above.
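
For comparison, here is a Python sketch of the same series (this is my own version, not a translation of the attached Java file). Each term is built by reading the previous one from the left and writing "count, then digit" for every run of equal digits. The function names are hypothetical.

```python
from itertools import groupby


def next_term(term: str) -> str:
    """Describe `term` as runs of equal digits: count followed by digit."""
    return "".join(
        f"{len(list(run))}{digit}" for digit, run in groupby(term)
    )


def series(count: int) -> list:
    """Return the first `count` terms, starting from '1'."""
    terms = []
    term = "1"
    for _ in range(count):
        terms.append(term)
        term = next_term(term)
    return terms


print(", ".join(series(6)))  # 1, 11, 21, 1211, 111221, 312211
```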

2nd Nov-7th Dec batch
This is the code for printing 1s:
for idx in range(4):
    n=4
while n>=0:
    lst=[]
    for num in range(n):
        lst.append(str(1))
    print("".join(lst).rjust(4))
    n=n-1

Thanks Venkatarao!

Hi Samridhi,

I got an accuracy of about 1.8214 using MSE when I tried to get predictions using Linear Regression for the assignment 'Evaluate the Ad Budget Dataset of XYZ Firm'.
Is this accuracy figure in an acceptable range?

For the MovieLens project, since there are so many categorical variables, should we not build the model using logistic regression?

Hi Ashik,

Please calculate the R2 value, and check accuracy in percentage.

Regards,
Samridhi
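
A small sketch of the R2 calculation suggested above, using made-up actual and predicted values. MSE is an error in squared units of the target, so it is not a percentage; R2 is the share of the target's variance explained by the model. sklearn.metrics.r2_score computes the same quantity.

```python
import numpy as np

# Made-up actual and predicted values, for illustration only.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 7.5, 8.5])

# Mean squared error: average of the squared residuals.
mse = np.mean((y_true - y_pred) ** 2)

# R2 = 1 - (residual sum of squares / total sum of squares).
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print("MSE:", mse)
print("R2:", r2, "i.e.", round(r2 * 100, 2), "percent of variance explained")
```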

Yes you can use logistic regression or knn.

Regards,
Samridhi

Hi, I am pursuing a machine learning course and want to know which machine learning applications are helpful in market research or data mining.
Could anyone please help me out with this?

Hi,

For the Titanic data set, I assigned df.sex=male by mistake. How can I correct this so that I have the original data again?

Statistics for data science

Can someone please help me understand why the count of True is coming out to be 3?

Assignment #1
1 1 1 1 1
_ 1 1 1 1
_ _ 1 1 1
_ _ _ 1 1
_ _ _ _ 1

CODE:
number = int(input('Enter the number of rows:'))

for i in range(number):
    for j in range(number):
        if j >= i:
            print(1, end = '')
        else:
            print(' ', end = '')
    print()  # end the row; print('\n') would add an extra blank line

Unable to see any live classes today 5/16/2020

Ticket number: 00570870

Hi Samridhi Ma'am,

Kindly note that I have not received any of the files you shared via Google Drive.

Regards,

Hi Samridhi ma'am,

The command " %config IPCompleter.greedy = True " for numpy arrays in Python is not working.

Thank You.

Regards,
Samridhi

In line 40, we gave the input as a raw string, while in line 4 we didn't. Both produce the same result. Can someone please explain the difference between the two?
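
I cannot see the two lines referred to, but in general a raw string (r"...") only differs from a normal string when the text contains a backslash. Without backslashes both spellings create the same string, which may be why the results matched. A small illustration with made-up values:

```python
# No backslashes: the raw and the normal literal are identical.
normal_text = "hello world"
raw_text = r"hello world"
print(normal_text == raw_text)  # True

# With backslashes they differ: in a normal string "\n" is a newline and
# "\t" is a tab, while in a raw string they stay as two characters each.
normal_path = "C:\new_folder\table.csv"
raw_path = r"C:\new_folder\table.csv"
print(normal_path == raw_path)  # False
print(len(normal_path), len(raw_path))

# This is why raw strings are commonly used for regular expressions
# and Windows file paths.
```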

Hi Samridhi Ma'am,

Unable to open lab. The screen shows error
Debug error: Expired timestamp, yours 1590155715, ours 1590154975

Hi,

You are expected to download the contents of the Google Drive within the same week / month. After that, we delete them in order to free up space. Anyway, you can access the contents from my other batches in the following location:

Regards,
Samridhi

For clarification 1:
for x in range(len(test)):
    print(type(test[x]))

For clarifications 1 & 3:

for x in range(len(test)):
    print(type(test[x]))
    print("length of " + str(type(test[x]))+ " is "+str(len(test[x])))

Hello Ma'am,

And also the following statements are not working :
import pandas as pd
import numpy as np

Regards,

Hi Samridhi, I have a few questions regarding the MovieLens project.
1. I want to build a pivot table to understand the age distribution. I can use UserID as the values for the aggregate function, but I am not getting the code right. With groupby, the following code works:
Master_Data_2.groupby('Age Group')[['UserID']].nunique()
Master_Data_2.groupby('Gender')[['UserID']].nunique()
Master_Data_2.groupby('Occupation')[['UserID']].nunique()

whereas with a pivot table, nunique throws an error.
2. In the session I asked how to split the movie name column into two columns: Movie and Year. I tried it using str.extract and a regular expression. The output solved only half of the problem: as you can see in the figure, the Year column is as expected but the Movie column is unsatisfactory.
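
A sketch covering both points, with made-up rows. The column names 'Age Group' and 'UserID' follow the groupby code above; the Title column name and the regular expression are assumptions, since the original attempt is only visible in the figure. For the pivot table, passing the aggregation by name (aggfunc="nunique") is one way that should work. For the split, the pattern anchors the four-digit year in brackets at the end of the title, so brackets inside the movie name are left alone.

```python
import pandas as pd

# Hypothetical slice of the master data.
Master_Data_2 = pd.DataFrame({
    "UserID": [1, 1, 2, 3, 3, 4],
    "Age Group": ["Under 18", "Under 18", "56+", "25-34", "25-34", "25-34"],
    "Title": ["Toy Story (1995)", "Jumanji (1995)", "Seven (Se7en) (1995)",
              "Toy Story (1995)", "Heat (1995)", "Jumanji (1995)"],
})

# 1. Unique users per age group with a pivot table.
age_pivot = pd.pivot_table(
    Master_Data_2, index="Age Group", values="UserID", aggfunc="nunique"
)
print(age_pivot)

# 2. Split the title into Movie and Year.
#    (?P<Movie>.+?)  the name, as short as possible
#    \s*\(           optional spaces, then an opening bracket
#    (?P<Year>\d{4}) exactly four digits
#    \)\s*$          closing bracket at the very end of the string
title_parts = Master_Data_2["Title"].str.extract(
    r"^(?P<Movie>.+?)\s*\((?P<Year>\d{4})\)\s*$"
)
print(title_parts)
```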

Hello Ma'am,
I am doing the Walmart project. For the last question, how do I build the linear regression model? I also want to know which linear regression model should be fitted.

Regards,

Hi,
Ma'am, how do I find the percentage of complaints resolved till date, which were received through the Internet and customer care calls, in the Comcast project?
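
One possible approach, sketched on made-up rows. The column names 'Received Via' and 'Status' and the idea that "Closed" and "Solved" count as resolved are assumptions; adjust them to match your data.

```python
import pandas as pd

# Hypothetical complaints data (column names and values are assumptions).
complaints = pd.DataFrame({
    "Received Via": ["Internet", "Internet", "Internet", "Internet",
                     "Customer Care Call", "Customer Care Call",
                     "Customer Care Call", "Customer Care Call"],
    "Status": ["Closed", "Open", "Solved", "Pending",
               "Closed", "Solved", "Open", "Closed"],
})

# True where the complaint counts as resolved.
is_resolved = complaints["Status"].isin(["Closed", "Solved"])

# The mean of True/False values is the resolved fraction; times 100 for percent.
resolved_pct = is_resolved.groupby(complaints["Received Via"]).mean() * 100
print(resolved_pct)

# Overall percentage across both channels.
overall_pct = is_resolved.mean() * 100
print(overall_pct)
```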

Hi Samridhi,

While working on the Walmart project, I got the results below for this task:

Linear Regression – Utilize variables like date and restructure dates as 1 for 5 Feb 2010 (starting from the earliest date in order).

Y-Intercept:
139639575.33440715

Root Mean Square Error(rmsd)
552459.8953123686

R^2 Value:
0.03433024544837493

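from sklearn.model_selection import train_test_split

x=walmart.drop(['Weekly_Sales','Store','Date','Semester'],axis=1)
y=walmart['Weekly_Sales']  # assumed target; y was not defined in the post

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state= 0)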

Please help me with this: the R-squared value is not close to 1, so the model is not a good fit.
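
For the "restructure dates as 1 for 5 Feb 2010" part of the task, here is a rough sketch on made-up rows. It assumes the Date column is text in day-month-year form and that records are one week apart; the new column name Week_Number is hypothetical. Note that the x in the code above drops Date entirely, so a numeric date feature like this would have to be added before the drop for the model to use it.

```python
import pandas as pd

# Made-up rows; the date format (dd-mm-yyyy) is an assumption.
walmart = pd.DataFrame({
    "Store": [1, 1, 1, 2],
    "Date": ["05-02-2010", "12-02-2010", "19-02-2010", "05-02-2010"],
    "Weekly_Sales": [1643690.90, 1641957.44, 1611968.17, 2136989.46],
})

# Parse the text into real dates.
walmart["Date"] = pd.to_datetime(walmart["Date"], format="%d-%m-%Y")

# Weeks elapsed since the earliest date, starting at 1
# (5 Feb 2010 -> 1, 12 Feb 2010 -> 2, and so on).
days_since_start = (walmart["Date"] - walmart["Date"].min()).dt.days
walmart["Week_Number"] = days_since_start // 7 + 1

print(walmart[["Store", "Date", "Week_Number"]])
```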

