
# Machine Learning Advanced Certification | April 19 - May 7

#### Karthickeyan S

##### Moderator
Simplilearn Support

#### T E Mohanasundari

##### Member
Hello sir,

The answer to the question "print all the odd numbers between 1 and 100, skipping every third one" is as follows:
```python
a = list(map(lambda x: x, range(1, 101, 2)))
print(a)

for ele in sorted(range(2, len(a), 3), reverse=True):
    del a[ele]

print(*a)
```

I got the final output as

1 3 7 9 13 15 19 21 25 27 31 33 37 39 43 45 49 51 55 57 61 63 67 69 73 75 79 81 85 87 91 93 97 99
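For comparison, a hedged sketch that builds the same sequence without deleting elements in place, using enumerate to skip every third odd number:

```python
# Odd numbers from 1 to 100, skipping every third one
# (equivalent to deleting indices 2, 5, 8, ... from the list).
odds = list(range(1, 101, 2))
kept = [x for i, x in enumerate(odds) if (i + 1) % 3 != 0]
print(*kept)
```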

We can check and discuss this in tomorrow's class.

Thanks,
Mohana.

#### Nirmal Chandra Dash

##### Member
Problem Statement : Print the odd numbers from 1 to 100, skipping every third number

[x for x in range(1,101,2) if x not in list(filter(lambda x:x,range(1,101,2)))[2::3]]

Another way:

[(lambda x:x)(x) for x in range(1,101,2) if x not in list(filter(lambda x:x,range(1,101,2)))[2::3]]

#### Nirmal Chandra Dash

##### Member
Problem Statement : if your name matches 'h' then reduce age by 10.

df.loc[df['Name']=='h','age']=df['age']-10
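A self-contained sketch of the same `.loc` pattern, using a small hypothetical frame (names and values invented for illustration):

```python
import pandas as pd

# Hypothetical data: reduce 'age' by 10 wherever 'Name' equals 'h'
df = pd.DataFrame({"Name": ["h", "g", "h"], "age": [30, 40, 50]})
df.loc[df["Name"] == "h", "age"] = df["age"] - 10
print(df["age"].tolist())  # [20, 40, 40]
```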

#### Nirmal Chandra Dash

##### Member
problem Statement :
Mtcars, an automobile company in Chambersburg, United States, has recorded the production of its cars within a dataset. Based on feedback given by their customers, they are coming up with a new model, so they have to explore the current dataset to derive further insights out of it.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

dataframe.info()

dataframe.describe(include='O')
dataframe.describe()

np.mean(dataframe['hp'])

df_corr=dataframe.corr()
df_corr

#### Correlated Features

mpg - cyl --> Highly Negatively Correlated

mpg - disp --> Highly Negatively Correlated

mpg - hp --> Highly Negatively Correlated

mpg - wt --> Highly Negatively Correlated

#### Nirmal Chandra Dash

##### Member
Problem Statement: Load the load_diabetes dataset internally from sklearn and check for any missing values or outlier data in the ‘data’ column. If any irregularities are found, treat them accordingly.

print(diabetes.DESCR)

df=pd.DataFrame(data=diabetes.data,columns=diabetes.feature_names)
df['Target']=diabetes.target

df.shape
df.info()

#Missing value (No Missing Value)

df.isna().sum().sort_values(ascending=False)

for feature in df.columns:
    sns.boxplot(df[feature])
    plt.show()

#### Features having outliers

#bmi --> >0.12

#s6 --> < -0.10 and >0.12

#s5 --> >0.12

#s4 --> >0.14

#s3 --> >0.12

#s2 --> >0.11

#s1 --> >0.11

df_new=df[~((df['bmi']>0.12)|((df['s6']< -0.10) | (df['s6']>0.12))|(df['s5']>0.12)|(df['s4']>0.14)|(df['s3']>0.12)|(df['s2']>0.11)|(df['s1']>0.11))]

for feature in df_new.columns:
    sns.boxplot(df_new[feature])
    plt.show()

#### John Brian Rodrigues

##### New Member
Notebook prints (PDF) attached, as worked on the 'mtcars' and 'diabetes' datasets.

#### Attachments

• ML_mtcars_20210420_JBR.pdf (191.4 KB)
• ML_diab_20210420_JBR.pdf (215.1 KB)

#### Loganathan Kumarasamy

##### Customer
Problem Desc:

Mtcars, an automobile company in Chambersburg, United States, has recorded the production of its cars within a dataset. The company is coming up with a new model based on the feedback given by its customers. It has to explore the current dataset to derive further insights from it.

Objective: Import the dataset, explore its dimensionality and types, and find the average horsepower across all the cars. Also, identify a few of the most correlated features, which would help with the modification.

import pandas as pd

df_mtcars

df_mtcars.shape
df_mtcars.describe()

df_mtcars.dtypes
df_mtcars.mean()

df_mtcars['hp'].mean()

import seaborn as sns
sns.heatmap(df_mtcars.corr())

Highly correlated features:

cyl and disp
wt and hp
hp and cyl
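The "most correlated" pairs can also be ranked programmatically instead of being read off the heatmap. A hedged sketch, with toy data standing in for mtcars:

```python
import pandas as pd

# Toy stand-in for a few mtcars numeric columns (values invented)
df = pd.DataFrame({
    "cyl":  [4, 4, 6, 8, 8],
    "disp": [100, 110, 180, 300, 320],
    "hp":   [70, 75, 110, 200, 215],
})
pairs = df.corr().stack()
# keep each unordered pair once and drop the diagonal
pairs = pairs[[a < b for a, b in pairs.index]]
print(pairs.abs().sort_values(ascending=False))
```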

#### Rajendra Perumal

##### Member
Outliers for Diabetes Dataset

#### Attachments

• Diabetes.pdf (62.5 KB)

#### Rajendra Perumal

##### Member
Outliers for mtcars Dataset

#### Attachments

• mtcars.pdf (33.9 KB)

#### T E Mohanasundari

##### Member
Problem 1 : mtcars

import pandas as pd
import seaborn as sns

df_mtcars.shape
df_mtcars.dtypes

df_mtcars['hp'].mean()

sns.heatmap(df_mtcars.corr());

Results : a) (32, 12) - shape

b)
model object
mpg float64
cyl int64
disp float64
hp int64
drat float64
wt float64
qsec float64
vs int64
am int64
gear int64
carb int64
dtype: object

c) 146.6875 - hp average

d) some of the correlation results
mpg is negatively correlated with cyl, disp, hp, carb and wt
cyl is negatively correlated with vs, mpg and drat
hp is negatively correlated with vs, qsec and mpg

#### T E Mohanasundari

##### Member

import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
%matplotlib inline

df_diabetes = pd.DataFrame(diabetes.data, columns=diabetes.feature_names)

df_diabetes.isna().any()

O/P
age False
sex False
bmi False
bp False
s1 False
s2 False
s3 False
s4 False
s5 False
s6 False
dtype: bool

import seaborn as sns

for keys in df_diabetes.columns:
    sns.boxplot(df_diabetes[keys])
    plt.show()

# Outliers are there for bmi, s1, s2, s3, s4, s5 and s6

dict_new = {"s4" : 0.145, "s5" : 0.125, "s3" : 0.12, "s2" : 0.11, "bmi" : 0.125}

for keys in dict_new:
    filter_new = df_diabetes[keys] < dict_new.get(keys)
    df_diabetes_new = df_diabetes[filter_new]
    sns.boxplot(df_diabetes_new[keys])
    plt.show()

# For the columns s6 and s1, we have outliers on both sides, so they are handled separately; still working on bringing this under the for loop.

filter_s6 = (df_diabetes['s6'] < 0.115) & (df_diabetes['s6']>-0.12)
df_diabetes_s6 = df_diabetes[filter_s6]
sns.boxplot(df_diabetes_s6['s6']);
plt.show()

filter_s1 = (df_diabetes['s1'] < 0.11) & (df_diabetes['s1']>-0.12)
df_diabetes_s1 = df_diabetes[filter_s1]
sns.boxplot(df_diabetes_s1['s1']);
plt.show()

#### Ashish Rai_2

##### Member
MTCARS data problem

# dependencies
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# dataset import

# dataset info > shape
print("Shape of dataset:", mtcars.shape)
# dataset info missing values > True = Missing values, False -> No missing value
print("Missing values:", mtcars.isna().any().any())

# distinct number of cylinders in car models
print("Number of distinct car cylinders: ", mtcars['cyl'].unique())

# statistical summary of car mileage "mpg" for distinct car cylinder counts "cyl"
print("Statistical summary of car mileage for distinct numbers of car cylinders.")
print(mtcars.groupby(['cyl'])['mpg'].agg('describe'))

# correlation plot > heatmap
plt.figure(figsize=(8,8))
sns.heatmap(mtcars.corr(),
            square=True, fmt='.1g', annot=True, cmap='RdBu_r')
plt.show()

# Distribution of car mileage "mpg" given number of car cylinders "cyl" > kde plot
fig, ax = plt.subplots(1, 1)
for n, cyl in enumerate(np.sort(mtcars['cyl'].unique())):
    sns.kdeplot(mtcars[mtcars['cyl'] == cyl]['mpg'], label=cyl)
ax.axes.get_yaxis().set_visible(False)
plt.legend()
plt.show()

# Relationship between weight "wt" and mileage "mpg" for car models
plt.figure(figsize=(5,5))
sns.scatterplot(x='wt', y='mpg', data=mtcars)
plt.show()

# OBSERVATIONS
# - The dataset is tidy and consists of 12 features for 32 car models.
# - There are no missing values.
# - There are three types of car cylinders with reference to number of cylinders:
# - 4 cylinders
# - 6 cylinders (Least common)
# - 8 cylinders (Most common)
# - The heatmap plot reveals the features most correlated with car mileage (mpg):
# - cyl (# of cylinders): High negative correlation (-0.9)
# - wt (weight of car in 1000 lbs): High negative correlation (-0.9)
# - There is an inverse relationship between car mileage and both the number of cylinders and car weight.


#### Alan Walter Bauzá

##### New Member
MTCARS

# Importing pandas and reading the file
import pandas as pd
df_mtcars = pd.read_csv(r"C:\Users\alanw\Jupyter\3. Machine Learning\first week\mtcars.csv")

# Analyzing the dimensions of the dataframe
df_mtcars.shape

(32, 12)

# Checking the values of the columns to be able to filter them later
df_mtcars.columns.unique()

Index(['model', 'mpg', 'cyl', 'disp', 'hp', 'drat', 'wt', 'qsec', 'vs', 'am',
'gear', 'carb'],
dtype='object')

# Showing a table with the average hp per car and ordering it by descending hp value
df_mtcars.groupby(['model']).mean().filter(['hp']).sort_values('hp', ascending=False)

| model | hp |
|---|---|
| Maserati Bora | 335 |
| Ford Pantera L | 264 |
| Camaro Z28 | 245 |
| Duster 360 | 245 |
| Chrysler Imperial | 230 |
| Lincoln Continental | 215 |
| Merc 450SLC | 180 |
| Merc 450SL | 180 |
| Merc 450SE | 180 |
| Pontiac Firebird | 175 |
| Ferrari Dino | 175 |
| AMC Javelin | 150 |
| Dodge Challenger | 150 |
| Merc 280 | 123 |
| Merc 280C | 123 |
| Lotus Europa | 113 |
| Hornet 4 Drive | 110 |
| Mazda RX4 | 110 |
| Mazda RX4 Wag | 110 |
| Volvo 142E | 109 |
| Valiant | 105 |
| Toyota Corona | 97 |
| Merc 230 | 95 |
| Datsun 710 | 93 |
| Porsche 914-2 | 91 |
| Fiat 128 | 66 |
| Fiat X1-9 | 66 |
| Toyota Corolla | 65 |
| Merc 240D | 62 |
| Honda Civic | 52 |

# Printing correlation between columns
df_mtcars.corr()

| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| mpg | 1.000000 | -0.852162 | -0.847551 | -0.776168 | 0.681172 | -0.867659 | 0.418684 | 0.664039 | 0.599832 | 0.480285 | -0.550925 |
| cyl | -0.852162 | 1.000000 | 0.902033 | 0.832447 | -0.699938 | 0.782496 | -0.591242 | -0.810812 | -0.522607 | -0.492687 | 0.526988 |
| disp | -0.847551 | 0.902033 | 1.000000 | 0.790949 | -0.710214 | 0.887980 | -0.433698 | -0.710416 | -0.591227 | -0.555569 | 0.394977 |
| hp | -0.776168 | 0.832447 | 0.790949 | 1.000000 | -0.448759 | 0.658748 | -0.708223 | -0.723097 | -0.243204 | -0.125704 | 0.749812 |
| drat | 0.681172 | -0.699938 | -0.710214 | -0.448759 | 1.000000 | -0.712441 | 0.091205 | 0.440278 | 0.712711 | 0.699610 | -0.090790 |
| wt | -0.867659 | 0.782496 | 0.887980 | 0.658748 | -0.712441 | 1.000000 | -0.174716 | -0.554916 | -0.692495 | -0.583287 | 0.427606 |
| qsec | 0.418684 | -0.591242 | -0.433698 | -0.708223 | 0.091205 | -0.174716 | 1.000000 | 0.744535 | -0.229861 | -0.212682 | -0.656249 |
| vs | 0.664039 | -0.810812 | -0.710416 | -0.723097 | 0.440278 | -0.554916 | 0.744535 | 1.000000 | 0.168345 | 0.206023 | -0.569607 |
| am | 0.599832 | -0.522607 | -0.591227 | -0.243204 | 0.712711 | -0.692495 | -0.229861 | 0.168345 | 1.000000 | 0.794059 | 0.057534 |
| gear | 0.480285 | -0.492687 | -0.555569 | -0.125704 | 0.699610 | -0.583287 | -0.212682 | 0.206023 | 0.794059 | 1.000000 | 0.274073 |
| carb | -0.550925 | 0.526988 | 0.394977 | 0.749812 | -0.090790 | 0.427606 | -0.656249 | -0.569607 | 0.057534 | 0.274073 | 1.000000 |

The three lowest correlations (we could say there is almost no correlation) are:
1) am and carb 0.057534
2) drat and carb -0.090790
3) drat and qsec 0.091205

The three highest correlations are:
1) cyl and disp 0.902033
2) disp and wt 0.887980
3) mpg and wt -0.867659

#### SUSHANTH S

##### Member
Problem 1: mtcars

import pandas as pd

df.shape

Out[4]:
| | model | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Mazda RX4 | 21.0 | 6 | 160.0 | 110 | 3.90 | 2.620 | 16.46 | 0 | 1 | 4 | 4 |
| 1 | Mazda RX4 Wag | 21.0 | 6 | 160.0 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
| 2 | Datsun 710 | 22.8 | 4 | 108.0 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 | 1 |
| 3 | Hornet 4 Drive | 21.4 | 6 | 258.0 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
In [5]:
df.dtypes

Out[5]:
model object
mpg float64
cyl int64
disp float64
hp int64
drat float64
wt float64
qsec float64
vs int64
am int64
gear int64
carb int64
dtype: object
In [6]:
df['hp'].mean()

Out[6]:
146.6875
In [7]:
import seaborn as sns

In [8]:
sns.heatmap(df.corr())

Out[8]:
<AxesSubplot:>

1. cyl is negatively correlated with drat, vs and mpg
2. wt is negatively correlated with drat, mpg and am
3. hp is negatively correlated with mpg, qsec and vs

Problem 2: Diabetes

import pandas as pd
import numpy as np

In [9]:
print(df.DESCR)

.. _diabetes_dataset:

Diabetes dataset
----------------

Ten baseline variables, age, sex, body mass index, average blood
pressure, and six blood serum measurements were obtained for each of n =
442 diabetes patients, as well as the response of interest, a
quantitative measure of disease progression one year after baseline.

**Data Set Characteristics:**

:Number of Instances: 442

:Number of Attributes: First 10 columns are numeric predictive values

:Target: Column 11 is a quantitative measure of disease progression one year after baseline

:Attribute Information:
- age age in years
- sex
- bmi body mass index
- bp average blood pressure
- s1 tc, T-Cells (a type of white blood cells)
- s2 ldl, low-density lipoproteins
- s3 hdl, high-density lipoproteins
- s4 tch, thyroid stimulating hormone
- s5 ltg, lamotrigine
- s6 glu, blood sugar level

Note: Each of these 10 feature variables have been mean centered and scaled by the standard deviation times `n_samples` (i.e. the sum of squares of each column totals 1).

Source URL:

Bradley Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani (2004) "Least Angle Regression," Annals of Statistics (with discussion), 407-499.
(https://web.stanford.edu/~hastie/Papers/LARS/LeastAngle_2002.pdf)

In [12]:
df1=pd.DataFrame(data=df.data,columns=df.feature_names)

In [13]:

Out[13]:
| | age | sex | bmi | bp | s1 | s2 | s3 | s4 | s5 | s6 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.038076 | 0.050680 | 0.061696 | 0.021872 | -0.044223 | -0.034821 | -0.043401 | -0.002592 | 0.019908 | -0.017646 |
| 1 | -0.001882 | -0.044642 | -0.051474 | -0.026328 | -0.008449 | -0.019163 | 0.074412 | -0.039493 | -0.068330 | -0.092204 |
| 2 | 0.085299 | 0.050680 | 0.044451 | -0.005671 | -0.045599 | -0.034194 | -0.032356 | -0.002592 | 0.002864 | -0.025930 |
| 3 | -0.089063 | -0.044642 | -0.011595 | -0.036656 | 0.012191 | 0.024991 | -0.036038 | 0.034309 | 0.022692 | -0.009362 |
| 4 | 0.005383 | -0.044642 | -0.036385 | 0.021872 | 0.003935 | 0.015596 | 0.008142 | -0.002592 | -0.031991 | -0.046641 |
In [14]:
df1.shape

Out[14]:
(442, 10)
In [15]:
df1.dtypes

Out[15]:
age float64
sex float64
bmi float64
bp float64
s1 float64
s2 float64
s3 float64
s4 float64
s5 float64
s6 float64
dtype: object
In [18]:
import seaborn as sns

In [20]:
for keys in df1.columns:
    sns.boxplot(df1[keys])
    plt.show()


#### DHARANI KANNA K V

##### Member
dict_new = {"s4" : 0.145, "s5" : 0.125, "s3" : 0.12, "s2" : 0.11, "bmi" : 0.125}
Hi Mohanasundari,
How did you get these values to create this dictionary? Can you please explain?
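For context, cut-offs like these are often derived with the Tukey/IQR rule rather than read off the boxplots by eye. A hedged sketch (the function name and toy values are mine):

```python
import pandas as pd

def iqr_bounds(s, k=1.5):
    """Tukey whisker bounds: (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    return q1 - k * (q3 - q1), q3 + k * (q3 - q1)

# Toy column with one obvious high outlier
s = pd.Series([0.01, 0.02, 0.03, 0.04, 0.30])
low, high = iqr_bounds(s)
print(s[(s >= low) & (s <= high)].tolist())  # [0.01, 0.02, 0.03, 0.04]
```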

#### DHARANI KANNA K V

##### Member
MTCARS problem -> Check for missing values and outliers within the horsepower column and remove them.

Python:
```python
import pandas as pd
import seaborn as sns
from sklearn.impute import SimpleImputer
import numpy as np
%matplotlib inline

print("Files Imported Successfully...!")

description = mtcars.describe()
#print(description)

data_types = mtcars.dtypes
#print(data_types)

missing_values = mtcars.isna().any()
#print(missing_values)

#sns.boxplot(mtcars['hp'])

filt = mtcars["hp"].values < 300

mtcars_filt = mtcars[filt]

sns.boxplot(mtcars_filt['hp'])
```

#### Attachments

• 1.png (3.1 KB)
• 2.png (2.8 KB)

#### Ashish Rai_2

##### Member
DIABETES data problem

# Working with sklearn's diabetes dataset

# dependencies
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# view

# data info > shape
print("\nShape of diabetes features data: ", diabetes.shape)

# data info > missing data (True: Missing data; False: No missing data)
print("\nMissing data: ", diabetes.isna().any().any())

# statistical summary > describe
print("\nStatistical summary")
print(round(diabetes.describe().T, 2))

# statistical summary > correlation
plt.figure(figsize=(7, 7))
sns.heatmap(diabetes.corr(), square=True, fmt='.2g', annot=True, cmap='RdBu_r')
plt.title("Correlation in diabetes dataset")
plt.show()

# relationship between "target" & correlated features (bmi & s5)
fig , axes = plt.subplots(1, 2, figsize=(10, 5), sharey=True)
sns.scatterplot(x='bmi', y='target', data=diabetes, ax=axes[0])
axes[0].set_title("Relationship between BMI & diesease progression")
sns.scatterplot(x='s5', y='target', data=diabetes, ax=axes[1])
axes[1].set_title("Relationship between lamotrigine & disease progression")
plt.tight_layout()
plt.show()

# outlier detection
for n, col in enumerate(diabetes.columns):
    plt.figure(figsize=(5, 1))
    sns.boxplot(data=diabetes, x=col)
    plt.show()

# OBSERVATIONS
# - The target variable (i.e. disease progression) has a **moderate positive correlation** with the following features:
# - bmi (Body Mass Index): positive correlation (0.59)
# - s5 (lamotrigine): positive correlation (0.57)
# - The problem of collinearity could be due to the following correlated features:
# - s1 & s2 (0.9)
# - s3 & s4 (-0.74)
# - s2 & s4 (0.66)
# - s4 & s5 (0.62)
# - The following columns have outliers present:
# - bmi
# - s1
# - s2
# - s3
# - s4
# - s5
# - s6

#### DHARANI KANNA K V

##### Member
Load_Diabetes -> Perform missing value and outlier data treatment.

Python:
```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
from sklearn.impute import SimpleImputer
import numpy as np
%matplotlib inline

print("Files Imported Successfully...!")

#print(diabetes.DESCR)
df = pd.DataFrame(data=diabetes.data, columns=diabetes.feature_names)

df.isna().any()

for col in df.columns:
    fil = df[col].values < -12
    df_new = df[fil]
    sns.boxplot(df[col])
    plt.show()
```


#### Nirmal Chandra Dash

##### Member
Problem Statement :
The SFO Public Department (referred to as SFO) has captured all the salary data of its employees from 2011 to 2014. Now, in 2018, the organization is facing a financial crisis. As a first step, HR wants to rationalize employee cost to save the payroll budget. You have to do some data manipulation and answer the questions below:

1. How much total salary cost has increased from year 2011 to 2014?
2. Who was the top earning employee across all the years?

#### Attachments

• Untitled.pdf (46.8 KB)

#### Nirmal Chandra Dash

##### Member
Problem Statement :
The SFO Public Department (referred to as SFO) has captured all the salary data of its employees from 2011 to 2014. Now, in 2018, the organization is facing a financial crisis. As a first step, HR wants to rationalize employee cost to save the payroll budget. You have to do some data manipulation and answer the questions below:

1. How much total salary cost has increased from year 2011 to 2014?
2. Who was the top earning employee across all the years?

#### Attachments

• Untitled (1).pdf (48.2 KB)

#### Ashish Rai_2

##### Member
SFO Public Department Problem

# dependencies
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# view data

# increase in TotalPay by Years
salary_pay_by_year = salary.groupby('Year').sum()[['TotalPay']].reset_index()
sns.lineplot(x='Year', y='TotalPay', data=salary_pay_by_year)
plt.title("Total pay change by year")
plt.show()

# top earning employee across all the years
salary_top_pay = pd.DataFrame(salary_top_pay).reset_index()
sns.barplot(x='EmployeeName', y='TotalPay', data=salary_top_pay)
plt.title("Top 5 paid employees")
plt.show()
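For question 1, the year-over-year increase can be computed directly from the grouped totals. A sketch with a hypothetical slice of the data (values invented):

```python
import pandas as pd

# Hypothetical slice of the salary data
salary = pd.DataFrame({
    "Year": [2011, 2011, 2014, 2014],
    "TotalPay": [100.0, 200.0, 250.0, 400.0],
})
totals = salary.groupby("Year")["TotalPay"].sum()
increase = totals.loc[2014] - totals.loc[2011]
print(increase)  # 350.0
```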

#### Rahul Rajkumar Shah

##### Member
Data Science - Machine Learning Course: as discussed, I have attached the three assignments given in the Data Preprocessing lesson.

#### Attachments

• SFOPublicDapartment_Assignement.pdf (49.6 KB)
• DataExploration_mtcars_Assignment.pdf (68.2 KB)
• DataExploring_Diabetes_Assignment.pdf (137.5 KB)

#### Ashish Rai_2

##### Member
Lesson-end Project :: Lesson 3

# dependencies
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# DataFrame
data = pd.DataFrame({
    "first_name": ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
    "last_name": ["Miller", "Jacobson", ".", "Milner", "Cooze"],
    "age": [42, 52, 36, 24, 73],
    "preTestScore": [4, 24, 31, ".", "."],
    "postTestScore": ["25,000", "94,000", 57, 62, 70]
})

# Save the data frame into a .csv file as project.csv
data.to_csv("project.csv", index=False)

# Read the project.csv file and print the data frame.

# Read the project.csv file and make two index columns, namely, ‘First Name’ and ‘Last Name’.

# Print the data frame in a Boolean form as True or False.
# True for Null/ NaN values and false for non-null values.
print(project.isna())

# Read the data frame by skipping the first 3 rows and print the data frame.
print(project)
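The read steps whose code is missing above could look like the following sketch (assuming the project.csv written earlier; the variable names are mine):

```python
import pandas as pd

# Recreate a project.csv like the one written above with to_csv(..., index=False)
pd.DataFrame({
    "first_name": ["Jason", "Molly", "Tina", "Jake", "Amy"],
    "last_name": ["Miller", "Jacobson", ".", "Milner", "Cooze"],
    "age": [42, 52, 36, 24, 73],
}).to_csv("project.csv", index=False)

# Read the file back and print it
project = pd.read_csv("project.csv")

# Re-read with 'first_name' and 'last_name' as a two-level index
indexed = pd.read_csv("project.csv", index_col=["first_name", "last_name"])

# Re-read skipping the first 3 data rows (the header is file line 0)
skipped = pd.read_csv("project.csv", skiprows=range(1, 4))
print(skipped["first_name"].tolist())  # ['Jake', 'Amy']
```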

#### T E Mohanasundari

##### Member
Hi Mohanasundari,
How did you get these values to create this dictionary? Can you please explain?
Hi Dharani,

I performed the outlier analysis separately for each column and found the values.

Regards,
Mohana.

#### T E Mohanasundari

##### Member
Problem Statement :
The SFO Public Department (referred to as SFO) has captured all the salary data of its employees from 2011 to 2014. Now, in 2018, the organization is facing a financial crisis. As a first step, HR wants to rationalize employee cost to save the payroll budget. You have to do some data manipulation and answer the questions below:

1. How much total salary cost has increased from year 2011 to 2014?
2. Who was the top earning employee across all the years?

#### Attachments

• Salaries_assign.pdf (25.1 KB)

#### Ong Min Hau

##### Member
Problem Statement :
The SFO Public Department (referred to as SFO) has captured all the salary data of its employees from 2011 to 2014. Now, in 2018, the organization is facing a financial crisis. As a first step, HR wants to rationalize employee cost to save the payroll budget. You have to do some data manipulation and answer the questions below:

1. How much total salary cost has increased from year 2011 to 2014?
2. Who was the top earning employee across all the years? - I used sort_values to find the top earning employee; is it possible to use the .max() function? I tried it, but it could only show the value, not retrieve the employee's name.
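On the .max() question: .max() returns only the value, but .idxmax() returns the row label, which can then be used with .loc to recover the employee's name. A sketch with invented data:

```python
import pandas as pd

# Hypothetical salary rows (names and values invented)
salary = pd.DataFrame({
    "EmployeeName": ["A", "B", "C"],
    "TotalPay": [100.0, 500.0, 300.0],
})

# idxmax gives the index label of the maximum TotalPay
top = salary.loc[salary["TotalPay"].idxmax(), "EmployeeName"]
print(top)  # B

# Equivalent one-liner via nlargest
print(salary.nlargest(1, "TotalPay")["EmployeeName"].iloc[0])  # B
```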

#### Attachments

• Salaries.pdf (28 KB)

#### Ong Min Hau

##### Member
Problem Statement: raw data:

"first_name": ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
"last_name": ["Miller", "Jacobson", ".", "Milner", "Cooze"],
"age": [42, 52, 36, 24, 73],
"preTestScore": [4, 24, 31, ".", "."],
"postTestScore": ["25,000", "94,000", 57, 62, 70]

#### Attachments

• Test_Score.pdf (23.3 KB)

#### DHARANI KANNA K V

##### Member
DATA MANIPULATION

SFO Public Department

Python:
```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
df = pd.DataFrame(salary)

# Total salary cost increase from year 2011 to 2014

df.info()
#df['Year'].unique()
feature = df[['Year', 'TotalPay']]
feature

salary_mean = df.groupby('Year').mean()[['TotalPay']]
print(salary_mean)

salary_dif = salary_mean.loc[2014] - salary_mean.loc[2011]
salary_dif

sns.lineplot(data=salary_mean)

# note: .max() here takes the maximum of each column independently,
# so TotalPay and EmployeeName may come from different rows
emp_mean = df.groupby('Year').max()[['TotalPay', 'EmployeeName']].reset_index()
emp_mean
sns.barplot(data=emp_mean, x='EmployeeName', y='TotalPay')
```

#### Attachments

• DATA_MANIPULATION.pdf (42.5 KB)

Assignments

#### Attachments

• diab.pdf (87.1 KB)
• mtcars.pdf (96.7 KB)

#### DHARANI KANNA K V

##### Member
LESSON_END_PROJECT
Python:
```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

print("Libraries are now accessible...")
data = pd.DataFrame({'first_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
                     'last_name': ['Miller', 'Jacobson', ".", 'Milner', 'Cooze'],
                     'age': [42, 52, 36, 24, 73],
                     'preTestScore': [4, 24, 31, ".", "."],
                     'postTestScore': ["25,000", "94,000", 57, 62, 70]})
data

# 1. save dataframe into csv file

data.to_csv("project.csv")
print("Data Exported Successfully as 'project.csv'")

raw = pd.read_csv("project.csv")  # assumed read-back step (omitted in the original post)
raw = raw.drop(columns={'Unnamed: 0'}, inplace=False)

print(pd.DataFrame(raw))

print("\nproject.csv printed Successfully as 'DataFrame'")

# Rename columns
raw = raw.rename(columns={'first_name': 'First Name', 'last_name': 'Last Name'}, inplace=False)
print("Columns renamed Successfully\n")
print(raw[['First Name', 'Last Name']])

# finding any missing values
raw.isna()

# Remove first 3 rows [0, 1, 2]
raw.iloc[3:]
```

#### Attachments

• LESSON_END_PROJECT.pdf (27.4 KB)

#### ksrinivasreddy10_1

##### New Member
Problem Statement:
For the Boston dataset used earlier, the team also wants to cross-reference results using regularization techniques.
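A minimal sketch of cross-referencing ordinary least squares with Ridge and Lasso, using synthetic data in place of the Boston set (which has been removed from recent scikit-learn releases):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

# Synthetic data: only the first feature carries signal
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=200)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

# Regularization shrinks the irrelevant coefficients toward zero
print(np.round(ols.coef_, 2))
print(np.round(ridge.coef_, 2))
print(np.round(lasso.coef_, 2))
```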

#### T E Mohanasundari

##### Member
Hello sir,

Problem Statement: A real estate company wants to build homes at different locations in Boston. They have data for historical prices but haven’t decided the actual prices yet. They want to price the homes so that they are affordable to the general public.

Objective:
• Import the Boston data from sklearn and read the description using DESCR
• Analyze the data and predict the approximate prices for the houses

Solution is attached.
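For readers without the attachment, a hedged outline of the fit-and-predict step with synthetic data standing in for the Boston features (load_boston is no longer available in recent scikit-learn):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Boston features/prices
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = LinearRegression().fit(X_tr, y_tr)
pred = model.predict(X_te)  # approximate prices for unseen houses
print(round(model.score(X_te, y_te), 3))
```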

Regards,
Mohana.

#### Attachments

• Boston Reg Assign.pdf (113.6 KB)

#### DHARANI KANNA K V

##### New Member
Hello,

Due to some personal reasons, I couldn't work on the assessment. I would like to request an extension of 2-3 more days for submitting my assessment.