# Data Science with R | Shikhar Parashar

Discussion in 'Big Data and Analytics' started by Nishant_Singh, Jul 7, 2019.

#1


#2
3. ### Vinay Raut Member

I am stuck here.

#3
4. ### Vishal Shah_5 New Member

Hey Nishant, what are the electives in the Master's Data Science course?

#4
5. ### Marni Hiteshwar Chowdary Member

Type x and press Enter instead of typing the entire expression again. Let me know if it helps.
Thanks,
Hitesh

#5
6. ### Taranpreet Singh_2 New Member

Just type x and press Enter.

#6
7. ### _60145 Member

Hey Vinay, just type x and press Enter.
In the previous step, x <- 5+7 stored the value 12 in x, so to see what x holds, just type x and press Enter.
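A minimal illustration of the point above: an assignment with <- prints nothing, while typing the bare name at the console auto-prints the value. print() does the same thing explicitly, which matters inside functions and scripts where auto-printing does not happen.

```r
# Assigning 5 + 7 to x produces no output at the console
x <- 5 + 7

# Typing the bare name auto-prints the stored value
x

# print() shows the value explicitly; needed inside functions or sourced scripts
print(x)
```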

#7


#8
9. ### _60145 Member

What should we enter here to get credit?

#9
10. ### _60145 Member

@Shikhar Parashar(4707) when I use the seq() function like this:
> seq(5,10, length=30)
I get the following output:
[1] 5.000000 5.172414 5.344828 5.517241 5.689655 5.862069 6.034483 6.206897 6.379310
[10] 6.551724 6.724138 6.896552 7.068966 7.241379 7.413793 7.586207 7.758621 7.931034
[19] 8.103448 8.275862 8.448276 8.620690 8.793103 8.965517 9.137931 9.310345 9.482759
[28] 9.655172 9.827586 10.000000

My question is: what is the logic behind generating these 30 numbers? Are they random?

#10
11. ### Tathagat Kishore Mishra Member

Friend, we need to read carefully and then type the commands or statements exactly as swirl asks. The tool seems to compare the exact input and output.

#11
12. ### Tathagat Kishore Mishra Member

It generates length = 30 numbers between 5 and 10, inclusive of both endpoints and evenly spaced, so they are not random. Thirty points leave 29 equal gaps, so each step is (10 - 5)/29 = 5/29 ≈ 0.172414. This loop reproduces the output:

print("We want a sequence of 30 numbers from 5.000000 to 10.000000")
step <- (10 - 5) / (30 - 1)  # 29 gaps because both the start and end numbers are included
temp <- 5                     # start value
print(temp)
for (i in 1:29) {
  temp <- temp + step
  print(round(temp, 6))
}

Another example:
> seq(5,10, length=5)
[1] 5.00 6.25 7.50 8.75 10.00
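The same logic without a loop, as a sketch: start at from and add multiples of the fixed step. Comparing against seq() with all.equal() shows the values are deterministic, just evenly spaced.

```r
# seq(from, to, length.out = n) places n evenly spaced values, endpoints included
from <- 5
to <- 10
n <- 30

step <- (to - from) / (n - 1)          # 29 gaps between 30 points, i.e. 5/29
manual <- from + (0:(n - 1)) * step    # 5, 5 + step, 5 + 2*step, ..., 10
built_in <- seq(from, to, length.out = n)

all.equal(manual, built_in)            # TRUE: same numbers, nothing random
```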

#12
13. ### SUNNY BHAVEEN CHANDRA Well-Known Member

Hi Vishal,

Electives are optional courses that cover prerequisite material for the main course. Because they are optional, they are mainly meant for learners who are completely new to the field of data science.

Regards,
Sunny
Sr. Teaching assistant

#13
14. ### _60145 Member

I am unable to proceed from here. I have tried multiple times, including exiting swirl, but I still get the same error. Can someone help me?

#14
15. ### _60145 Member

Delete the # sign in the window above, where you see "#x" (screenshot attached).

#15
16. ### Subhrajit Pyne Member

I cannot figure out the issue here.

#16
17. ### Vikas Kumar_18 Well-Known Member Simplilearn SupportAlumni

It may be a formatting issue (R itself does not care about indentation). Could you please put the closing curly brace right after the x? It should also definitely work if you print the value:

boring_function <- function(x){print(x)}
boring_function(5)

You can also use the code above as a reference.
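Why removing the # is enough: an R function returns the value of the last expression evaluated in its body, so a body that contains only x (uncommented) already returns its argument, with no print() or return() required. A minimal sketch:

```r
# The body is just x, so the function returns its argument unchanged
boring_function <- function(x) {
  x
}

result <- boring_function("My first function!")
result
```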

#17

#18
19. ### Tathagat Kishore Mishra Member

Is there a way to read an .xlsx file so that the columns come in as factors instead of characters?
I am also not sure whether there is an error in the self-learning content.

BankCustomer1 <- read.csv("Demo 1_ Identifying Data Structures.csv")
#View(BankCustomer)
str(BankCustomer1)
BankCustomer <- read_excel("tmishra/inputDataSet/Demo 1_Identifying Data Structures.xlsm", stringsasFactors=TRUE)
str(BankCustomer)

#19

Hi Buddies,

#20
21. ### Tathagat Kishore Mishra Member

Hi Marni,

There are a couple of issues here.

You are trying to assign multiple values to a single index. It seems dlst is a vector, e.g. c("JAN", "FEB", "MAR"), so dlst[4] is NA, hence the errors above. Here is how single-index and row assignment behave on a matrix:

> dist <- matrix(c("JAN", "FEB", "MAR", "APR", "MAY", "JUN", "JUL", "AUG", "SEPT", "OCT", "NOV", "DEC"), 4, 3)
> dist[4]
[1] "APR"
> dist[4,]
[1] "APR" "AUG" "DEC"
> dist[4] <- c(1,2,3)
Warning message:
In dist[4] <- c(1, 2, 3) :
  number of items to replace is not a multiple of replacement length
> dist[4,] <- c(1,2,3)
> dist[4]
[1] "1"
> dist[4,]
[1] "1" "2" "3"
> dist
     [,1]  [,2]  [,3]
[1,] "JAN" "MAY" "SEPT"
[2,] "FEB" "JUN" "OCT"
[3,] "MAR" "JUL" "NOV"
[4,] "1"   "2"   "3"
>

#21
22. ### Marni Hiteshwar Chowdary Member


#22

Hi,
Why am I seeing this issue when I try to read an .xls file?
Error in findPerl(verbose = verbose) :
perl executable not found. Use perl= argument to specify the correct path.
Error in file.exists(tfn) : invalid 'file' argument
Do I need to install Perl on my machine?

#23

I was able to fix the issue by installing Perl on my machine and running the command below in the R shell:
perl <- "C:/strawberry/perl/bin/perl.exe"

#24

Hi,

Regards,
Aparna

#25
26. ### Kumar Akash_1 Member

Joined:
Jun 25, 2019
Messages:
3
0
Hi Aparna,

Data Science with R --> Live class --> in the Registered class box you will find DOWNLOADED RECORDING --> click on it to get all the recorded sessions up to yesterday.

Regards,
Akash

#26
27. ### _60145 Member

Go to the Live Classes tab; you will find the recordings there.

#27
Last edited: Jul 22, 2019
28. ### _60145 Member

Hi Shikhar,

bplot <- barplot(xtabs(~visua_df$Continent), space = FALSE, main = "Countries", col = rainbow(length(visua_df$Continent)))

Here visua_df is my data frame, and length(visua_df$Continent) is 6.

My question is: when I use col = rainbow(6), I see different colors for the bars, but when I use col = rainbow(length(visua_df$Continent)), all the bars in my graph are red.

#28
29. ### Tathagat Kishore Mishra Member

I also faced this issue when Perl was not installed.

#29
30. ### Tathagat Kishore Mishra Member

I suspect there is something wrong with the visua_df$Continent values. col = rainbow(X) returns X colors and barplot() uses them in order, one per bar. If X is large, the first few colors are very similar shades of red, so more than one bar comes out red.

length(mtcars$cyl)
# [1] 32
unique(mtcars$cyl)
# [1] 6 4 8
length(unique(mtcars$cyl))
# [1] 3

barplot(xtabs(~mtcars$cyl), space = FALSE, main = "Countries", col = rainbow(32)) # bars come out in very similar red shades
barplot(xtabs(~mtcars$cyl), space = FALSE, main = "Countries", col = rainbow(3))  # gives three distinct colors

Can you please share the dataset as a .csv or .xls file?

#30
31. ### _60145 Member

Please find the dataset file attached.

#31
32. ### Tathagat Kishore Mishra Member

My guess was correct:

> length(visua_df$Continent)  # There are too many elements to map onto the rainbow palette!
[1] 32109
> unique(visua_df$Continent)
[1] "OC"        "N.America" "AS"        "EU"
[5] "SA"        "AF"

# You need to use length(unique(visua_df$Continent)) to get 6 colors
> bplot <- barplot(xtabs(~visua_df$Continent), space = FALSE, main = "Countries", col = rainbow(length(unique(visua_df$Continent))))
>
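A self-contained version of the same fix, using the built-in mtcars data as a stand-in because the attached dataset is not reproduced here. The idea is to size the palette by the number of bars (distinct categories), not by the number of rows.

```r
# mtcars ships with base R; cyl has three distinct values (4, 6, 8)
counts <- table(mtcars$cyl)   # one count per category = one bar per category

# Palette sized by rows: 32 colours whose first few are close red/orange shades
too_many <- rainbow(nrow(mtcars))
head(too_many, 3)

# Palette sized by categories: one clearly distinct colour per bar
bar_cols <- rainbow(length(counts))
barplot(counts, space = FALSE, main = "Cars by cylinder count", col = bar_cols)
```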

#32
33. ### _60145 Member


Makes sense, mate. Thanks a lot for your explanation and your time.

#33
34. ### Kumar Akash_1 Member


Hi Marni,

Let me understand your query first.

1. Are you looking to replace the 4th component? If yes, here is a possible answer to your query:

dlst[[4]]<-c(1,2,3)

Output:

$months
[1] "JAN,FEB,MAR"

$matrix
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

$msc
[1] 1

[[4]]
[1] 1 2 3

2. Are you trying to change the name of the 4th component? Then here is a possible answer, where NewName is the new component name:

names(dlst) <- c("month", "matrix", "msc", "NewName")
> dlst
$month
[1] "JAN","FEB","MAR"

$matrix
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

$msc
[1] 1

$NewName
[1] 1
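A runnable sketch of both options on a stand-in list (its contents are hypothetical, shaped like the one in the question). Double brackets [[ ]] address one element, so a whole vector fits in it; with single brackets, dlst[4] <- c(1, 2, 3) should keep only the first value and warn, which is why wrapping the vector in list() or using [[ ]] is needed.

```r
# Hypothetical list with three named components, similar to the question's dlst
dlst <- list(months = c("JAN", "FEB", "MAR"),
             matrix = matrix(1:6, nrow = 2),
             msc = 1)

# Option 1: store a whole vector as the 4th component with [[ ]]
dlst[[4]] <- c(1, 2, 3)
# Equivalent single-bracket form: dlst[4] <- list(c(1, 2, 3))

# Option 2: rename every component; a character vector is the usual value for names<-
names(dlst) <- c("month", "matrix", "msc", "NewName")
str(dlst)
```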

Please post here if you have any doubts.

Regards,
Akash

#34
35. ### Shikhar Parashar(4707) Member Alumni


Have you tried the stringsAsFactors argument of the read.csv function?
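A minimal sketch of what that looks like. Since R 4.0.0 the default is stringsAsFactors = FALSE, so text columns stay character unless you ask for factors. The text= argument stands in for the demo CSV file here, and the column names are made up for illustration.

```r
# Inline CSV text instead of a file; columns job and balance are hypothetical
csv_text <- "job,balance\nadmin,100\ntechnician,250\nadmin,75"

# stringsAsFactors = TRUE turns character columns into factors while reading
bank <- read.csv(text = csv_text, stringsAsFactors = TRUE)
str(bank)
```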

#35

#36


#37
38. ### Tathagat Kishore Mishra Member

If you get the error below when running head(dataset) in an .Rmd file, you need to shorten the project path (e.g. c:\abc\cdf\):

Error in tempfile(pattern = "_rs_rdf_", tmpdir = outputFolder, fileext = ".rdf") : temporary name too long

#38
39. ### Kumar Akash_1 Member

Hi all, could anyone help me understand the concept of tuning, with an example, and how to interpret the accuracy of a model?
For example, if the confusion matrix shows 82% accuracy, what does that say with respect to the problem statement?

You can take any example to explain this, but for reference I am attaching a data set where the problem statement is:
"To study a heart disease data set and to model a classifier for predicting whether a patient is suffering from any heart disease or not."
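Not a full answer on tuning, but a sketch of the accuracy part using made-up predictions for 10 patients (hypothetical labels, not the attached data set). Accuracy is the share of patients classified correctly, so 82% would mean roughly 82 of every 100 patients get the right label. For a heart disease screen, sensitivity (the share of actually sick patients the model catches) is often the more telling number, since accuracy alone does not show how many sick patients were missed.

```r
# Hypothetical labels: 1 = heart disease, 0 = no heart disease
actual    <- factor(c(1, 0, 1, 1, 0, 0, 1, 0, 1, 0), levels = c(0, 1))
predicted <- factor(c(1, 0, 0, 1, 0, 1, 1, 0, 1, 0), levels = c(0, 1))

# Confusion matrix: rows are predictions, columns are the true classes
cm <- table(Predicted = predicted, Actual = actual)
cm

accuracy    <- sum(diag(cm)) / sum(cm)        # correct predictions / all patients
sensitivity <- cm["1", "1"] / sum(cm[, "1"])  # sick patients correctly flagged
specificity <- cm["0", "0"] / sum(cm[, "0"])  # healthy patients correctly cleared
c(accuracy = accuracy, sensitivity = sensitivity, specificity = specificity)
```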

#39
40. ### Tathagat Kishore Mishra Member

I am starting this thread to work toward the best solution for this assessment:

DESCRIPTION

Background and Objective:
Every year, thousands of applications are submitted by international students for admission to colleges in the USA. It becomes an iterative task for the Education Department to know the total number of applications received and then compare that data with the total number of applications successfully accepted and visas processed. Hence, to make the entire process easier, the education department in the US analyzes the factors that influence the admission of a student into colleges. The objective of this exercise is to analyze the same.

Domain: Education

Dataset Description:
| Attribute | Description |
| --- | --- |
| Rank | It refers to the prestige of the undergraduate institution. The variable rank takes on the values 1 through 4. Institutions with a rank of 1 have the highest prestige, while those with a rank of 4 have the lowest. |
| SES | Socioeconomic status: 1 = low, 2 = medium, 3 = high. |
| Gender_male | 0 = Female, 1 = Male |
| Race | 1, 2, and 3 represent Hispanic, Asian, and African-American, respectively. |

Analysis Tasks: Analyze the historical data and determine the key drivers for admission.

Predictive:

1. Find the missing values (if any, perform missing value treatment).
2. Find outliers (if any, perform outlier treatment).
3. Find the structure of the data set and, if required, transform numeric data types to factors and vice versa.
4. Find whether the data is normally distributed or not, using a plot to determine this.
5. Normalize the data if it is not normally distributed.
6. Use variable reduction techniques to identify significant variables.
7. Run a logistic model to determine the factors that influence the admission process of a student (drop insignificant variables).
8. Calculate the accuracy of the model and run validation techniques.
9. Try other modelling techniques like decision tree and SVM and select a champion model.
10. Determine the accuracy rates for each kind of model.
11. Select the most accurate model.
12. Identify other machine learning or statistical techniques.
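A possible starting point for steps 7 and 8, sketched on simulated data because the assessment data set is not reproduced here. The column names admit, gre, gpa, and rank are assumptions (only Rank appears in the attribute table above), the simulated effect sizes are invented, and a 70/30 hold-out split is just one simple validation technique among several.

```r
set.seed(42)
n <- 200

# Simulated stand-in for the admissions data (hypothetical columns and effects)
adm <- data.frame(gre  = round(runif(n, 300, 800)),
                  gpa  = round(runif(n, 2, 4), 2),
                  rank = factor(sample(1:4, n, replace = TRUE), levels = 1:4))
lin <- -6 + 0.005 * adm$gre + 0.8 * adm$gpa - 0.5 * as.integer(adm$rank)
adm$admit <- rbinom(n, 1, plogis(lin))

# Hold out 30% of rows for validation
train_idx <- sample(n, size = 0.7 * n)
train_set <- adm[train_idx, ]
valid_set <- adm[-train_idx, ]

# Logistic regression; summary() p-values help spot insignificant variables to drop
fit <- glm(admit ~ gre + gpa + rank, data = train_set, family = binomial)
summary(fit)

# Accuracy on the held-out rows with a 0.5 probability cut-off
prob <- predict(fit, newdata = valid_set, type = "response")
pred <- ifelse(prob > 0.5, 1, 0)
accuracy <- mean(pred == valid_set$admit)
accuracy
```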

Descriptive:
Categorize the average grade point into High, Medium, and Low (with admission probability percentages) and plot it on a point chart.
Cross grid for admission variables with GRE categorization is shown below:

| GRE | Categorized |
| --- | --- |
| 0-440 | Low |
| 440-580 | Medium |
| 580+ | High |
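The GRE grid maps directly onto base R's cut(). A sketch with made-up scores; since the grid does not say which band a boundary score such as 440 or 580 belongs to, right = FALSE (boundary values go to the higher band) is an assumption.

```r
# Hypothetical GRE scores for illustration
gre <- c(380, 440, 520, 580, 660, 800)

# Bands [0, 440) Low, [440, 580) Medium, [580, Inf) High
gre_cat <- cut(gre,
               breaks = c(0, 440, 580, Inf),
               labels = c("Low", "Medium", "High"),
               right = FALSE)
table(gre_cat)
```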

#40
41. ### Subhrajit Pyne Member

Hello all, what are the steps in factor analysis? Please help me. I am stuck at the 3rd problem statement in the Internet project:

Find out the probable factors from the dataset, which could affect the exits. Exit Page Analysis is usually required to get an idea about why a user leaves the website for a session and moves on to another one. Please keep in mind that exits should not be confused with bounces.
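One common outline in R, offered as a sketch rather than the course's prescribed method: check the correlations, fit factanal() from the stats package with a chosen number of factors, then read the loadings to see which variables group together. The data below is simulated; v1 to v6 are hypothetical stand-ins for session metrics, since the Internet project data is not reproduced here.

```r
set.seed(1)
n <- 300

# Two hypothetical latent drivers behind six observed metrics
f1 <- rnorm(n)
f2 <- rnorm(n)
obs <- data.frame(v1 = f1 + rnorm(n, sd = 0.5),
                  v2 = f1 + rnorm(n, sd = 0.5),
                  v3 = f1 + rnorm(n, sd = 0.5),
                  v4 = f2 + rnorm(n, sd = 0.5),
                  v5 = f2 + rnorm(n, sd = 0.5),
                  v6 = f2 + rnorm(n, sd = 0.5))

# Step 1: inspect correlations; factor analysis needs correlated variables
round(cor(obs), 2)

# Step 2: fit a two-factor model with varimax rotation
fa <- factanal(obs, factors = 2, rotation = "varimax")

# Step 3: read the loadings, hiding small ones to see which variables group together
print(fa$loadings, cutoff = 0.4)
```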

#41
42. ### Tathagat Kishore Mishra Member

My project has been accepted, and I got a certificate of appreciation after completing the assessment. Let me know if anyone needs help.

#42
Last edited: Aug 13, 2019


#43
44. ### Nirmal Chandra Dash Member

Can you please help me understand why stringsAsFactors is not working for an Excel file? It works perfectly fine for a .csv file.
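stringsAsFactors is an argument of read.csv() and data.frame(); read_excel() from the readxl package does not have it, and its col_types options do not include factors. One common workaround, sketched here, is to convert the character columns after reading. The data frame below is a hypothetical stand-in for what read_excel() would return.

```r
# In practice: bank <- readxl::read_excel("Demo 1_Identifying Data Structures.xlsm")
# Hypothetical stand-in with one character and one numeric column
bank <- data.frame(job = c("admin", "technician", "admin"),
                   balance = c(100, 250, 75),
                   stringsAsFactors = FALSE)

# Find the character columns and convert each one to a factor
chr_cols <- vapply(bank, is.character, logical(1))
bank[chr_cols] <- lapply(bank[chr_cols], factor)
str(bank)
```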

#44
45. ### Nirmal Chandra Dash Member

For the Excel file, displaying the output on screen throws the error below:
> Bank_customer1
Error: package ‘fansi’ was installed by an R version with different internals; it needs to be reinstalled for use with this R version

#45
46. ### Prakash Meghani New Member

Is this the correct forum group, in which the lecturer changed from Shikhar Parashar to Rajib Layek?

#46