Comparing Two Proportions

Comparing Two Proportions

In the Module Overview, you’ll have noticed the textbook assignment for this module.
Videos are optional and are there for your reference. Watch the videos if you so choose, then complete the practice problems from your textbook.
You’ll notice I’ve only assigned odd-numbered exercises. This is because I want you to be able to check your work as you go along. Please use good judgment and academic integrity when you complete the assignments; don’t merely copy or paraphrase the answers (that will earn you a 0) but use them to guide your answers. I recommend you complete the assignment, then check the answers, and then make corrections in a different colored pencil or font. This way you really have the opportunity to learn the nuances (and there are nuances) in each problem. Please
make sure to show all your work, step by step, if applicable.
If you have any questions, especially concerning my definition of copying or paraphrasing, please feel free to email me.
Feel free to do the problems by hand and then scan them to upload. I really like Genius Scan. You can also take a picture of your work instead, but I ask you make sure everything is legible before you upload the picture. You may feel more comfortable typing your answers, which is fine; just be sure that ALL STATISTICAL DISPLAYS are included (no matter the format you choose).
Complete bookwork – pg 618; #21*, 23, 27, 29*, 33, 35*, 37*

Measures of Central Tendency Paper

Measures of Central Tendency Paper

The mean salary is often used to describe the salaries of employees of a company. However, the median salary may be a better measure than the mean. Research a career you are interested in and calculate the mean and median salaries using at least ten data points. Include the calculations and the data source(s). Which is the better measure of central tendency? Why? Review and respond to the comments posted by your peers and offer your insight on this topic. Do you agree or disagree with their selection? Why or why not?
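If it helps to see the arithmetic, here is a minimal sketch using Python's standard-library statistics module; the salary figures are made up for illustration and are not real data for any career:

```python
import statistics

# Hypothetical salaries (ten data points) for an illustrative career -- not real data.
salaries = [48000, 52000, 55000, 57000, 60000, 62000, 65000, 70000, 85000, 250000]

print(statistics.mean(salaries))    # pulled upward by the single very high salary
print(statistics.median(salaries))  # middle of the ordered data; resistant to outliers
```

With one very large salary in the list, the mean sits well above the median, which is the usual argument for preferring the median with skewed salary data.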

Check out this simple, easy-to-use resource for making box plots: http://www.shodor.org/interactivate/activities/BoxPlot/
This is a good place for an example: http://home.ubalt.edu/ntsbarsh/Business-stat/otherapplets/Descriptive.htm
If you have an even number of data points (ten, say), then you average the two middle elements (the 5th and 6th, when placed in numerical order) to obtain the median.
The median divides the original data set into equal-sized halves; repeat the procedure with these new, smaller data sets to find the first and third quartiles.

Pay attention to whether these new smaller sets have an even or odd number of elements!
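As a sketch of that even/odd bookkeeping (Python, with made-up numbers):

```python
# Sketch of the median/quartile procedure described above for an even-sized data set.
data = sorted([48, 52, 55, 57, 60, 62, 65, 70, 85, 250])

mid = len(data) // 2
median = (data[mid - 1] + data[mid]) / 2        # average the 5th and 6th values here
lower, upper = data[:mid], data[mid:]           # split into two equal-sized halves

def middle(half):
    m = len(half) // 2
    # odd-sized half: take the middle value; even-sized half: average the two middle values
    return half[m] if len(half) % 2 else (half[m - 1] + half[m]) / 2

q1, q3 = middle(lower), middle(upper)
print(median, q1, q3)
```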
This post seems to focus on the mean and median, but there is another measure of central tendency: the mode.
In what situations, and for what kinds of variables, would the mode be the best choice as the measure of central tendency?
This side question has nothing to do with salaries, since salaries are continuous quantitative variables.
(Needs to be at least 150 words)

Tools for Data Analysis

Tools for Data Analysis

To complete the Assignment, compose a cohesive document that addresses the following:
Create a table outlining practical applications for each tool discussed in “The Seven Quality Tools” (Stauffer, 2013). Include the following within your table:
Strengths: Why that tool works well for those applications
Tips for use
Cautions relevant to the tool
Analyze the effectiveness of each tool listed within your table. In your analysis, address the following:
For each tool listed, choose an online example, or an example from your experience, in which the tool was used properly. Provide a link to your example, along with a brief description of how the tool was used within the organization and your analysis of its effectiveness, or whether there was a better tool and why.

Types of Reliability and Validity

Types of Reliability and Validity

Investigate an individual, standardized cognitive or academic assessment like the WISC, WJ, KTEA or WIAT and discuss the concepts listed below that you are able to find in the technical manual of the assessment:

  • Test-Retest Reliability
  • Interrater Reliability
  • Internal Consistency
  • Confidence Intervals
  • Standard Error of Measurement
  • Face Validity
  • Construct Validity
  • Criterion-Related Validity
  • Content Validity
  • External Validity

Index Construction and Use SPSS

Homework 5: Index Construction & Use Comments
This assignment continues a series of labs and homeworks in which you utilize statistical skills for basic
research. For this assignment, you will again manipulate variables and construct a basic index, as you have
done in several earlier assignments. However, for this assignment, you will take an additional step, using the
index that you create for simple bivariate descriptions of the sample. The index will be the “dependent variable”:
Specifically, you will use ordinal measures to compare support for possible explanations of variation in the index.
Instructions
You will be using the data file hw5.sav to examine variation in respondents’ satisfaction with four areas of their
lives (family, friends, finance, and job). You will then create a summary measure of overall satisfaction, and will
explore how (and whether) that summary measure varies in two ways: across educational levels and with
frequency of sexual activity. Finally, you will briefly explore interactions among these possible influences on
satisfaction. (Note that most of the recoding has been done for you – this is not always the case.)
Requirements & Questions
You must submit your output file (complete but cleaned) and typed answers to these questions. Typed. Probably
with a computer, maybe with some other device, possibly a typewriter. But not a pen, pencil, or crayon. Typed.
1. Univariate analyses of component and independent variables:
• Perform a univariate analysis of SATFAM, SATFIN, SATJOB, and SATFRND – For each, you should
look at and briefly summarize the frequency distribution, as well as basic summary statistics for central
tendency and dispersion. Go beyond just reporting the data and say something interesting (here and
below). For example, about which issues are the respondents the most/least happy?
• Look briefly at the distributions of EDUC and SEXFREQ. (Note, in particular, the percent of the sample
who refused to answer or otherwise did not have an answer for SEXFREQ.)
2. Construct and assess index:
• Construct an index (including variable labels and value labels, at least for the extremes), called
HAPPY, as the summation of values for the four components listed above.
• Perform a univariate analysis of HAPPY – look at and briefly summarize the frequency distribution, as
well as basic summary statistics for central tendency and dispersion.
• What is this variable conceptually? What does it measure, and what does it mean? What does it tell us
that the individual components do not?
• Interpret the “alpha” for your index – is the index reliable? Is it a good one? Why or why not? (A sketch of the index, alpha, and chi-square computations, outside SPSS, appears after these questions.)
3. Bivariate analyses – what makes people happy?
• Using correlations and chi-square, what can you say about the relationship between educational
attainment and overall satisfaction (i.e. between HAPPY and EDUC)? (You will need to request a
crosstab to get chi-square, but ignore the table itself, for now.) Is it strong? Statistically significant?
• Using correlations and chi-square, what can you say about the relationship between frequency of
sexual activity and overall satisfaction (i.e. between HAPPY and SEXFREQ)? (You will need to request
a crosstab to get chi-square, but ignore the table itself, for now.) Is it strong? Statistically significant?
4. Discussion/conclusions
• What can you infer from these findings about what makes people happy? (Hint: Did either of the two
independent variables (EDUC and SEXFREQ) have a statistically significant effect on the dependent
variable (HAPPY)?)
• Bonus: Put that at a conceptual level, thinking about what broader concepts these variables might
operationalize. Of what larger concept might education be a specific instance, indicator, or aspect?
What about sexual frequency?
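For reference only, here is a rough sketch of the index, Cronbach's alpha, and chi-square steps. Python with pandas and scipy is an assumption; the assignment itself expects SPSS output, and the tiny data frame below is made up, not hw5.sav:

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Made-up stand-in for a few hw5.sav cases (satisfaction items scored 1-7, EDUC in years).
df = pd.DataFrame({
    "SATFAM":  [5, 6, 4, 7, 3, 6, 5, 4],
    "SATFIN":  [4, 5, 3, 6, 2, 5, 4, 3],
    "SATJOB":  [5, 6, 4, 7, 3, 6, 4, 4],
    "SATFRND": [6, 6, 5, 7, 4, 6, 5, 5],
    "EDUC":    [12, 16, 12, 18, 10, 16, 14, 12],
})

items = df[["SATFAM", "SATFIN", "SATJOB", "SATFRND"]]
df["HAPPY"] = items.sum(axis=1)   # summative index of the four components

# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of the summed index)
k = items.shape[1]
alpha = k / (k - 1) * (1 - items.var(ddof=1).sum() / df["HAPPY"].var(ddof=1))
print(round(alpha, 3))

# Chi-square from a crosstab of HAPPY by EDUC (the table itself can be ignored for now)
chi2, p, dof, _ = chi2_contingency(pd.crosstab(df["HAPPY"], df["EDUC"]))
print(chi2, p)
```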

Merger and acquisition engagement of environmental innovators in the automotive industry Software: STATA

Master's level. Use of STATA, Orbis, and Zephyr; the paper has to contain patent data and merger and acquisition data, so please use all three databases. The paper also requires statistical models, such as a regression. Length: 4,000 words, 15 pages, Times New Roman 12. I also added my slides and example studies. The assessment form is also in there, which is very important. Please read them to get a picture of the required level.

Marketing Research – Quantitative Data Analysis

Topic: Marketing Research – Quantitative Data Analysis

A series of (5) separate Homework Assignments requiring the following:
  • The correct input of data into the SPSS software, with evidence of this process in the form of output in a PDF download.
  • A summary of your statistical output, organized and presented in easily understandable table(s) on one page, presenting the information asked for in the objectives above and highlighting what you think your clients need to know in the most easily readable manner.

Two-Sample t-Test (Student's t-test)

Two-Sample t-Test (Student's t-test)

Pick any set of data. Conduct a two-sample T-test.
Explain in the discussion question:
Your source of data
Your null hypothesis
Whether or not you reject the null hypothesis (and include the P value)
How this information might be relevant to a decision maker.
Attach the Excel file containing the data source (but be sure everything we need to know about
your executive summary is in the body of the discussion forum, not the attachment).
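A minimal sketch of the mechanics, assuming Python/scipy rather than Excel and using made-up numbers:

```python
from scipy import stats

# Two made-up samples standing in for whatever data set you pick (illustration only).
group_a = [12.1, 11.4, 13.0, 12.7, 11.9, 12.3, 13.4, 12.0]
group_b = [13.2, 13.8, 12.9, 14.1, 13.5, 13.0, 14.4, 13.7]

# H0: the two population means are equal. Welch's version avoids the equal-variance assumption.
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)
print(t_stat, p_value)   # reject H0 if p_value < 0.05 (or your chosen alpha)
```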

COVID19 Correlation Analysis

COVID19 Correlation Analysis 

I will upload the file with the information here.
The questions below should be answered.
I will also provide links to help answer the questions.
https://www1.nyc.gov/assets/doh/downloads/pdf/imm/covid-19-cases-by-zip-04152020-1.pdf
https://state.1keydata.com/state-population-density.php
the questions are:
1. See the graphs below, with log-transformed population density on the X-axis and log-transformed CDR on the Y-axis with a least squares line fit to the data.
Is the correlation positive or negative?

2. As the X-axis (population density) increases, what happens to the Y-axis (Crude Death Rate)?
3. What does the 0.449 mean? This is a log-log regression equation, so it has an easy
interpretation: for every 1% increase in population density, there is approximately a 0.449% increase in the Crude
Death Rate for COVID19. (A sketch of the log-log fit appears after these questions.)
4. Given what we just figured out, which states should emphasize social distancing the most?
(Hint: look at the graph with state name labels).
5. We have tested only one variable…what other variables do you think we should test?
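A rough sketch of the log-log fit behind questions 1-3, assuming Python/numpy and made-up density/CDR values; the real figures come from the two links above:

```python
import numpy as np

# Made-up state-level values standing in for population density and crude death rate (CDR).
density = np.array([10.0, 40.0, 100.0, 250.0, 600.0, 1200.0])
cdr = np.array([0.5, 1.1, 2.0, 3.5, 6.0, 9.5])

x, y = np.log(density), np.log(cdr)        # log-log transformation, as in the graphs
r = np.corrcoef(x, y)[0, 1]                # sign of r answers question 1
slope, intercept = np.polyfit(x, y, 1)     # slope plays the role of the 0.449 in question 3
print(r, slope)   # a slope of 0.449 reads as: a 1% rise in density goes with ~0.449% rise in CDR
```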

Statistics Assignment

Statistics Assignment

Data:
For this assignment, please download the Homework data file.
Requirements:
1. Create a line chart and identify the time series components in the time series. Then, compute
the correlation between the time series variable and time using either the =CORREL() function or
the correlation tool in the Data Analysis ToolPak. Justify your answer. (Hint: it should be some
combination of average or base, cycle, trend, and random variation.)
2. Create as many forecasts as possible on the historical data using each of the methods below.
a. 3-period moving average.
c. Exponential smoothing forecast with alpha = 0.8.
d. Trend forecast (whether or not there is a trend). Use the =TREND() function in Excel.

IMPORTANT NOTES: When computing your moving average forecasts, do not use the Moving
Average Data Analysis tool. This tool will not give you a valid forecast because it uses the
current period in the computation. Instead, use the AVERAGE() function. Also, for both the ES
and the MA forecasts, do not include the period you are forecasting in the history you are using
to compute the forecast.
3. Starting with the fourth period, compute the MAE for each forecasting model, and choose the
best model based on this analysis.
4. Using the best model, make a new forecast for the next period.
Deliverables
Please place all of your analysis on a single spreadsheet. Clearly label your answers. When you
have completed the assignment, post your Excel file on the HW 4 assignment dropbox.
Hints: In this assignment, you are using your entire history to build good models and to test the
forecasting skill of the models. Once you have computed the forecasts and calculated the MAEs
for each model, you will choose the most accurate model on historical data to make a future
forecast. The moving average forecasts will begin at period 4, the ES forecasts will begin at
period 2 (with the naïve starting value), and the trend forecasts will begin at period 1. For
consistency, you should compute the MAEs for periods 3-98 for each. The first draft is due by
Wednesday.
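For orientation only, the same computations sketched in Python with a short made-up series; the assignment itself should be done in Excel with the AVERAGE() and TREND() functions as described above:

```python
import numpy as np
import pandas as pd

# Short made-up series standing in for the "Total Instances of Fraud" column (illustration only).
y = pd.Series([428, 314, 429, 474, 443, 462, 361, 458, 410, 595], dtype=float)

# 3-period moving-average forecast: average of the three *previous* periods,
# so the forecast for period t never uses the value observed at t.
ma3 = y.shift(1).rolling(3).mean()

# Exponential smoothing with alpha = 0.8, seeded with the naive forecast (the period-1 value).
alpha = 0.8
es = np.full(len(y), np.nan)
es[1] = y[0]                      # naive starting forecast for period 2
for t in range(2, len(y)):
    es[t] = alpha * y[t - 1] + (1 - alpha) * es[t - 1]

# Trend forecast: fit value = b0 + b1 * period on the history (like Excel's =TREND()).
period = np.arange(1, len(y) + 1)
b1, b0 = np.polyfit(period, y, 1)
trend = b0 + b1 * period

# MAE from the 4th period onward, where all three models have forecasts.
def mae(forecast):
    f = np.asarray(forecast, dtype=float)
    return np.mean(np.abs(y.values[3:] - f[3:]))

print({"MA(3)": mae(ma3), "ES(0.8)": mae(es), "Trend": mae(trend)})
```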

Date Total Instances of Fraud
10/29/2019 428
10/30/2019 314
10/31/2019 429
11/1/2019 474
11/2/2019 443
11/3/2019 462
11/4/2019 361
11/5/2019 458
11/6/2019 410
11/7/2019 595
11/8/2019 396
11/9/2019 511
11/10/2019 508
11/11/2019 447
11/12/2019 463
11/13/2019 321
11/14/2019 628
11/15/2019 340
11/16/2019 363
11/17/2019 438
11/18/2019 369
11/19/2019 430
11/20/2019 338
11/21/2019 637
11/22/2019 352
11/23/2019 468
11/24/2019 366
11/25/2019 440
11/26/2019 343
11/27/2019 504
11/28/2019 657
11/29/2019 514
11/30/2019 343
12/1/2019 458
12/2/2019 484
12/3/2019 428
12/4/2019 456
12/5/2019 609
12/6/2019 493
12/7/2019 477
12/8/2019 442
12/9/2019 457
12/10/2019 369
12/11/2019 459
12/12/2019 674
12/13/2019 378
12/14/2019 394
12/15/2019 408
12/16/2019 385
12/17/2019 511
12/18/2019 353
12/19/2019 619
12/20/2019 480
12/21/2019 529
12/22/2019 509
12/23/2019 388
12/24/2019 359
12/25/2019 430
12/26/2019 610
12/27/2019 439
12/28/2019 480
12/29/2019 378
12/30/2019 446
12/31/2019 438
1/1/2020 484
1/2/2020 625
1/3/2020 446
1/4/2020 533
1/5/2020 413
1/6/2020 469
1/7/2020 534
1/8/2020 516
1/9/2020 577
1/10/2020 493
1/11/2020 525
1/12/2020 397
1/13/2020 533
1/14/2020 420
1/15/2020 426
1/16/2020 569
1/17/2020 417
1/18/2020 453
1/19/2020 427
1/20/2020 458
1/21/2020 455
1/22/2020 559
1/23/2020 652
1/24/2020 414
1/25/2020 426
1/26/2020 426
1/27/2020 582
1/28/2020 471
1/29/2020 569
1/30/2020 631
1/31/2020 484
2/1/2020 549
2/2/2020 408

Comparing Global Values and Attitudes

PROJECT 3: Comparing Global Values and Attitudes

An independent-samples hypothesis test helps us determine if two groups (for example, cats and dogs) substantively differ with respect to a social value as measured by an interval-ratio variable (for example, feelings about lasagna measured on a scale from 1 to 10). For this project, you will be asked to prepare a report that tells us how two groups (for example, the US and Spain) differ with respect to one social value variable related to a UN SDG of your choice. Stated differently, you’ll be comparing local and global data and relating it back to the Sustainable Development Goals set forth by the United Nations.

Statistical Analysis of Data Using MINITAB

Statistical Analysis of Data Using MINITAB
Deadline: 5pm Monday 16th October 2017

Introduction and dataset
The aim of this coursework is to investigate and predict the onset of diabetes based on
various diagnostic measurements.

The dataset was originally compiled by researchers at the Johns Hopkins University
School of Medicine, from a larger database owned by the National Institute of Diabetes
and Digestive and Kidney Diseases. All patients were females at least 21 years old of
Pima Indian heritage. Note that Pima Indians have one of the highest rates of diabetes
in the world.

This dataset includes 392 observations, taken at the individual level and available from
diabetes_dataset.xlsx file in Statistical Data Analysis Coursework folder on NOW.
The key indicator of diabetes (response variable), as defined by the World Health
Organization, is a plasma glucose concentration greater than 200 mg/dl two hours
following ingestion of a 75 gm carbohydrate solution (variable Glucose).

The  explanatory variables (or predictors) are known risk factors for diabetes: number of
pregnancies, diastolic blood pressure, triceps skinfold thickness (an indicator of
bodyfat), 2 hour serum insulin, body mass index, age, and diabetes pedigree function
(see Table).

Table. Measurements recorded in the dataset.
Measurement/variable        Description
Glucose                     plasma glucose concentration 2 hours into an oral glucose tolerance test
Pregnancies                 number of times pregnant
BloodPressure               diastolic blood pressure (mm Hg)
SkinThickness               triceps skin fold thickness (mm)
Insulin                     2-hour serum insulin (mu U/ml)
BMI                         body mass index (weight in kg / (height in m)^2)
DiabetesPedigreeFunction    diabetes pedigree function*
Age                         age (years)
Outcome                     class variable (0 or 1)**
* a synthesis of diabetes history in an individual’s relatives
** negative (0) / positive (1) diabetes test
Creating your unique dataset
Copy the data from this file into MINITAB so that Glucose is recorded in column C1,
Pregnancies in C2, etc.

(1) Generate two random numbers between 2 and 7 and provide MINITAB output.
(1 mark)

(2) Using MINITAB, erase columns corresponding to your generated numbers (e.g. if
one of the generated numbers is 5 then erase column C5, etc). Describe how you did
this and provide the sequence of actions (e.g. Calc->Descriptive Stats->….)
(2 marks)

(3) Using MINITAB select a random sample of 300 observations (n = 300) from your
dataset. Provide the sequence of actions of how you did this.
(1 mark)
Your unique dataset will now consist of 300 rows and seven columns including
Glucose, Age and Outcome.
Investigating your unique dataset

(4) For your unique dataset summarise information about your observations and present
graphically the frequency distributions for all variables that are left in your unique
dataset, including Glucose but excluding the Outcome variable. Comment on unusual
observations and make your own decision on how to deal with them.
(6 marks)

(5) Using MINITAB, define a new variable, Age_Group, by combining observations
for participants younger than 30 into group 1 and all others (of age 30 and older) into
group 2. Provide either a description or a screen shot of how you did this.
(3 marks)

(6) Investigate whether there is a significant difference in mean/median Glucose
concentration between age groups. Formulate the null and alternative hypotheses;
choose, justify and perform an appropriate statistical test using MINITAB; provide all
MINITAB outputs; write your conclusions.
(10 marks)
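A rough sketch of the logic behind choosing and running this test, assuming Python/scipy and made-up glucose values; the coursework itself requires MINITAB output:

```python
import numpy as np
from scipy import stats

# Hypothetical Glucose values split by the Age_Group variable (illustration only).
group1 = np.array([95, 102, 88, 110, 99, 105, 92, 120], dtype=float)    # under 30
group2 = np.array([118, 130, 101, 142, 125, 137, 115, 150], dtype=float)  # 30 and older

# A quick normality check can guide the choice between a two-sample t-test (compares means)
# and a Mann-Whitney U test (compares distributions/medians).
print(stats.shapiro(group1), stats.shapiro(group2))
print(stats.ttest_ind(group1, group2, equal_var=False))   # Welch's t-test
print(stats.mannwhitneyu(group1, group2))
```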

(7) Show whether the proportion of participants with Glucose concentration greater
than 100 mg/dl is different between age groups that you defined previously. Formulate
the null and alternative hypotheses; choose, justify and perform an appropriate
statistical test using MINITAB; provide all MINITAB outputs; write your conclusions.
(10 marks)
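Likewise, a minimal sketch of a two-proportion comparison, assuming Python with statsmodels and made-up counts; MINITAB's 2 Proportions procedure is what the coursework expects:

```python
from statsmodels.stats.proportion import proportions_ztest

# Made-up counts of participants with Glucose > 100 mg/dl in each age group, out of the
# group sizes. Replace with the counts from your own unique dataset.
count = [60, 105]   # "successes" in group 1 (under 30) and group 2 (30 and older)
nobs = [160, 140]   # group sizes

z, p = proportions_ztest(count, nobs)   # H0: the two population proportions are equal
print(z, p)
```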

(8) Using MINITAB, produce a table of correlation coefficients. Justify the choice of
correlation coefficient, investigate the resulting table and comment on most interesting
relationships between chosen variables. Do not use Glucose and Outcome variables in
this analysis.
(4 marks)

(9) Using simple linear regression, model Glucose concentration by one of the
variables of your choice that are available in your unique dataset. Comment on
significance of intercept and slope.
(4 marks)

(10) Fit a multiple regression model with Glucose being a response variable and other
five variables excluding Outcome as predictors. Treat variable Pregnancies as an
interval scale data. Identify insignificant predictors in the model and explain why they
are insignificant.
(4 marks)

(11) Cluster your 300 observations into 10 groups using one of the linkage methods and
similarity measures from the corresponding drop-down menus. Give a brief (half a page)
description of the linkage method and similarity measure chosen. Show a dendrogram
with cases labelled by Outcome. Comment on the results obtained. Provide all
MINITAB outputs.
(6 marks)
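For orientation, a small sketch of hierarchical clustering with one possible linkage/similarity choice, assuming Python with scipy and matplotlib and random stand-in data; the coursework itself requires MINITAB's clustering dialog:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster, dendrogram
import matplotlib.pyplot as plt

# Random stand-in for the predictor columns and the Outcome labels (illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
outcome = rng.integers(0, 2, size=30)

# Ward linkage with Euclidean distance is one common linkage/similarity choice.
Z = linkage(X, method="ward", metric="euclidean")
labels = fcluster(Z, t=10, criterion="maxclust")   # cut the tree into 10 clusters

dendrogram(Z, labels=[str(o) for o in outcome])    # leaves labelled by Outcome
plt.show()
```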

(12) It is known that the incidence of diabetes in the UK is 0.6. In a small northern
village of 100 people isolated from the mainland for six months per year the pharmacy
wants to know how many insulin shots to order. We want to know what is the
probability that between A and B people will develop the disease during this period. To
perform analysis, generate two random numbers between 0 and 100 using MINITAB
and paste the outputs into your report. Denote by A the smallest number and by B the
largest number out of these two generated numbers. Calculate the probability that
between A and B people develop the disease and how many shots should be ordered.
(9 marks)
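A minimal sketch of the probability calculation in question (12), assuming Python/scipy. A and B below are placeholders for the two random numbers you generate, and the incidence value is taken from the question as stated:

```python
from scipy.stats import binom

# Incidence and village size as stated in the question; A and B are placeholders for
# your two generated random numbers.
n, p = 100, 0.6
A, B = 55, 70

prob = binom.cdf(B, n, p) - binom.cdf(A - 1, n, p)   # P(A <= X <= B), inclusive
print(prob)
print(n * p)   # expected number of cases, one rough basis for the order size
```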

Glucose Pregnancies BloodPressure SkinThickness Insulin BMI DiabetesPedigreeFunction Age Outcome
56 2 56 28 45 24.2 0.332 22 0
68 2 62 13 15 20.1 0.257 23 0
68 2 70 32 66 25 0.187 25 0
68 10 106 23 49 35.5 0.285 47 0
71 1 48 18 76 20.4 0.323 22 0
71 1 78 50 45 33.2 0.422 21 0
74 0 52 10 36 27.8 0.269 22 0
74 3 68 28 45 29.7 0.293 23 0
74 8 70 40 49 35.3 0.705 39 0
75 2 64 24 55 29.7 0.37 33 0
77 1 56 30 56 33.3 1.251 24 0
77 5 82 41 42 35.8 0.156 35 0
78 3 50 32 88 31 0.248 26 1
78 0 88 29 40 36.9 0.434 21 0
79 1 80 25 37 25.4 0.583 22 0
79 1 60 42 48 43.5 0.678 23 0
80 1 74 11 60 30 0.527 22 0
80 3 82 31 70 34.2 1.292 27 1
81 1 72 18 40 26.6 0.283 24 0
81 3 86 16 66 27.5 0.306 22 0
81 2 72 15 76 30.1 0.547 25 0
81 1 74 41 57 46.3 1.096 32 0
81 7 78 40 48 46.7 0.261 42 0
82 1 64 13 95 21.2 0.415 23 0
82 2 52 22 115 28.5 1.699 25 0
83 7 78 26 71 29.3 0.767 36 0
83 2 66 23 50 32.2 0.497 22 0
83 3 58 31 18 34.3 0.336 25 0
83 2 65 28 66 36.8 0.629 24 0
84 2 50 23 76 30.4 0.968 21 0
84 3 68 30 106 31.9 0.591 25 0
84 0 64 22 66 35.8 0.545 21 0
84 1 64 23 115 36.9 0.471 28 0
84 0 82 31 125 38.2 0.233 23 0
84 4 90 23 56 39.5 0.159 25 0
85 4 58 22 49 27.8 0.306 28 0
86 5 68 28 71 30.2 0.364 24 0
86 1 66 52 65 41.3 0.917 29 0
87 2 58 16 52 32.7 0.166 25 0
87 1 78 27 32 34.6 0.101 22 0
87 1 60 37 75 37.2 0.509 22 0
87 1 68 34 77 37.6 0.401 24 0
88 5 66 21 23 24.4 0.342 30 0
88 3 58 11 54 24.8 0.267 22 0
88 2 58 26 16 28.4 0.766 22 0
88 2 74 19 53 29 0.229 22 0
88 1 62 24 44 29.9 0.422 23 0
88 1 78 29 76 32 0.365 29 0
88 12 74 40 54 35.3 0.378 48 0
88 1 30 42 99 55 0.496 26 1
89 1 24 19 25 27.8 0.559 21 0
89 1 66 23 94 28.1 0.167 21 0
89 3 74 16 85 30.4 0.551 38 0
89 1 76 34 37 31.2 0.192 23 0
90 2 80 14 55 24.4 0.249 24 0
90 1 62 18 59 25.1 1.268 25 0
90 1 62 12 43 27.2 0.58 24 0
90 4 88 47 54 37.7 0.362 29 0
91 1 54 25 100 25.2 0.234 23 0
91 4 70 32 88 33.1 0.446 22 0
91 0 68 32 210 39.9 0.381 25 0
92 1 62 25 41 19.5 0.482 25 0
92 12 62 7 258 27.6 0.926 44 1
92 6 62 32 126 32 0.085 46 0
93 0 60 25 92 28.7 0.532 22 0
93 6 50 30 64 28.7 0.356 23 0
93 2 64 32 160 38 0.674 23 1
93 0 100 39 72 43.4 1.021 35 0
94 2 68 18 76 26 0.561 21 0
94 2 76 18 66 31.6 0.649 23 0
94 7 64 25 79 33.3 0.738 41 0
94 0 70 27 115 43.5 0.347 21 0
95 1 66 13 38 19.6 0.334 25 0
95 1 60 18 58 23.9 0.26 22 0
95 1 74 21 73 25.9 0.673 36 0
95 2 54 14 88 26.1 0.748 22 0
95 1 82 25 180 35 0.233 43 1
95 0 80 45 92 36.5 0.33 26 0
95 0 85 25 36 37.4 0.247 24 1
95 0 64 39 105 44.6 0.366 22 0
96 4 56 17 49 20.8 0.34 26 0
96 2 68 13 49 21.1 0.647 26 0
96 3 56 34 115 24.7 0.944 39 0
96 1 64 27 87 33.2 0.289 21 0
96 5 74 18 67 33.6 0.997 43 0
97 1 64 19 82 18.2 0.299 21 0
97 1 66 15 140 23.2 0.487 22 0
97 0 64 36 100 36.8 0.6 25 0
97 7 76 32 91 40.9 0.871 32 1
98 0 82 15 84 25.2 0.299 22 0
98 6 58 33 190 34 0.43 43 0
98 2 60 17 120 34.7 0.198 22 0
99 3 80 11 64 19.3 0.284 30 0
99 2 70 16 44 20.4 0.235 27 0
99 3 62 19 74 21.8 0.279 26 0
99 4 76 15 51 23.2 0.223 21 0
99 2 52 15 94 24.6 0.637 21 0
99 3 54 19 86 25.6 0.154 24 0
99 6 60 19 54 26.9 0.497 32 0
99 5 54 28 83 34 0.499 30 0
99 2 60 17 160 36.6 0.453 21 0
99 1 72 30 18 38.6 0.412 21 0
100 1 74 12 46 19.5 0.149 28 0
100 1 66 15 56 23.6 0.666 26 0
100 1 72 12 70 25.3 0.658 28 0
100 12 84 33 105 30 0.488 46 0
100 0 70 26 50 30.8 0.597 21 0
100 3 68 23 81 31.6 0.949 28 0
100 1 66 29 196 32 0.444 42 0
100 2 66 20 90 32.9 0.867 28 1
100 14 78 25 184 36.6 0.412 46 1
100 2 54 28 105 37.8 0.498 24 0
100 2 68 25 71 38.5 0.324 26 0
100 8 74 40 215 39.4 0.661 43 1
100 2 70 52 57 40.5 0.677 25 0
100 0 88 60 110 46.8 0.962 31 0
101 2 58 35 90 21.8 0.155 22 0
101 2 58 17 265 24.2 0.614 23 0
101 1 50 15 36 24.2 0.526 26 0
101 10 76 48 180 32.9 0.171 63 0
102 0 86 17 105 29.3 0.695 27 0
102 3 44 20 94 30.8 0.4 26 0
102 0 78 40 90 34.5 0.238 24 0
102 7 74 40 105 37.2 0.204 45 0
102 0 64 46 78 40.6 0.496 21 0
102 2 86 36 120 45.5 0.127 23 1
103 1 80 11 82 19.4 0.491 22 0
103 4 60 33 192 24 0.966 33 0
103 3 72 30 152 27.6 0.73 27 0
103 6 72 32 190 37.7 0.324 55 0
103 1 30 38 83 43.3 0.183 33 0
104 0 64 23 116 27.8 0.454 23 0
104 6 74 18 156 29.9 0.722 41 1
104 0 64 37 64 33.6 0.51 22 1
105 6 70 32 68 30.8 0.122 37 0
105 2 80 45 191 33.7 0.711 29 1
105 2 58 40 94 34.9 0.225 25 0
105 5 72 29 325 36.9 0.159 28 0
105 0 64 41 142 41.5 0.173 22 0
106 2 56 27 165 29 0.426 22 0
106 2 64 35 119 30.5 1.4 34 0
106 3 54 21 158 30.9 0.292 24 0
106 1 70 28 135 34.2 0.142 22 0
106 0 70 37 148 39.4 0.605 22 0
107 3 62 13 48 22.9 0.678 23 1
107 1 72 30 82 30.8 0.821 24 0
107 2 74 30 100 33.6 0.404 23 0
107 0 62 30 74 36.6 0.757 25 1
108 6 44 20 130 24 0.813 35 0
108 2 62 32 56 25.2 0.128 21 0
108 2 62 10 278 25.3 0.881 22 0
108 2 52 26 63 32.5 0.318 22 0
108 1 60 46 178 35.5 0.415 24 0
108 5 72 43 75 36.1 0.263 33 0
109 1 38 18 120 23.1 0.407 26 0
109 1 56 21 135 25.2 0.833 23 0
109 1 60 8 182 25.4 0.947 21 0
109 8 76 39 114 27.9 0.64 31 1
109 1 58 18 116 28.5 0.219 22 0
109 4 64 44 99 34.8 0.905 26 1
109 5 62 41 129 35.8 0.514 25 1
110 4 76 20 100 28.4 0.118 27 0
110 2 74 29 125 32.4 0.698 27 0
111 1 62 13 182 24 0.138 23 0
111 3 90 12 78 28.4 0.495 29 0
111 3 58 31 44 29.5 0.43 22 0
111 4 72 47 207 37.1 1.39 56 1
112 2 68 22 94 34.1 0.315 26 0
112 9 82 32 175 34.2 0.26 36 1
112 1 72 30 176 34.4 0.528 25 0
112 1 80 45 132 34.8 0.217 24 0
112 2 86 42 160 38.4 0.246 28 0
112 2 78 50 140 39.4 0.175 24 0
113 3 50 10 85 29.5 0.626 25 0
114 7 76 17 110 23.8 0.466 31 0
114 1 66 36 200 38.1 0.289 21 0
114 0 80 34 285 44.2 0.167 27 0
115 1 70 30 96 34.6 0.529 32 1
115 3 66 39 140 38.1 0.15 28 0
116 4 72 12 87 22.1 0.463 37 0
116 3 74 15 105 26.3 0.107 24 0
116 1 78 29 180 36.1 0.496 25 0
117 2 90 19 71 25.2 0.313 21 0
117 0 66 31 188 30.8 0.493 22 0
117 4 64 27 120 33.2 0.23 24 0
117 1 60 23 106 33.8 0.466 27 0
117 1 88 24 145 34.5 0.403 40 1
117 5 86 30 105 39.1 0.251 42 0
117 0 80 31 53 45.2 0.089 24 0
118 1 58 36 94 33.3 0.261 23 0
118 0 84 47 230 45.8 0.551 31 1
119 1 54 13 50 22.3 0.205 24 0
119 6 50 22 176 27.1 1.318 33 1
119 0 64 18 92 34.9 0.725 23 0
119 1 44 47 63 35.5 0.28 25 0
119 1 88 41 170 45.3 0.507 26 0
119 1 86 39 220 45.6 0.808 29 1
120 9 72 22 56 20.8 0.733 48 0
120 0 74 18 63 30.5 0.285 26 0
120 1 80 48 200 38.9 1.162 41 0
120 2 76 37 105 39.7 0.215 29 0
120 11 80 37 150 42.3 0.785 48 1
120 3 70 30 135 42.9 0.452 30 0
121 5 72 23 112 26.2 0.245 30 0
121 0 66 30 165 34.3 0.203 33 1
121 1 78 39 74 39 0.261 28 0
121 2 70 32 95 39.1 0.886 23 0
122 2 60 18 106 29.8 0.717 22 0
122 1 64 32 156 35.1 0.692 30 1
122 2 76 27 200 35.9 0.483 26 0
122 2 52 43 158 36.2 0.816 28 0
122 1 90 51 220 49.7 0.325 31 1
123 4 80 15 176 32 0.443 34 0
123 9 70 44 94 33.1 0.374 40 0
123 6 72 45 230 33.6 0.733 34 0
123 5 74 40 77 34.1 0.269 28 0
123 2 48 32 165 42.1 0.52 26 0
123 3 100 35 240 57.3 0.88 22 0
124 0 56 13 105 21.8 0.452 21 0
124 7 70 33 215 25.5 0.161 37 0
124 8 76 24 600 28.7 0.687 52 1
124 2 68 28 205 32.9 0.875 30 1
124 3 80 33 130 33.2 0.305 26 0
124 9 70 33 402 35.4 0.282 34 0
125 1 70 24 110 24.3 0.221 25 0
125 4 70 18 122 28.9 1.144 45 1
125 6 68 30 120 30 0.464 32 0
125 10 70 26 115 31.1 0.205 41 1
125 1 50 40 167 33.3 0.962 28 1
125 2 60 20 140 33.8 0.088 31 0
126 8 74 38 75 25.9 0.162 39 0
126 0 86 27 120 27.4 0.515 21 0
126 1 56 29 152 28.7 0.801 21 0
126 5 78 27 22 29.6 0.439 40 0
126 0 84 29 215 30.7 0.52 24 0
126 8 88 36 108 38.5 0.349 49 0
126 3 88 41 235 39.3 0.704 27 0
127 2 58 24 275 27.7 1.6 25 0
127 2 46 21 335 34.4 0.176 22 0
127 4 88 11 155 34.5 0.598 28 0
127 0 80 37 210 36.3 0.804 23 0
128 1 82 17 183 27.5 0.115 22 0
128 0 68 19 180 30.5 1.391 25 1
128 1 98 41 58 32 1.321 33 1
128 3 72 25 190 32.4 0.549 27 1
128 1 88 39 110 36.5 1.057 37 1
128 1 48 45 194 40.5 0.613 24 1
128 2 78 37 182 43.3 1.224 31 1
129 6 90 7 326 19.6 0.582 60 0
129 3 64 29 115 26.4 0.219 28 1
129 4 60 12 231 27.5 0.527 31 0
129 2 74 26 205 33.2 0.591 25 0
129 4 86 20 270 35.1 0.231 23 0
129 10 76 28 122 35.9 0.28 39 0
129 3 92 49 155 36.4 0.968 32 1
129 7 68 49 125 38.5 0.439 43 1
129 0 110 46 130 67.1 0.319 26 1
130 1 70 13 105 25.9 0.472 22 0
130 3 78 23 79 28.4 0.323 34 1
130 1 60 23 170 28.6 0.692 21 0
131 1 64 14 415 23.7 0.389 21 0
131 4 68 21 166 33.1 0.16 28 0
133 7 88 15 155 32.4 0.262 37 0
133 1 102 28 140 32.8 0.234 45 1
134 9 74 33 60 25.9 0.46 81 0
134 0 58 20 291 26.4 0.352 21 0
134 6 70 23 130 35.4 0.542 29 1
134 6 80 37 370 46.2 0.238 46 1
135 0 94 46 145 40.6 0.284 26 0
135 0 68 42 250 42.3 0.365 24 1
136 7 74 26 135 26 0.647 51 0
136 11 84 35 130 28.3 0.26 42 1
136 5 84 41 88 35 0.286 35 1
136 15 70 32 110 37.1 0.153 43 1
136 1 74 50 204 37.4 0.399 24 0
137 0 68 14 148 24.8 0.143 21 0
137 0 40 35 168 43.1 2.288 33 1
138 0 60 35 167 34.6 0.534 21 1
138 11 74 26 144 36.1 0.557 50 1
139 0 62 17 210 22.1 0.207 21 0
139 5 64 35 140 28.6 0.411 26 0
139 1 46 19 83 28.7 0.654 22 0
139 5 80 35 160 31.6 0.361 25 1
139 1 62 41 480 40.7 0.536 21 0
140 1 74 26 180 24.1 0.828 23 0
140 12 82 43 325 39.2 0.528 58 1
140 0 65 26 130 42.6 0.431 24 1
141 2 58 34 128 25.4 0.699 24 0
142 2 82 18 64 24.7 0.761 21 0
142 7 60 33 190 28.8 0.687 61 0
142 7 90 24 480 30.4 0.128 43 1
143 1 74 22 61 26.2 0.256 21 0
143 1 86 30 330 30.1 0.892 23 0
143 11 94 33 146 36.6 0.254 51 1
143 1 84 23 310 42.4 1.076 22 0
144 4 58 28 140 29.5 0.287 37 0
144 2 58 33 135 31.6 0.422 25 1
144 5 82 26 285 32 0.452 58 1
144 6 72 27 228 33.9 0.255 40 0
144 1 82 46 180 46.1 0.335 46 1
145 13 82 19 110 22.2 0.245 57 0
145 9 88 34 165 30.3 0.771 53 1
145 9 80 46 130 37.9 0.637 40 1
146 2 70 38 360 28 0.337 29 1
146 4 85 27 100 28.9 0.189 27 0
146 2 76 35 194 38.2 0.329 29 0
147 4 74 25 293 34.9 0.385 30 0
148 4 60 27 318 30.9 0.15 29 1
148 10 84 48 237 37.6 1.001 51 1
149 1 68 29 127 29.3 0.349 42 1
150 7 66 42 342 34.7 0.718 42 0
150 7 78 29 126 35.2 0.692 54 1
151 6 62 31 120 35.5 0.692 28 0
151 12 70 40 271 41.8 0.742 38 1
151 8 78 32 210 42.9 0.516 36 1
152 13 90 33 29 26.8 0.731 43 1
152 9 78 34 171 34.2 0.893 33 1
152 0 82 39 272 41.5 0.27 27 0
153 1 82 42 485 40.6 0.687 23 0
153 13 88 37 140 40.6 1.174 39 0
154 6 74 32 193 29.3 0.839 39 0
154 9 78 30 100 30.9 0.164 45 0
154 4 72 29 126 31.3 0.338 37 0
154 4 62 31 284 32.8 0.237 23 0
154 6 78 41 140 46.1 0.571 27 0
155 2 74 17 96 26.6 0.433 27 1
155 11 76 28 150 33.3 1.353 51 1
155 8 62 26 495 34 0.543 46 1
155 2 52 27 540 38.7 0.24 25 1
155 5 84 44 545 38.7 0.619 34 0
156 9 86 28 155 34.3 1.189 42 1
157 1 72 21 168 25.6 0.123 24 0
157 2 74 35 440 39.4 0.134 30 0
158 3 64 13 387 31.2 0.295 24 0
158 3 76 36 245 31.6 0.851 28 1
158 3 70 30 328 35.5 0.344 35 1
158 5 84 41 210 39.4 0.395 29 1
160 7 54 32 175 30.5 0.588 39 1
161 10 68 23 132 25.5 0.326 47 1
162 0 76 56 100 53.2 0.759 25 1
163 3 70 18 105 31.6 0.268 28 1
163 17 72 41 114 40.9 0.817 47 1
164 1 82 43 67 32.8 0.341 50 0
165 6 68 26 168 33.6 0.631 49 0
165 0 76 43 255 47.9 0.259 26 0
165 0 90 33 680 52.3 0.427 23 0
166 5 72 19 175 25.8 0.587 51 1
167 1 74 17 144 23.4 0.447 33 1
167 8 106 46 231 37.6 0.165 43 1
168 7 88 42 321 38.2 0.787 40 1
169 3 74 19 125 29.9 0.268 31 1
170 3 64 37 225 34.5 0.356 30 1
171 3 72 33 135 33.3 0.199 24 1
171 9 110 24 240 45.4 0.721 54 1
172 1 68 49 579 42.4 0.702 28 1
173 4 70 14 168 29.7 0.361 33 1
173 3 78 39 185 33.8 0.97 31 1
173 3 84 33 474 35.7 0.258 22 1
173 3 82 48 465 38.4 2.137 25 1
173 0 78 32 265 46.5 1.159 58 0
174 3 58 22 194 32.9 0.593 36 1
174 2 88 37 120 44.5 0.646 24 1
176 3 86 27 156 33.3 1.154 52 1
176 8 90 34 300 33.7 0.467 58 1
177 0 60 29 478 34.6 1.072 21 1
179 8 72 42 130 32.7 0.719 36 1
179 0 50 36 159 37.8 0.455 22 1
180 3 64 25 70 34 0.271 26 0
180 0 90 26 90 36.5 0.314 35 1
180 0 78 63 14 59.4 2.42 25 1
181 8 68 36 495 30.1 0.615 60 1
181 1 64 30 180 34.1 0.328 38 1
181 7 84 21 192 35.9 0.586 51 1
181 1 78 42 293 40 1.258 22 1
181 0 88 44 510 43.3 0.222 26 1
184 4 78 39 277 37 0.264 31 1
186 8 90 35 225 34.5 0.423 37 1
187 7 50 33 392 33.9 0.826 34 1
187 3 70 22 200 36.4 0.408 36 1
187 7 68 39 304 37.7 0.254 41 1
187 5 76 27 207 43.6 1.034 53 1
188 0 82 14 185 32 0.682 22 1
189 1 60 23 846 30.1 0.398 59 1
189 5 64 33 325 31.2 0.583 29 1
191 3 68 15 130 30.9 0.299 34 0
193 1 50 16 375 25.9 0.655 24 0
195 7 70 33 145 25.1 0.163 55 1
196 1 76 36 249 36.5 0.875 29 1
196 8 76 29 280 37.5 0.605 57 1
197 2 70 45 543 30.5 0.158 53 1
197 4 70 39 744 36.7 2.329 31 0
198 0 66 32 274 41.3 0.502 28 1

Statistical Analysis of Data Using SPSS

Statistical Analysis of Data Using SPSS

Introduction and dataset
The aim of this coursework is to investigate and predict the onset of diabetes based on
various diagnostic measurements.

The dataset was originally compiled by researchers at the Johns Hopkins University
School of Medicine, from a larger database owned by the National Institute of Diabetes
and Digestive and Kidney Diseases. All patients were females at least 21 years old of
Pima Indian heritage. Note that Pima Indians have one of the highest rates of diabetes
in the world.

This dataset includes 392 observations, taken at the individual level and available from
diabetes_dataset.xlsx file in Statistical Data Analysis Coursework folder on NOW.
The key indicator of diabetes (response variable), as defined by the World Health
Organization, is a plasma glucose concentration greater than 200 mg/dl two hours
following ingestion of a 75 gm carbohydrate solution (variable Glucose).

Glucose Pregnancies BloodPressure SkinThickness Insulin BMI DiabetesPedigreeFunction Age Outcome
56 2 56 28 45 24.2 0.332 22 0
68 2 62 13 15 20.1 0.257 23 0
68 2 70 32 66 25 0.187 25 0
68 10 106 23 49 35.5 0.285 47 0
71 1 48 18 76 20.4 0.323 22 0
71 1 78 50 45 33.2 0.422 21 0
74 0 52 10 36 27.8 0.269 22 0
74 3 68 28 45 29.7 0.293 23 0
74 8 70 40 49 35.3 0.705 39 0
75 2 64 24 55 29.7 0.37 33 0
77 1 56 30 56 33.3 1.251 24 0
77 5 82 41 42 35.8 0.156 35 0
78 3 50 32 88 31 0.248 26 1
78 0 88 29 40 36.9 0.434 21 0
79 1 80 25 37 25.4 0.583 22 0
79 1 60 42 48 43.5 0.678 23 0
80 1 74 11 60 30 0.527 22 0
80 3 82 31 70 34.2 1.292 27 1
81 1 72 18 40 26.6 0.283 24 0
81 3 86 16 66 27.5 0.306 22 0
81 2 72 15 76 30.1 0.547 25 0
81 1 74 41 57 46.3 1.096 32 0
81 7 78 40 48 46.7 0.261 42 0
82 1 64 13 95 21.2 0.415 23 0
82 2 52 22 115 28.5 1.699 25 0
83 7 78 26 71 29.3 0.767 36 0
83 2 66 23 50 32.2 0.497 22 0
83 3 58 31 18 34.3 0.336 25 0
83 2 65 28 66 36.8 0.629 24 0
84 2 50 23 76 30.4 0.968 21 0
84 3 68 30 106 31.9 0.591 25 0
84 0 64 22 66 35.8 0.545 21 0
84 1 64 23 115 36.9 0.471 28 0
84 0 82 31 125 38.2 0.233 23 0
84 4 90 23 56 39.5 0.159 25 0
85 4 58 22 49 27.8 0.306 28 0
86 5 68 28 71 30.2 0.364 24 0
86 1 66 52 65 41.3 0.917 29 0
87 2 58 16 52 32.7 0.166 25 0
87 1 78 27 32 34.6 0.101 22 0
87 1 60 37 75 37.2 0.509 22 0
87 1 68 34 77 37.6 0.401 24 0
88 5 66 21 23 24.4 0.342 30 0
88 3 58 11 54 24.8 0.267 22 0
88 2 58 26 16 28.4 0.766 22 0
88 2 74 19 53 29 0.229 22 0
88 1 62 24 44 29.9 0.422 23 0
88 1 78 29 76 32 0.365 29 0
88 12 74 40 54 35.3 0.378 48 0
88 1 30 42 99 55 0.496 26 1
89 1 24 19 25 27.8 0.559 21 0
89 1 66 23 94 28.1 0.167 21 0
89 3 74 16 85 30.4 0.551 38 0
89 1 76 34 37 31.2 0.192 23 0
90 2 80 14 55 24.4 0.249 24 0
90 1 62 18 59 25.1 1.268 25 0
90 1 62 12 43 27.2 0.58 24 0
90 4 88 47 54 37.7 0.362 29 0
91 1 54 25 100 25.2 0.234 23 0
91 4 70 32 88 33.1 0.446 22 0
91 0 68 32 210 39.9 0.381 25 0
92 1 62 25 41 19.5 0.482 25 0
92 12 62 7 258 27.6 0.926 44 1
92 6 62 32 126 32 0.085 46 0
93 0 60 25 92 28.7 0.532 22 0
93 6 50 30 64 28.7 0.356 23 0
93 2 64 32 160 38 0.674 23 1
93 0 100 39 72 43.4 1.021 35 0
94 2 68 18 76 26 0.561 21 0
94 2 76 18 66 31.6 0.649 23 0
94 7 64 25 79 33.3 0.738 41 0
94 0 70 27 115 43.5 0.347 21 0
95 1 66 13 38 19.6 0.334 25 0
95 1 60 18 58 23.9 0.26 22 0
95 1 74 21 73 25.9 0.673 36 0
95 2 54 14 88 26.1 0.748 22 0
95 1 82 25 180 35 0.233 43 1
95 0 80 45 92 36.5 0.33 26 0
95 0 85 25 36 37.4 0.247 24 1
95 0 64 39 105 44.6 0.366 22 0
96 4 56 17 49 20.8 0.34 26 0
96 2 68 13 49 21.1 0.647 26 0
96 3 56 34 115 24.7 0.944 39 0
96 1 64 27 87 33.2 0.289 21 0
96 5 74 18 67 33.6 0.997 43 0
97 1 64 19 82 18.2 0.299 21 0
97 1 66 15 140 23.2 0.487 22 0
97 0 64 36 100 36.8 0.6 25 0
97 7 76 32 91 40.9 0.871 32 1
98 0 82 15 84 25.2 0.299 22 0
98 6 58 33 190 34 0.43 43 0
98 2 60 17 120 34.7 0.198 22 0
99 3 80 11 64 19.3 0.284 30 0
99 2 70 16 44 20.4 0.235 27 0
99 3 62 19 74 21.8 0.279 26 0
99 4 76 15 51 23.2 0.223 21 0
99 2 52 15 94 24.6 0.637 21 0
99 3 54 19 86 25.6 0.154 24 0
99 6 60 19 54 26.9 0.497 32 0
99 5 54 28 83 34 0.499 30 0
99 2 60 17 160 36.6 0.453 21 0
99 1 72 30 18 38.6 0.412 21 0
100 1 74 12 46 19.5 0.149 28 0
100 1 66 15 56 23.6 0.666 26 0
100 1 72 12 70 25.3 0.658 28 0
100 12 84 33 105 30 0.488 46 0
100 0 70 26 50 30.8 0.597 21 0
100 3 68 23 81 31.6 0.949 28 0
100 1 66 29 196 32 0.444 42 0
100 2 66 20 90 32.9 0.867 28 1
100 14 78 25 184 36.6 0.412 46 1
100 2 54 28 105 37.8 0.498 24 0
100 2 68 25 71 38.5 0.324 26 0
100 8 74 40 215 39.4 0.661 43 1
100 2 70 52 57 40.5 0.677 25 0
100 0 88 60 110 46.8 0.962 31 0
101 2 58 35 90 21.8 0.155 22 0
101 2 58 17 265 24.2 0.614 23 0
101 1 50 15 36 24.2 0.526 26 0
101 10 76 48 180 32.9 0.171 63 0
102 0 86 17 105 29.3 0.695 27 0
102 3 44 20 94 30.8 0.4 26 0
102 0 78 40 90 34.5 0.238 24 0
102 7 74 40 105 37.2 0.204 45 0
102 0 64 46 78 40.6 0.496 21 0
102 2 86 36 120 45.5 0.127 23 1
103 1 80 11 82 19.4 0.491 22 0
103 4 60 33 192 24 0.966 33 0
103 3 72 30 152 27.6 0.73 27 0
103 6 72 32 190 37.7 0.324 55 0
103 1 30 38 83 43.3 0.183 33 0
104 0 64 23 116 27.8 0.454 23 0
104 6 74 18 156 29.9 0.722 41 1
104 0 64 37 64 33.6 0.51 22 1
105 6 70 32 68 30.8 0.122 37 0
105 2 80 45 191 33.7 0.711 29 1
105 2 58 40 94 34.9 0.225 25 0
105 5 72 29 325 36.9 0.159 28 0
105 0 64 41 142 41.5 0.173 22 0
106 2 56 27 165 29 0.426 22 0
106 2 64 35 119 30.5 1.4 34 0
106 3 54 21 158 30.9 0.292 24 0
106 1 70 28 135 34.2 0.142 22 0
106 0 70 37 148 39.4 0.605 22 0
107 3 62 13 48 22.9 0.678 23 1
107 1 72 30 82 30.8 0.821 24 0
107 2 74 30 100 33.6 0.404 23 0
107 0 62 30 74 36.6 0.757 25 1
108 6 44 20 130 24 0.813 35 0
108 2 62 32 56 25.2 0.128 21 0
108 2 62 10 278 25.3 0.881 22 0
108 2 52 26 63 32.5 0.318 22 0
108 1 60 46 178 35.5 0.415 24 0
108 5 72 43 75 36.1 0.263 33 0
109 1 38 18 120 23.1 0.407 26 0
109 1 56 21 135 25.2 0.833 23 0
109 1 60 8 182 25.4 0.947 21 0
109 8 76 39 114 27.9 0.64 31 1
109 1 58 18 116 28.5 0.219 22 0
109 4 64 44 99 34.8 0.905 26 1
109 5 62 41 129 35.8 0.514 25 1
110 4 76 20 100 28.4 0.118 27 0
110 2 74 29 125 32.4 0.698 27 0
111 1 62 13 182 24 0.138 23 0
111 3 90 12 78 28.4 0.495 29 0
111 3 58 31 44 29.5 0.43 22 0
111 4 72 47 207 37.1 1.39 56 1
112 2 68 22 94 34.1 0.315 26 0
112 9 82 32 175 34.2 0.26 36 1
112 1 72 30 176 34.4 0.528 25 0
112 1 80 45 132 34.8 0.217 24 0
112 2 86 42 160 38.4 0.246 28 0
112 2 78 50 140 39.4 0.175 24 0
113 3 50 10 85 29.5 0.626 25 0
114 7 76 17 110 23.8 0.466 31 0
114 1 66 36 200 38.1 0.289 21 0
114 0 80 34 285 44.2 0.167 27 0
115 1 70 30 96 34.6 0.529 32 1
115 3 66 39 140 38.1 0.15 28 0
116 4 72 12 87 22.1 0.463 37 0
116 3 74 15 105 26.3 0.107 24 0
116 1 78 29 180 36.1 0.496 25 0
117 2 90 19 71 25.2 0.313 21 0
117 0 66 31 188 30.8 0.493 22 0
117 4 64 27 120 33.2 0.23 24 0
117 1 60 23 106 33.8 0.466 27 0
117 1 88 24 145 34.5 0.403 40 1
117 5 86 30 105 39.1 0.251 42 0
117 0 80 31 53 45.2 0.089 24 0
118 1 58 36 94 33.3 0.261 23 0
118 0 84 47 230 45.8 0.551 31 1
119 1 54 13 50 22.3 0.205 24 0
119 6 50 22 176 27.1 1.318 33 1
119 0 64 18 92 34.9 0.725 23 0
119 1 44 47 63 35.5 0.28 25 0
119 1 88 41 170 45.3 0.507 26 0
119 1 86 39 220 45.6 0.808 29 1
120 9 72 22 56 20.8 0.733 48 0
120 0 74 18 63 30.5 0.285 26 0
120 1 80 48 200 38.9 1.162 41 0
120 2 76 37 105 39.7 0.215 29 0
120 11 80 37 150 42.3 0.785 48 1
120 3 70 30 135 42.9 0.452 30 0
121 5 72 23 112 26.2 0.245 30 0
121 0 66 30 165 34.3 0.203 33 1
121 1 78 39 74 39 0.261 28 0
121 2 70 32 95 39.1 0.886 23 0
122 2 60 18 106 29.8 0.717 22 0
122 1 64 32 156 35.1 0.692 30 1
122 2 76 27 200 35.9 0.483 26 0
122 2 52 43 158 36.2 0.816 28 0
122 1 90 51 220 49.7 0.325 31 1
123 4 80 15 176 32 0.443 34 0
123 9 70 44 94 33.1 0.374 40 0
123 6 72 45 230 33.6 0.733 34 0
123 5 74 40 77 34.1 0.269 28 0
123 2 48 32 165 42.1 0.52 26 0
123 3 100 35 240 57.3 0.88 22 0
124 0 56 13 105 21.8 0.452 21 0
124 7 70 33 215 25.5 0.161 37 0
124 8 76 24 600 28.7 0.687 52 1
124 2 68 28 205 32.9 0.875 30 1
124 3 80 33 130 33.2 0.305 26 0
124 9 70 33 402 35.4 0.282 34 0
125 1 70 24 110 24.3 0.221 25 0
125 4 70 18 122 28.9 1.144 45 1
125 6 68 30 120 30 0.464 32 0
125 10 70 26 115 31.1 0.205 41 1
125 1 50 40 167 33.3 0.962 28 1
125 2 60 20 140 33.8 0.088 31 0
126 8 74 38 75 25.9 0.162 39 0
126 0 86 27 120 27.4 0.515 21 0
126 1 56 29 152 28.7 0.801 21 0
126 5 78 27 22 29.6 0.439 40 0
126 0 84 29 215 30.7 0.52 24 0
126 8 88 36 108 38.5 0.349 49 0
126 3 88 41 235 39.3 0.704 27 0
127 2 58 24 275 27.7 1.6 25 0
127 2 46 21 335 34.4 0.176 22 0
127 4 88 11 155 34.5 0.598 28 0
127 0 80 37 210 36.3 0.804 23 0
128 1 82 17 183 27.5 0.115 22 0
128 0 68 19 180 30.5 1.391 25 1
128 1 98 41 58 32 1.321 33 1
128 3 72 25 190 32.4 0.549 27 1
128 1 88 39 110 36.5 1.057 37 1
128 1 48 45 194 40.5 0.613 24 1
128 2 78 37 182 43.3 1.224 31 1
129 6 90 7 326 19.6 0.582 60 0
129 3 64 29 115 26.4 0.219 28 1
129 4 60 12 231 27.5 0.527 31 0
129 2 74 26 205 33.2 0.591 25 0
129 4 86 20 270 35.1 0.231 23 0
129 10 76 28 122 35.9 0.28 39 0
129 3 92 49 155 36.4 0.968 32 1
129 7 68 49 125 38.5 0.439 43 1
129 0 110 46 130 67.1 0.319 26 1
130 1 70 13 105 25.9 0.472 22 0
130 3 78 23 79 28.4 0.323 34 1
130 1 60 23 170 28.6 0.692 21 0
131 1 64 14 415 23.7 0.389 21 0
131 4 68 21 166 33.1 0.16 28 0
133 7 88 15 155 32.4 0.262 37 0
133 1 102 28 140 32.8 0.234 45 1
134 9 74 33 60 25.9 0.46 81 0
134 0 58 20 291 26.4 0.352 21 0
134 6 70 23 130 35.4 0.542 29 1
134 6 80 37 370 46.2 0.238 46 1
135 0 94 46 145 40.6 0.284 26 0
135 0 68 42 250 42.3 0.365 24 1
136 7 74 26 135 26 0.647 51 0
136 11 84 35 130 28.3 0.26 42 1
136 5 84 41 88 35 0.286 35 1
136 15 70 32 110 37.1 0.153 43 1
136 1 74 50 204 37.4 0.399 24 0
137 0 68 14 148 24.8 0.143 21 0
137 0 40 35 168 43.1 2.288 33 1
138 0 60 35 167 34.6 0.534 21 1
138 11 74 26 144 36.1 0.557 50 1
139 0 62 17 210 22.1 0.207 21 0
139 5 64 35 140 28.6 0.411 26 0
139 1 46 19 83 28.7 0.654 22 0
139 5 80 35 160 31.6 0.361 25 1
139 1 62 41 480 40.7 0.536 21 0
140 1 74 26 180 24.1 0.828 23 0
140 12 82 43 325 39.2 0.528 58 1
140 0 65 26 130 42.6 0.431 24 1
141 2 58 34 128 25.4 0.699 24 0
142 2 82 18 64 24.7 0.761 21 0
142 7 60 33 190 28.8 0.687 61 0
142 7 90 24 480 30.4 0.128 43 1
143 1 74 22 61 26.2 0.256 21 0
143 1 86 30 330 30.1 0.892 23 0
143 11 94 33 146 36.6 0.254 51 1
143 1 84 23 310 42.4 1.076 22 0
144 4 58 28 140 29.5 0.287 37 0
144 2 58 33 135 31.6 0.422 25 1
144 5 82 26 285 32 0.452 58 1
144 6 72 27 228 33.9 0.255 40 0
144 1 82 46 180 46.1 0.335 46 1
145 13 82 19 110 22.2 0.245 57 0
145 9 88 34 165 30.3 0.771 53 1
145 9 80 46 130 37.9 0.637 40 1
146 2 70 38 360 28 0.337 29 1
146 4 85 27 100 28.9 0.189 27 0
146 2 76 35 194 38.2 0.329 29 0
147 4 74 25 293 34.9 0.385 30 0
148 4 60 27 318 30.9 0.15 29 1
148 10 84 48 237 37.6 1.001 51 1
149 1 68 29 127 29.3 0.349 42 1
150 7 66 42 342 34.7 0.718 42 0
150 7 78 29 126 35.2 0.692 54 1
151 6 62 31 120 35.5 0.692 28 0
151 12 70 40 271 41.8 0.742 38 1
151 8 78 32 210 42.9 0.516 36 1
152 13 90 33 29 26.8 0.731 43 1
152 9 78 34 171 34.2 0.893 33 1
152 0 82 39 272 41.5 0.27 27 0
153 1 82 42 485 40.6 0.687 23 0
153 13 88 37 140 40.6 1.174 39 0
154 6 74 32 193 29.3 0.839 39 0
154 9 78 30 100 30.9 0.164 45 0
154 4 72 29 126 31.3 0.338 37 0
154 4 62 31 284 32.8 0.237 23 0
154 6 78 41 140 46.1 0.571 27 0
155 2 74 17 96 26.6 0.433 27 1
155 11 76 28 150 33.3 1.353 51 1
155 8 62 26 495 34 0.543 46 1
155 2 52 27 540 38.7 0.24 25 1
155 5 84 44 545 38.7 0.619 34 0
156 9 86 28 155 34.3 1.189 42 1
157 1 72 21 168 25.6 0.123 24 0
157 2 74 35 440 39.4 0.134 30 0
158 3 64 13 387 31.2 0.295 24 0
158 3 76 36 245 31.6 0.851 28 1
158 3 70 30 328 35.5 0.344 35 1
158 5 84 41 210 39.4 0.395 29 1
160 7 54 32 175 30.5 0.588 39 1
161 10 68 23 132 25.5 0.326 47 1
162 0 76 56 100 53.2 0.759 25 1
163 3 70 18 105 31.6 0.268 28 1
163 17 72 41 114 40.9 0.817 47 1
164 1 82 43 67 32.8 0.341 50 0
165 6 68 26 168 33.6 0.631 49 0
165 0 76 43 255 47.9 0.259 26 0
165 0 90 33 680 52.3 0.427 23 0
166 5 72 19 175 25.8 0.587 51 1
167 1 74 17 144 23.4 0.447 33 1
167 8 106 46 231 37.6 0.165 43 1
168 7 88 42 321 38.2 0.787 40 1
169 3 74 19 125 29.9 0.268 31 1
170 3 64 37 225 34.5 0.356 30 1
171 3 72 33 135 33.3 0.199 24 1
171 9 110 24 240 45.4 0.721 54 1
172 1 68 49 579 42.4 0.702 28 1
173 4 70 14 168 29.7 0.361 33 1
173 3 78 39 185 33.8 0.97 31 1
173 3 84 33 474 35.7 0.258 22 1
173 3 82 48 465 38.4 2.137 25 1
173 0 78 32 265 46.5 1.159 58 0
174 3 58 22 194 32.9 0.593 36 1
174 2 88 37 120 44.5 0.646 24 1
176 3 86 27 156 33.3 1.154 52 1
176 8 90 34 300 33.7 0.467 58 1
177 0 60 29 478 34.6 1.072 21 1
179 8 72 42 130 32.7 0.719 36 1
179 0 50 36 159 37.8 0.455 22 1
180 3 64 25 70 34 0.271 26 0
180 0 90 26 90 36.5 0.314 35 1
180 0 78 63 14 59.4 2.42 25 1
181 8 68 36 495 30.1 0.615 60 1
181 1 64 30 180 34.1 0.328 38 1
181 7 84 21 192 35.9 0.586 51 1
181 1 78 42 293 40 1.258 22 1
181 0 88 44 510 43.3 0.222 26 1
184 4 78 39 277 37 0.264 31 1
186 8 90 35 225 34.5 0.423 37 1
187 7 50 33 392 33.9 0.826 34 1
187 3 70 22 200 36.4 0.408 36 1
187 7 68 39 304 37.7 0.254 41 1
187 5 76 27 207 43.6 1.034 53 1
188 0 82 14 185 32 0.682 22 1
189 1 60 23 846 30.1 0.398 59 1
189 5 64 33 325 31.2 0.583 29 1
191 3 68 15 130 30.9 0.299 34 0
193 1 50 16 375 25.9 0.655 24 0
195 7 70 33 145 25.1 0.163 55 1
196 1 76 36 249 36.5 0.875 29 1
196 8 76 29 280 37.5 0.605 57 1
197 2 70 45 543 30.5 0.158 53 1
197 4 70 39 744 36.7 2.329 31 0
198 0 66 32 274 41.3 0.502 28 1

From the tabulated figures:

(1) Generate two random numbers between 2 and 7 and provide SPSS output.
(1 mark)

(2) Using SPSS, erase columns corresponding to your generated numbers (e.g. if
one of the generated numbers is 5 then erase column C5, etc). Describe how you did
this and provide the sequence of actions (e.g. Calc->Descriptive Stats->….)
(2 marks)

(3) Using SPSS select a random sample of 300 observations (n = 300) from your
dataset. Provide the sequence of actions of how you did this.
(1 mark)
Your unique dataset will now consist of 300 rows and seven columns including
Glucose, Age and Outcome.
Investigating your unique dataset

(4) For your unique dataset summarise information about your observations and present
graphically the frequency distributions for all variables that are left in your unique
dataset, including Glucose but excluding the Outcome variable. Comment on unusual
observations and make your own decision on how to deal with them.
(6 marks)

(5) Using SPSS, define a new variable, Age_Group, by combining observations
for participants younger than 30 into group 1 and all others (of age 30 and older) into
group 2. Provide either a description or a screen shot of how you did this.
(3 marks)

(6) Investigate whether there is a significant difference in mean/median Glucose
concentration between age groups. Formulate the null and alternative hypotheses;
choose, justify and perform an appropriate statistical test using SPSS; provide all
SPSS outputs; write your conclusions.
(10 marks)

(7) Show whether the proportion of participants with Glucose concentration greater
than 100 mg/dl is different between age groups that you defined previously. Formulate
the null and alternative hypotheses; choose, justify and perform an appropriate
statistical test using SPSS; provide all SPSS outputs; write your conclusions.
(10 marks)

(8) Using SPSS, produce a table of correlation coefficients. Justify the choice of
correlation coefficient, investigate the resulting table and comment on most interesting
relationships between chosen variables. Do not use Glucose and Outcome variables in
this analysis.
(4 marks)

(9) Using simple linear regression, model Glucose concentration by one of the
variables of your choice that are available in your unique dataset. Comment on
significance of intercept and slope.
(4 marks)

(10) Fit a multiple regression model with Glucose being a response variable and other
five variables excluding Outcome as predictors. Treat variable Pregnancies as an
interval scale data. Identify insignificant predictors in the model and explain why they
are insignificant.
(4 marks)

(11) Cluster your 300 observations into 10 groups using one of the linkage methods and
similarity measures from the corresponding drop-down menus. Give a brief (half a page)
description of the linkage method and similarity measure chosen. Show a dendrogram
with cases labelled by Outcome. Comment on the results obtained. Provide all
SPSS outputs.
(6 marks)

(12) It is known that the incidence of diabetes in the UK is 0.6. In a small northern
village of 100 people isolated from the mainland for six months per year the pharmacy
wants to know how many insulin shots to order. We want to know what is the
probability that between A and B people will develop the disease during this period. To
perform analysis, generate two random numbers between 0 and 100 using SPSS
and paste the outputs into your report. Denote by A the smallest number and by B the
largest number out of these two generated numbers. Calculate the probability that
between A and B people develop the disease and how many shots should be ordered.
(9 marks)

Categorical (Nominal) Dependent Variables – Logit (Logistic Regression)

  • Here is an introductory/survey video of Logit Analysis, which allows us to analyze nominal dependent variables. Ordinary regression only allows us to work with continuous dependent variables.

Video: Introduction to Logit Analysis:
https://youtu.be/ANi_PpkTSJA

Note: This Extra Credit Assignment is a bit tougher than the other ones, so it is worth a bonus of up to 10% of the final grade  if you get everything right. The other assignments are worth 7% each.

  • After watching the video, try this extra credit assignment:
    Prompt:
    Answer Part 1, Part 2, and Part 3. Given the following coefficients from a logit analysis, and the sample data values given for two respondents, calculate the probability of a person liking  a dark-colored imported car over a light-colored imported car. Your answers are probabilities. Show your work. Use Word or PDF format for submission to Turnitin.com (link below). You may need to hand-write the formula and show your work on paper, then photograph or scan it into a file. That’s OK, but typing it into Word is preferred, if you can figure it out.

The Dependent Variable (DV) is “Prefers dark-colored imported car.” This measure is labeled “PrefDark” in the data
= 0 if preference is for a light colored car,
= 1 if preference is for a dark-colored car.

Here are the Independent Variables  (IVs):
Age in years (no intervals – labeled “Age” in the data)

Gender (measure is labeled “Gender” in the data)
= 0 if male,
= 1 if female.

Education level (measure is labeled EducLevel in the data)
= 0 if completed high school only
= 1 if completed Associate’s degree (Community College)
= 2 if completed Undergraduate degree (BA or BS)
= 3 if completed a Graduate degree

Income per year (in Euros; measure is labeled Income)

Consider, also, these coefficients for each measure (data point), calculated by running a Logit analysis on the data sample for the DV, PrefDark:

Coefficients and Constant
Age             0.101
Gender        0.34
EducLevel  –5.1
Income        0.000142
Constant      3.22

Assume all coefficients and the constant are statistically significant (you can’t ignore them).

Part 1 (4 points):
Now consider this person, Respondent 1:
Age = 24
Gender = 1 (female)
EducLevel = 2 (Undergraduate degree)
Income/year =  Euros 38000
What is the probability this person prefers a dark-colored imported car?

Part 2 (4 points):
Additionally, consider this other person, Respondent 2:
54 year old male, with a graduate degree, earning Euros 58000 per year.
What is the probability this person prefers a dark-colored imported car?

Hint: Use the formula given in the video for calculating P(Yi=yi).

Show your work, please.
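A minimal sketch of the plug-in arithmetic, using the coefficients above but a hypothetical third respondent so it does not give away Parts 1 or 2; Python is an assumption here, and hand calculation is equally fine:

```python
import math

# Coefficients and constant from the prompt above.
b = {"Age": 0.101, "Gender": 0.34, "EducLevel": -5.1, "Income": 0.000142}
constant = 3.22

def prob_pref_dark(age, gender, educ, income):
    # Logit model: z = constant + sum(coefficient * value); P(PrefDark = 1) = 1 / (1 + e^(-z))
    z = (constant + b["Age"] * age + b["Gender"] * gender
         + b["EducLevel"] * educ + b["Income"] * income)
    return 1 / (1 + math.exp(-z))

# Hypothetical respondent (NOT Respondent 1 or 2): 30-year-old male, Associate's degree,
# earning 40,000 Euros -- just to show the mechanics of plugging values into the formula.
print(prob_pref_dark(age=30, gender=0, educ=1, income=40000))
```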

Part 3 (2 points)
Which Respondent has a higher probability of preferring a dark-colored car?
This is quite straightforward if you have Parts 1 and 2 correct.

 

Regression Modeling

Assignment Content

  1. Purpose 
    This assignment provides an opportunity to develop, evaluate, and apply bivariate and multivariate linear regression models.

    Resources: Microsoft Excel®, DAT565_v3_Wk5_Data_File

    Instructions:
    The Excel file for this assignment contains a database with information about the tax assessment value assigned to medical office buildings in a city. The following is a list of the variables in the database:

    • FloorArea: square feet of floor space
    • Offices: number of offices in the building
    • Entrances: number of customer entrances
    • Age: age of the building (years)
    • AssessedValue: tax assessment value (thousands of dollars)
    • Use the data to construct a model that predicts the tax assessment value assigned to medical office buildings with specific characteristics.
    • Construct a scatter plot in Excel with FloorArea as the independent variable and AssessmentValue as the dependent variable. Insert the bivariate linear regression equation and r^2 in your graph. Do you observe a linear relationship between the 2 variables?
    • Use Excel’s Analysis ToolPak to conduct a regression analysis of FloorArea and AssessmentValue. Is FloorArea a significant predictor of AssessmentValue?
    • Construct a scatter plot in Excel with Age as the independent variable and AssessmentValue as the dependent variable. Insert the bivariate linear regression equation and r^2 in your graph. Do you observe a linear relationship between the 2 variables?
    • Use Excel’s Analysis ToolPak to conduct a regression analysis of Age and Assessment Value. Is Age a significant predictor of AssessmentValue?
    • Construct a multiple regression model.
    • Use Excel’s Analysis ToolPak to conduct a regression analysis with AssessmentValue as the dependent variable and FloorArea, Offices, Entrances, and Age as independent variables. What is the overall fit r^2? What is the adjusted r^2?
    • Which predictors are considered significant if we work with α=0.05? Which predictors can be eliminated?
    • What is the final model if we only use FloorArea and Offices as predictors?
    • Suppose our final model is:
    • AssessedValue = 115.9 + 0.26 x FloorArea + 78.34 x Offices
    • What would be the assessed value of a medical office building with a floor area of 3500 sq. ft., 2 offices, that was built 15 years ago? Is this assessed value consistent with what appears in the database? (See the sketch after this list.)
    • Submit your assignment.
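For the prediction bullet above, a minimal plug-in check of the supposed final model; Python is used here only for illustration, and the same arithmetic works in a single Excel cell:

```python
# Plug-in check of the supposed final model from the bullet above
# (AssessedValue in thousands of dollars; Age drops out because it is not in this model).
floor_area, offices = 3500, 2
assessed = 115.9 + 0.26 * floor_area + 78.34 * offices
print(assessed)   # 115.9 + 910.0 + 156.68 = 1182.58, i.e. about $1.18 million
```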

      Resources

    • Center for Writing Excellence
    • Reference and Citation Generator
    • Grammar and Writing Guides