Introduction & Descriptive Statistics Essay

TOPIC 1 INTRODUCTION & DESCRIPTIVE STATISTICS

BASIC CONCEPTS

Situation: A journalist is preparing a program segment on what appears to be the relatively disadvantaged financial position of women and the incidence of female poverty in Australia. Several questions may arise, for example:

• What is the pattern of female incomes?
• How severe is the problem of female poverty and what proportion fall below the ‘poverty line’?
• Has their general level of income improved over the last ten years?
• Are single working mothers especially disadvantaged?
• Working mothers often need to put their children into day-care. What should the capacity of the local centre be?
• How does their income compare with that of their male counterparts? Has the gap become any smaller over the last ten years?
• Do women have less leisure time than their married partners?
• Has the occupational pattern of women changed since the previous generation?
• Is there any connection between the occupation of a working mother and the leisure activities of her eldest daughter?
• Are their incomes related to age, ethnic origin, education or other factors?

Answering these questions would require almost the full range of techniques covered in this course.


One of the most important initial steps in this investigation is to develop a realistic picture of how the incomes of adult females, and the other variables of interest in the study, vary. All investigation and research is about variables.

Variables: In this instance, the variables of interest in the study include, e.g., income, age, ethnic origin, education, occupation and leisure time.

Population: The whole or entire collection of cases that are of interest in an investigation. E.g. all adult females in Australia.

Sample: A part of that population which is small enough to be economical and large enough to give us a reasonably accurate picture of the whole. E.g. a group of adult Australian women selected for a survey.

Data: The measured values of the variable(s) of interest for every member of the sample. E.g. the income recorded for each woman in the sample.

Statistic: A summary measure of the variable of interest in the sample. E.g. the proportion of women in the sample who live below the poverty line.

Parameter: A summary measure of the variable of interest in the population. E.g. the proportion of all women in the population who live below the poverty line.

Government social security policy should be based on the proportion of all women who live below the poverty line. However, this proportion is a parameter, which can be guessed, theorised, assumed, believed or estimated, but almost never known for certain. We must use sample statistics to help us learn about parameters.

Statistical Inference

In general, we never see the WHOLE (population) but must make our decisions based on information gathered from the PART (sample). Whenever we draw conclusions about the whole population based on sample information, we are practising STATISTICAL INFERENCE. E.g. using the proportion of sampled women below the poverty line to estimate the proportion for all women.

• Inferences and conclusions can always be wrong. There can never be complete certainty.

Later we will apply the concepts of Confidence and Significance to statistical inference. Crudely speaking,

• Confidence level is concerned with our chances of being right.
• Significance level is concerned with our chances of being wrong.
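The distinction between a parameter and a statistic can be illustrated with a small simulation. This is only a sketch: the population size, the 20% poverty rate and the sample size of 1000 are invented for illustration and are not figures from this course or from any real survey.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100,000 women, each below the poverty line
# with an assumed probability of 0.20 (both numbers are invented).
POPULATION_SIZE = 100_000
ASSUMED_PROPORTION = 0.20
population = [random.random() < ASSUMED_PROPORTION for _ in range(POPULATION_SIZE)]

# Parameter: the proportion in the WHOLE population (normally unknown).
parameter = sum(population) / POPULATION_SIZE

# Statistic: the same proportion measured in the PART we actually observe.
sample = random.sample(population, 1000)
statistic = sum(sample) / len(sample)

print(f"Population proportion (parameter): {parameter:.3f}")
print(f"Sample proportion (statistic):     {statistic:.3f}")
```

In practice only the second number would be available, and the inference consists of using it as an estimate of the first, accepting that the estimate can be wrong.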

TYPE OF VARIABLES & LEVELS OF MEASUREMENT

Nominal = Categorical = Qualitative
E.g. marital status (Single, Married, Divorced, Widowed).
• The categories may be recorded in number form (e.g. 1, 2, 3, 4) but the numbers have no numerical meaning and generally cannot be used in calculation.

Ordinal = Ranked
• Order is meaningful and the numbers assigned have some numerical meaning.

Interval = Metric = Quantitative
‘The number of people in a room’ is a Discrete interval variable because only whole numbers are possible. Height, Weight, Distance, Money, Time, Temperature and Longitude are Continuous interval variables because any fractional values are possible.

With Interval variables
• Order and difference are meaningful.
• If we can ask “How much”, “How often” or “How many”, it is always Interval.
• Any variable that has only two values can be regarded as both Nominal and Interval.

The level of measurement is important in determining the techniques we use to
1) Describe the behaviour of a variable,
2) Draw conclusions from observed information, and
3) Investigate relationships between variables.

DESCRIBING THE BEHAVIOUR OF A VARIABLE

A Frequency Distribution displays the behaviour of a variable.

It is the ‘popularity pattern’ of its observed values. We are interested in describing this overall behaviour. Some features of particular interest are

• Dominant values that tend to occur more often than the others. These are called Modes. There can be one or more modes.
• Measures of Location: Single values that can be used to represent the whole group of cases.
  The Median is the value that has half the cases below it.
  The Mean is the simple average of all the values.
• The Amount of Variation
  The Observed Variety of Values

  The Range is the difference between the highest and lowest values.
  The Standard Deviation is the square root of the average squared difference between the mean and each of the cases.
  The Variance is the average squared difference between the mean and each of the cases. For a sample, this ‘average’ is taken by dividing the sum of squared differences by n - 1 rather than n.
• The Type of Variation
  Skewness is a tendency towards the higher or the lower values. It is positive when the pattern leans to the left, with the bulk of the cases at the lower values and a long tail towards the higher values, and negative when it leans to the right, with a long tail towards the lower values. There is no skewness if the pattern is symmetrical.
  Kurtosis is a tendency towards or away from the dominant or representative values.

It is leptokurtic when the cases tend to be drawn strongly inward towards the centre or some dominant value. It is more platykurtic when cases tend to be pushed outward. A mesokurtic pattern lies between the two, as in the normal curve.

TABLES & GRAPHICAL DATA REPRESENTATION

Nominal Data

Example: The marital status of 15 people is as follows:

M D S M W S M M D S M D W M S

M = Married, S = Single, D = Divorced, W = Widowed

Frequency Table

|Status |Frequency |Relative Frequency |
|S |4 |.27 |
|M |6 |.40 |
|D |3 |.20 |
|W |2 |.13 |

Bar Chart [figure not available]
Pie Chart [figure not available]

The only meaningful measure of spread or variation is the observed variety of values: Single, Married, Divorced, Widowed.

Ranked Data

Ranked data can also be represented using the same methods as for nominal data, but the order is important. Also, its ‘centre’ can often be meaningfully represented by the Median, and the variation can sometimes be measured by the Range or the Inter-Quartile Range.

Interval Data

Example: The heights of 16 people (to the nearest cm):

147 156 156 157 162 167 169 172 172 173 175 176 180 181 183 192

Stem and Leaf Display

|Stem |Leaf |
|14 |7 |
|15 |6 6 7 |
|16 |2 7 9 |
|17 |2 2 3 5 6 |
|18 |0 1 3 |
|19 |2 |

Frequency Table

|Group |Frequency |Relative Frequency |Cumulative Rel. Freq. |
|145.5-155.5 |1 |.06 |.06 |
|155.5-165.5 |4 |.25 |.31 |
|165.5-175.5 |6 |.38 |.69 |
|175.5-185.5 |4 |.25 |.94 |
|185.5-195.5 |1 |.06 |1.00 |
|Total |16 |1.00 | |

Histogram [figure not available]
Ogive (Cumulative Frequency Polygon) [figure not available]

Lower Quartile (LQ) = 162 [25% or one quarter of the sample has values below 162]
Median = 170 [50% or half of the sample has values below 170]
Upper Quartile (UQ) = 178 [75% or three quarters of the sample are below 178]
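The two frequency tables above can be rebuilt with a few lines of Python. This is a minimal sketch rather than a prescribed method; the variable names are illustrative, and the class boundaries are the ones used in the heights table.

```python
from collections import Counter

# Nominal data: marital status of the 15 people in the example.
statuses = "M D S M W S M M D S M D W M S".split()
counts = Counter(statuses)
for status in ["S", "M", "D", "W"]:
    rel = counts[status] / len(statuses)
    print(f"{status}  frequency={counts[status]}  relative={rel:.2f}")

# Interval data: heights of the 16 people (cm), grouped into 10 cm classes.
heights = [147, 156, 156, 157, 162, 167, 169, 172,
           172, 173, 175, 176, 180, 181, 183, 192]
edges = [145.5, 155.5, 165.5, 175.5, 185.5, 195.5]

grouped = []   # rows of (lower, upper, frequency, relative, cumulative relative)
running = 0
for lower, upper in zip(edges, edges[1:]):
    freq = sum(lower < h <= upper for h in heights)
    running += freq
    grouped.append((lower, upper, freq,
                    freq / len(heights), running / len(heights)))

# Printed to four decimals; the table in the notes rounds to two.
for lower, upper, freq, rel, cum in grouped:
    print(f"{lower}-{upper}  frequency={freq}  relative={rel:.4f}  cumulative={cum:.4f}")
```

The cumulative column is what the ogive plots, so the quartiles quoted in the notes are values read off that curve at 25%, 50% and 75%.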

Inter-Quartile Range = UQ - LQ = 178 - 162 = 16

Percentiles

We can calculate the % of cases between any two values, e.g. 168 and 184:
184 is the 93rd percentile.
168 is the 48th percentile.
93% - 48% = 45% of cases lie between 168 and 184.

Measures of Location

• Mode: Most dominant/popular/frequently occurring measurement or category. May also be used for Nominal and Ordinal data.
• Median: Middle measure when the list is in ascending order. May also be used for Ordinal data.
• Mean: Simple average.

Example: For the following data: 4, 6, 10, 9, 10, 3
The mode is 10.
The median is 7.5 [middle value after ordering: 3, 4, 6, 9, 10, 10, so the median lies halfway between 6 and 9].
The mean is $\bar{X} = \frac{\sum X}{n}$ = (4+6+10+9+10+3) / 6 = 42 / 6 = 7.

Variability

Sample Standard Deviation: $S = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$

Sample Variance: Variance = $S^2$

Example: Calculate the standard deviation, variance and range for the values 4, 6, 10, 9, 10, 3.

The best manual method is to set up a table:

|$X$ |$X - \bar{X}$ |$(X - \bar{X})^2$ |
|4 |-3 |9 |
|6 |-1 |1 |
|10 |3 |9 |
|9 |2 |4 |
|10 |3 |9 |
|3 |-4 |16 |
|$\sum = 42$ | |$\sum = 48$ |

n = 6, $\bar{X}$ = 7

S = $\sqrt{48 / 5}$ = 3.1
Variance $S^2$ = 48 / 5 = 9.6 [approximately $(3.1)^2$]
Range = 10 - 3 = 7
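The worked example can be checked in Python. The sketch below mirrors the table method step by step and then compares the result with the standard library's `statistics` module; the variable names are illustrative only.

```python
import math
import statistics

values = [4, 6, 10, 9, 10, 3]
n = len(values)

# Measures of location
mode = statistics.mode(values)        # most frequently occurring value
median = statistics.median(values)    # middle of the ordered list
mean = sum(values) / n                # 42 / 6

# Table method: deviations from the mean and their squares
deviations = [x - mean for x in values]
squared = [d ** 2 for d in deviations]

# Sample variance and standard deviation divide by n - 1, not n
variance = sum(squared) / (n - 1)     # 48 / 5
std_dev = math.sqrt(variance)
value_range = max(values) - min(values)

print("mode:", mode, "median:", median, "mean:", mean)
print("sum of squared deviations:", sum(squared))
print("variance:", round(variance, 1), "std dev:", round(std_dev, 1))
print("range:", value_range)

# The library functions use the same n - 1 definition.
assert math.isclose(variance, statistics.variance(values))
assert math.isclose(std_dev, statistics.stdev(values))
```

Dividing by n - 1 is what makes these the sample versions of the measures; `statistics.pvariance` and `statistics.pstdev` divide by n instead and would give smaller values for the same data.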
