Demographic Analysis Using Data Frames:

Assume that you are employed as a Data Scientist by the World Bank and are working on a project to analyze the world’s demographic trends.

You are required to produce a scatterplot illustrating Birth Rate and Internet Usage statistics by Country.

The scatterplot also needs to be categorized by countries’ Income Group.

You have received an urgent update from your manager: you are required to produce a second scatterplot, also illustrating Birth Rate and Internet Usage statistics by Country.

However, this time the scatterplot needs to be categorized by countries’ Regions.

The demographic data is provided to you in CSV format.

Please use the link below to download the CSV file (P2-Demographic-Data) to your desktop so you can analyze the data.

https://sds-platform-private.s3-us-east-2.amazonaws.com/uploads/P2-Demographic-Data.csv

The three vectors required to perform the demographic analysis are provided by the company in a text file (Vectors_data) attached below.

Copy the vectors from the text file and paste them into your R script to work on this scenario.

The three vectors used in our use case are:

  1. Countries data
  2. Country Codes
  3. Regions
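To show what the pasted vectors look like in an R script, here is a minimal sketch. The vector names (`countries`, `codes`, `regions`) are hypothetical, only four entries are shown, and the region labels are assumptions; the real vectors come from the Vectors_data file. It also assumes the three vectors are parallel, meaning element i of each one describes the same country.

```r
# Hypothetical, truncated versions of the three vectors from Vectors_data.
# The real vectors contain one entry per country; the region labels below
# are assumptions for illustration only.
countries <- c("Aruba", "Afghanistan", "Angola", "Albania")
codes     <- c("ABW", "AFG", "AGO", "ALB")
regions   <- c("The Americas", "Asia", "Africa", "Europe")

# Parallel vectors must all have the same length
stopifnot(length(countries) == length(codes),
          length(codes) == length(regions))

# Combining them into a data frame keeps each country's code and region together
country_regions <- data.frame(Country = countries, Code = codes, Region = regions)
print(country_regions)
```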

Step 1: We are asked to produce a scatterplot illustrating Birth Rate and Internet Usage statistics by Country, categorized by countries’ Income Group.

First, we will import our CSV data into RStudio.

Step 2: The function we are using is read.csv().

The loaded data now appears in the Environment pane on the right side of RStudio; all five columns from the CSV file have been imported.
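A minimal sketch of the import step. The variable name `stats`, the file name, and the column headers are assumptions. To keep the example runnable without a download, it passes a few CSV lines to read.csv() through its `text` argument; apart from Albania’s birth rate (12.877, reported later in this post), the numbers and income groups are placeholders, not the real file’s contents.

```r
# With the downloaded file (adjust the path to where you saved it); a complete
# URL such as the S3 link above can also be passed as the file argument:
# stats <- read.csv("P2-Demographic-Data.csv", stringsAsFactors = TRUE)

# Self-contained stand-in: read.csv() also parses CSV text given via `text =`.
# Headers are assumed to match the real file; values are placeholders
# except Albania's birth rate (12.877).
csv_text <- "Country Name,Country Code,Birth rate,Internet users,Income Group
Aruba,ABW,10.2,78.9,High income
Afghanistan,AFG,35.3,5.9,Low income
Angola,AGO,46.0,19.1,Upper middle income
Albania,ALB,12.877,57.2,Upper middle income"

# stringsAsFactors = TRUE turns text columns into factors. This was the default
# before R 4.0.0, and it lets levels() work on those columns as in this post.
stats <- read.csv(text = csv_text, stringsAsFactors = TRUE)

# read.csv() turns spaces in headers into dots, e.g. "Birth rate" -> "Birth.rate"
print(names(stats))
```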

Step 3: Now let’s work with a few functions:

head(): Returns the first parts of a vector, matrix, table, data frame or function (by default the first 6 elements or rows); its counterpart tail() returns the last parts. Since head() and tail() are generic functions, they may also have been extended to other classes.

Let’s quickly see how this function works.

Calling head() on our data frame prints its first 6 records.
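A small sketch of head() and tail() with an explicit `n`. It uses a hypothetical eight-row data frame with placeholder names and values, so that there are more rows than head()’s default of 6.

```r
# Hypothetical data frame with placeholder values, 8 rows long
demo <- data.frame(
  Country.Name = paste("Country", LETTERS[1:8]),
  Birth.rate   = seq(11, 18)   # placeholder numbers
)

first_six  <- head(demo)        # default n = 6: rows 1 to 6
first_two  <- head(demo, n = 2) # only the first 2 rows
last_three <- tail(demo, n = 3) # the last 3 rows, keeping row names 6, 7, 8

print(first_six)
print(last_three)
```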

Now suppose we want to fetch the birth rate of Albania, which is our fourth record.

A single value is selected by giving its row and column positions inside square brackets, in the form data[row, column].

Running this selection prints the result in the Console pane: 12.877.

The same birth rate for Albania is also returned by an alternative indexing syntax.
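A minimal sketch of this indexing, assuming the columns appear in the order Country.Name, Country.Code, Birth.rate, Internet.users, Income.Group (so Birth.rate is column 3). The data frame name `stats` and every value other than Albania’s 12.877 are placeholders. Selecting the column by name is shown as one alternative; it may not be the exact syntax the post used.

```r
# Placeholder data frame in the assumed column order; only Albania's
# birth rate (12.877) comes from the post.
stats <- data.frame(
  Country.Name   = c("Aruba", "Afghanistan", "Angola", "Albania"),
  Country.Code   = c("ABW", "AFG", "AGO", "ALB"),
  Birth.rate     = c(10.2, 35.3, 46.0, 12.877),
  Internet.users = c(78.9, 5.9, 19.1, 57.2),
  Income.Group   = c("High income", "Low income",
                     "Upper middle income", "Upper middle income")
)

# By position: row 4 (Albania), column 3 (Birth.rate)
by_position <- stats[4, 3]

# By name: same row, column referred to by its name
by_name <- stats[4, "Birth.rate"]

print(by_position)  # 12.877
print(by_name)      # 12.877
```

Selecting by name keeps working even if the column order of the file changes, which makes it the more robust choice.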

Use of $ (Dollar) Symbol:

The $ operator can be used to select a variable/column, to assign new values to a variable/column, or to add a new variable/column in an R object.

When applied to a data frame, it fetches the entire column as a vector.

Now let us see how the dollar operator works.

Applying it to the Internet users column prints all of that column’s values.
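The three uses of $ listed above (select, add, assign) can be sketched on a small stand-in data frame. The name `stats`, the assumed column names, and all numbers except Albania’s birth rate (12.877) are placeholders; the new column `Birth.rate.rounded` is a hypothetical example.

```r
# Hypothetical stand-in for the demographic data frame
stats <- data.frame(
  Country.Name   = c("Aruba", "Afghanistan", "Angola", "Albania"),
  Country.Code   = c("ABW", "AFG", "AGO", "ALB"),
  Birth.rate     = c(10.2, 35.3, 46.0, 12.877),
  Internet.users = c(78.9, 5.9, 19.1, 57.2)
)

# 1. Select: returns the whole Internet.users column as a numeric vector
internet <- stats$Internet.users
print(internet)

# 2. Add: assigning to a column name that does not exist yet creates it
stats$Birth.rate.rounded <- round(stats$Birth.rate)

# 3. Assign: assigning to an existing column replaces all of its values
stats$Country.Code <- tolower(stats$Country.Code)

print(stats)
```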

Use of str() Function:

str() compactly displays the internal structure of an arbitrary R object. In simple terms, it gives a quick overview of the object: for a data frame, its number of rows and columns, plus the type and first few values of each column.

Example:

Here str() prints this overview of our data frame.
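A minimal sketch of str() on a stand-in data frame. The values are placeholders (except Albania’s 12.877), and Income.Group is made a factor explicitly to mirror how older versions of read.csv() imported text columns.

```r
# Hypothetical stand-in; Income.Group is a factor, as read.csv() produced
# by default before R 4.0.0
stats <- data.frame(
  Country.Name   = c("Aruba", "Afghanistan", "Angola", "Albania"),
  Country.Code   = c("ABW", "AFG", "AGO", "ALB"),
  Birth.rate     = c(10.2, 35.3, 46.0, 12.877),
  Internet.users = c(78.9, 5.9, 19.1, 57.2),
  Income.Group   = factor(c("High income", "Low income",
                            "Upper middle income", "Upper middle income"))
)

# First line: class, number of observations and variables.
# Then one line per column with its type and first few values; for the factor
# column it shows the number of levels, the first level name, and integer codes.
str(stats)
```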

Looking at this output, the levels of the categorical (factor) columns are not fully clear, because str() shows only the first level names followed by integer codes.

To see all the levels of a column, we use a function called levels().

Use of levels() Function:

levels() provides access to the levels attribute of a variable. The first form, levels(x), returns the levels of its argument; the second form, levels(x) <- value, sets the attribute.
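Both forms of levels() can be sketched on a small factor. The income-group values are placeholders, and the shortened level names in the second form are hypothetical. The last line shows a pitfall worth knowing: since R 4.0.0, read.csv() keeps text columns as character vectors unless `stringsAsFactors = TRUE` is passed, and levels() of a character vector is NULL.

```r
# Hypothetical Income Group values (placeholders, not the real data)
income <- factor(c("High income", "Low income",
                   "Upper middle income", "Upper middle income"))

# First form, levels(x): read the distinct categories (sorted alphabetically)
income_levels <- levels(income)
print(income_levels)
print(nlevels(income))  # number of levels

# Second form, levels(x) <- value: rename the levels; the codes stay the same
renamed <- income
levels(renamed) <- c("High", "Low", "Upper middle")
print(renamed)

# A character vector has no levels attribute, so levels() returns NULL
print(levels(as.character(income)))
```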

We will work with more data frame functions and solve this use case in the next post. Keep coding! 😊
