The following are the Python libraries we need for this homework.

import pandas as pd
import numpy as np

Question 1

The following is the list, a:

a = [0.1, 1.2, 2.3, 3.4, 4.5]

Q1a

Write Python code that uses the list, a, to create the following NumPy array:

array([0.1, 1.2, 2.3, 3.4, 4.5])

Answer

arr_a = np.array(a)
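
As an optional check that is not part of the required answer, the sketch below inspects the object that np.array() returns. It reuses the names a and arr_a from the question and relies only on standard ndarray attributes.

```python
import numpy as np

a = [0.1, 1.2, 2.3, 3.4, 4.5]
arr_a = np.array(a)

# A list of floats becomes a one-dimensional float64 ndarray.
print(type(arr_a))
print(arr_a.dtype)
print(arr_a.shape)
```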

Q1b

Write Python code that uses the list, a, to create the following pandas Series:

########################
# Index      0

# 0         0.1
# 1         1.2
# 2         2.3
# 3         3.4
# 4         4.5

# dtype: float64    
########################

Answer

s = pd.Series(a)
s

Q1c

Write Python code that uses the list, a, to create the following pandas Series:

########################
# Index      0

# a         0.1
# b         1.2
# c         2.3
# d         3.4
# e         4.5

# dtype: float64    
########################

Answer

s = pd.Series(a, 
              index = ['a','b','c','d','e'])
s

Q1d

Write Python code that uses (1) the Series created in Q1c and (2) Boolean indexing to get the following Series:

########################
# Index      0

# c         2.3
# d         3.4
# e         4.5

# dtype: float64    
########################

Answer

s[ s > 2 ]
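
Boolean indexing works in two steps, which the following sketch separates for illustration. The name mask is a hypothetical helper that does not appear in the question; everything else matches Q1c.

```python
import pandas as pd

a = [0.1, 1.2, 2.3, 3.4, 4.5]
s = pd.Series(a, index=['a', 'b', 'c', 'd', 'e'])

# Step 1: the comparison returns a Boolean Series with the same index as s.
mask = s > 2
print(mask)

# Step 2: indexing with the Boolean Series keeps only the rows marked True.
print(s[mask])
```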

Question 2

The next line creates a list of tuples, each pairing a percentile with the household income at that percentile:

hh_income = [ (10, 14629), (20, 25600), (30, 37002),
              (40, 50000), (50, 63179), (60, 79542),
              (70, 100162), (80, 130000), (90, 184292) ]

Q2a

Write Python code that uses the list, hh_income, to assign the object, hh_income_array, to the following NumPy array:

############################

# array([[    10,  14629],
#        [    20,  25600],
#        [    30,  37002],
#        [    40,  50000],
#        [    50,  63179],
#        [    60,  79542],
#        [    70, 100162],
#        [    80, 130000],
#        [    90, 184292]])

############################

Answer

hh_income_array = np.array(hh_income)

Q2b

Write Python code that uses the print() function to report the dimensions of the NumPy array, hh_income_array, and its number of elements, as follows:

Dimensions of the NumPy array, hh_income_array, is: (9, 2)

Number of elements in the NumPy array, hh_income_array, is: 18

Answer

# print() already inserts a space between its arguments, so the strings end at the colon
print("Dimensions of the NumPy array, hh_income_array, is:", hh_income_array.shape)
print("Number of elements in the NumPy array, hh_income_array, is:", hh_income_array.size)
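
The attributes used in this answer are related: size is the product of the entries in shape, and ndim counts how many entries shape has. A minimal sketch of that relationship, using only the data from this question:

```python
import numpy as np

hh_income = [(10, 14629), (20, 25600), (30, 37002),
             (40, 50000), (50, 63179), (60, 79542),
             (70, 100162), (80, 130000), (90, 184292)]
hh_income_array = np.array(hh_income)

print(hh_income_array.ndim)   # 2: one axis for rows, one for columns
print(hh_income_array.shape)  # (9, 2): 9 rows and 2 columns
print(hh_income_array.size)   # 18: 9 * 2 elements in total
```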

Question 3

The following is the NumPy array, c:

c = np.array([ [1.0, 2], [3, 4] ])

Q3a.

Write Python code that uses the NumPy array, c, to create the following DataFrame:

############################

# index     0    1
# 0         1.0  2.0
# 1         3.0  4.0

############################

Answer

df = pd.DataFrame(c)
df

Q3b.

Write Python code that uses the NumPy array, c, to create the following DataFrame:

############################

# index     dogs    cats
# 0         1.0     2.0
# 1         3.0     4.0

############################

Answer

df = pd.DataFrame(c, columns=['dogs','cats'])
df

Q3c.

Write Python code that uses the NumPy array, c, to create the following DataFrame:

############################

# index             dogs    cats
# byeong-hak        1.0     2.0
# your_first_name   3.0     4.0

############################

Answer

df = pd.DataFrame(c, 
                  columns=['dogs','cats'],
                  index = ['byeong-hak', 'your_first_name'])
df
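
One reason to set row and column labels is that individual values can then be looked up by name. The sketch below illustrates this with .loc[]; the variable cats_value is a hypothetical name introduced only for this example.

```python
import numpy as np
import pandas as pd

c = np.array([[1.0, 2], [3, 4]])
df = pd.DataFrame(c,
                  columns=['dogs', 'cats'],
                  index=['byeong-hak', 'your_first_name'])

# Look up a single value by its row label and column label.
cats_value = df.loc['byeong-hak', 'cats']
print(cats_value)
```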

Question 4

Download the file, US_state_GDP.zip, from the Files section in our Canvas. Extract the zip file, US_state_GDP.zip, to use the CSV file, US_state_GDP.csv.

Assign path_csv to the string of the absolute pathname of the file, US_state_GDP.csv.

####################################################################################################################################
# For example

# path_csv = '/Users/byeong-hakchoe/Google Drive/suny-geneseo/teaching-materials/lecture-data/US_state_GDP.csv'
# path_csv = 'C:/byeong-hakchoe/Google Drive/suny-geneseo/teaching-materials/lecture-data/US_state_GDP.csv'

####################################################################################################################################

Q4a

Read the data file, US_state_GDP.csv, into an object named state_gdp, using (1) path_csv and (2) the pd.read_csv() function.

Answer

# This is an example of the absolute path of the CSV file
path_csv = '/Users/byeong-hakchoe/Google Drive/suny-geneseo/teaching-materials/lecture-data/US_state_GDP.csv'
state_gdp = pd.read_csv(path_csv)
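
If typing the full path by hand is error-prone, one possible alternative is to let Python's standard pathlib module build the absolute path. This is only a sketch: it assumes, hypothetically, that US_state_GDP.csv was extracted into the current working directory, and csv_file is a name introduced for illustration.

```python
from pathlib import Path

# Assumption: US_state_GDP.csv sits in the current working directory.
csv_file = Path("US_state_GDP.csv")

# resolve() turns the relative path into an absolute one.
path_csv = str(csv_file.resolve())
print(path_csv)

# Check that the file is really there before calling pd.read_csv(path_csv).
print(csv_file.exists())
```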

Q4b

Write Python code that uses the DataFrame, state_gdp, to create the DataFrame whose first five rows are as follows:

############################################

# index    state_code           state

# 0          AK                Alaska
# 1          AL               Alabama
# 2          AR              Arkansas
# 3          AZ               Arizona
# 4          CA            California

############################################

Answer

state_gdp[ [ 'state_code', 'state' ] ]

Q4c

Write Python code that uses (1) the DataFrame, state_gdp, and (2) state_gdp.columns to create the DataFrame whose first five rows are as follows:

############################################

#                    state  gdp_2009

# 0                 Alaska     44215
# 1                Alabama    149843
# 2               Arkansas     89776
# 3                Arizona    221405
# 4             California   1667152

############################################

Answer

cols = state_gdp.columns
state_gdp[ cols[1:3] ]

Q4d

Write Python code to get the first three rows of the DataFrame, state_gdp:

Answer

state_gdp[ 0:3 ]
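
Row slices follow Python's usual convention: positions are counted from 0, the start position is included, and the stop position is excluded. The sketch below illustrates this on a small stand-in DataFrame named toy (a hypothetical name), built from the five rows displayed in Q4c rather than from the full CSV file.

```python
import pandas as pd

# Stand-in for state_gdp, using only the rows shown in Q4c.
toy = pd.DataFrame({
    "state": ["Alaska", "Alabama", "Arkansas", "Arizona", "California"],
    "gdp_2009": [44215, 149843, 89776, 221405, 1667152],
})

print(toy[0:3])     # positions 0, 1, 2: the first three rows
print(toy[1:3])     # positions 1, 2: only two rows, and the first row is skipped
print(toy.head(3))  # head(3) also returns the first three rows
```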

Q4e

Write Python code to get all the rows of the DataFrame, state_gdp, for which the value of gdp_growth_2010 is less than 0:

Answer

state_long_recession = state_gdp['gdp_growth_2010'] < 0
state_gdp[ state_long_recession ]

Q4f

Write Python code that uses state_gdp.loc[] to get the following DataFrame:

############################################

#       state  gdp_growth_2010

# 0    Alaska             -1.7
# 3   Arizona             -0.2
# 33   Nevada             -0.4
# 50  Wyoming             -1.3

############################################

Answer

state_gdp.loc[ state_long_recession,['state', 'gdp_growth_2010'] ]

Q4g

Write Python code that uses state_gdp.iloc[] to get the following DataFrame:

############################################

#    state_code     state

# 10         GA   Georgia
# 11         HI    Hawaii
# 12         IA      Iowa
# 13         ID     Idaho
# 14         IL  Illinois

############################################

Answer

state_gdp.iloc[ 10:15, :2 ]
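
A detail worth keeping in mind when choosing between .iloc[] and .loc[]: .iloc[] slices by position and excludes the stop value, while .loc[] slices by label and includes it. The following sketch illustrates the difference on a small stand-in DataFrame named toy (a hypothetical name), built from the five rows displayed in Q4b.

```python
import pandas as pd

# Stand-in for state_gdp, using only the rows shown in Q4b.
toy = pd.DataFrame({
    "state_code": ["AK", "AL", "AR", "AZ", "CA"],
    "state": ["Alaska", "Alabama", "Arkansas", "Arizona", "California"],
})

# Position-based: rows at positions 1 and 2, first two columns.
by_position = toy.iloc[1:3, :2]
print(by_position)

# Label-based: rows labeled 1, 2, and 3 (the stop label is included).
by_label = toy.loc[1:3, ["state_code", "state"]]
print(by_label)
```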

DANL 100: Programming for Data Analytics

DANL 100 - Homework Assignment 4 - Example Answers

Byeong-Hak Choe

2023-02-14
