Office Hours
Google Data Analytics Certificate
Course programs for the Google Data Analytics Certificate are hosted on Coursera.
The Google Data Analytics Certificate program uses R.
Why Python, R, and SQL?
Stack Overflow Trends
pandas
pandas is a Python library that provides high-performance data structures and data analysis tools.
import pandas as pd
pd.DataFrame
DataFrame is the primary structure of pandas.
DataFrame represents a table of data with an ordered collection of columns.
Each column can have a different data type.
DataFrame can be thought of as a dictionary of Series sharing the same index.
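The "dictionary of Series" view can be checked directly. The sketch below uses made-up values and hypothetical variable names (`price`, `name`, `df`); it only illustrates that each column is a Series carrying the DataFrame's index.

```python
import pandas as pd

# Two Series that share the same index labels (hypothetical data).
price = pd.Series([69.2, 8.11, 110.92], index=["a", "b", "c"])
name = pd.Series(["Daimler", "E.ON", "Siemens"], index=["a", "b", "c"])

# A DataFrame built from a dictionary of Series: one column per key.
df = pd.DataFrame({"company": name, "price": price})

# Each column is itself a Series carrying the DataFrame's index.
col = df["price"]
print(type(col).__name__)           # Series
print(col.index.equals(df.index))   # True
print(df.dtypes)                    # columns may have different data types
```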
Create DataFrame
pd.DataFrame() creates a DataFrame, which is a two-dimensional tabular-like structure with labeled axes (rows and columns).
data = {"state": ["Ohio", "Ohio", "Ohio", "Nevada", "Nevada", "Nevada"],
        "year": [2000, 2001, 2002, 2001, 2002, 2003],
        "population": [1.5, 1.7, 3.6, 2.4, 2.9, 3.2]}
frame = pd.DataFrame(data)
In this example, the DataFrame is constructed by passing a dictionary of equal-length lists.
It is also possible to pass a dictionary of NumPy arrays.
If we request a column that is not contained in the dictionary, it appears with missing values, NaN:
frame2 = pd.DataFrame(data, columns=["state", "year", "population", "income"])
frame2
The columns argument also sets the column order, and each column of the result is a Series.
frame2 = pd.DataFrame(data, columns=["year", "state", "population"])
frame2
We can pass the following types of objects to pd.DataFrame():
2D NumPy arrays
Dict of lists, tuples, dicts, arrays, or Series
List of lists, tuples, dicts, or Series
Another DataFrame
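As a rough illustration of the input types listed above, the following sketch builds the same small table four ways. All names and values here are invented for the example.

```python
import numpy as np
import pandas as pd

# 2D NumPy array: pass column names explicitly, otherwise they default to 0, 1, 2.
from_array = pd.DataFrame(np.arange(6).reshape((2, 3)), columns=["a", "b", "c"])

# Dict of lists: keys become column names.
from_dict = pd.DataFrame({"a": [0, 3], "b": [1, 4], "c": [2, 5]})

# List of dicts: each dict is one row; keys become column names.
from_rows = pd.DataFrame([{"a": 0, "b": 1, "c": 2}, {"a": 3, "b": 4, "c": 5}])

# Another DataFrame: the constructor returns a new DataFrame object.
from_frame = pd.DataFrame(from_dict)

# All four hold the same values.
print((from_array.values == from_rows.values).all())
```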
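Indexing DataFrame
A column can be added to a DataFrame as follows:
frame2["change"] = [1.2, -3.2, 0.4, -0.12, 2.4, 0.3]
frame2["change"]
When a single column is selected from a DataFrame, a Series is returned. Attribute-style access, frame2.change, is also possible.
The returned Series has the same index as the initial DataFrame.
Passing a list of column names returns a DataFrame with those columns:
frame2[["state", "population"]]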
The index and the columns can be given names by setting index.name and columns.name, respectively:
frame2.index.name = "number:"
frame2.columns.name = "variable:"
frame2
In DataFrames, there is no default name for the index or the columns.
DataFrame.reindex() creates a new DataFrame with data conformed to a new index, while the initial DataFrame will not be changed:
frame3 = frame.reindex([0, 2, 3, 4])
frame3
data = {"company": ["Daimler", "E.ON", "Siemens", "BASF", "BMW"],
        "price": [69.2, 8.11, 110.92, 87.28, 87.81],
        "volume": [4456290, 3667975, 3669487, 1778058, 1824582]}
companies = pd.DataFrame(data)
companies
companies[2:]
Index values that are not already present will be filled with NaN by default.
The pd.isna() and pd.notna() functions detect missing data:
companies3 = companies.reindex(index = [0, 2, 3, 4, 5], columns=["company", "price", "market cap"])
companies3
pd.isna(companies3)
pd.notna(companies3)
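A smaller version of the same idea, with hypothetical data, makes it easier to see where the NaN values come from and that the original DataFrame is left untouched:

```python
import pandas as pd

# Hypothetical three-row table with the default integer index 0, 1, 2.
df = pd.DataFrame({"company": ["BASF", "BMW", "E.ON"],
                   "price": [87.28, 87.81, 8.11]})

# Row label 3 and column "volume" do not exist in df, so they are filled with NaN.
df2 = df.reindex(index=[0, 2, 3], columns=["company", "price", "volume"])

missing = pd.isna(df2)    # Boolean DataFrame: True where a value is missing
print(missing.sum())      # number of missing values per column
print(df.shape)           # (3, 2): the original df is unchanged
```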
Calling drop with a sequence of labels will drop values from the row labels (axis 0):
import numpy as np
obj = pd.Series(np.arange(5.), index = ["a", "b", "c", "d", "e"])
obj
new_obj = obj.drop("c")
new_obj
obj.drop(["d", "c"])
Dropping columns
With a DataFrame, index values can be deleted from either axis. To illustrate this, we first create an example DataFrame:
data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index = ["Ohio", "Colorado", "Utah", "New York"],
                    columns=["one", "two", "three", "four"])
data
data.drop(index = ["Colorado", "Ohio"])
To drop labels from the columns, use the columns keyword:
data.drop(columns=["two"])
We can also drop values from the columns by passing axis=1 or axis="columns":
data.drop("two", axis=1)
data.drop(["two", "four"], axis="columns")
del DataFrame[column] deletes the column from the DataFrame in place.
del data["two"]
data
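One difference worth seeing in code: drop() returns a new object, while del changes the DataFrame itself. A minimal sketch with made-up data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6).reshape((2, 3)),
                  index=["Ohio", "Utah"], columns=["one", "two", "three"])

# drop() returns a new object; df still has all three columns afterwards.
dropped = df.drop(columns=["two"])
print(list(dropped.columns))   # ['one', 'three']
print(list(df.columns))        # ['one', 'two', 'three']

# del modifies df itself.
del df["two"]
print(list(df.columns))        # ['one', 'three']
```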
Indexing, selecting and filtering
Slicing the rows of a DataFrame by position works like slicing an np.array.
data = {"company": ["Daimler", "E.ON", "Siemens", "BASF", "BMW"],
        "price": [69.2, 8.11, 110.92, 87.28, 87.81],
        "volume": [4456290, 3667975, 3669487, 1778058, 1824582]}
companies = pd.DataFrame(data)
companies
companies[2:]
With a customized index, rows can also be sliced by label; a label slice includes its end label:
companies2 = pd.DataFrame(data, index = ["a", "b", "c", "d", "e"])
companies2
companies2["b":"d"]
DataFrame.loc[] selects a subset of rows and columns from a DataFrame using axis labels.
DataFrame.iloc[] selects a subset of rows and columns from a DataFrame using integer positions.
companies2.loc["c", ["company", "price"]]
companies2.iloc[2, [0, 1]]
companies2.loc[["c", "d", "e"], ["volume", "price", "company"]]
companies2.iloc[2:, ::-1]
df[val] selects a single column or set of columns;
df.loc[val] selects a single row or set of rows;
df.loc[:, val] selects a single column or set of columns;
df.loc[val1, val2] selects row and column by label;
df.iloc[where] selects a row or set of rows by integer position;
df.iloc[:, where] selects a column or set of columns by integer position;
df.iloc[w1, w2] selects row and column by integer position.
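The selection options above can be compared side by side. This is a sketch on a hypothetical three-row table; it also shows that label slices include the end label while positional slices do not.

```python
import pandas as pd

df = pd.DataFrame(
    {"company": ["Daimler", "E.ON", "Siemens"], "price": [69.2, 8.11, 110.92]},
    index=["a", "b", "c"],
)

by_label = df.loc["b", "price"]     # row "b", column "price"
by_position = df.iloc[1, 1]         # second row, second column
print(by_label == by_position)      # True: both address the same cell

# Label slices include the end label; positional slices exclude the end position.
print(len(df.loc["a":"b"]))         # 2
print(len(df.iloc[0:1]))            # 1
```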
Operations between DataFrames and Series
Here the Series named series is generated from the third row of the DataFrame:
companies3 = companies[["price", "volume"]]
companies3.index = ["Daimler", "E.ON", "Siemens", "BASF", "BMW"]
series = companies3.iloc[2]
companies3
series
By default, arithmetic operations between DataFrames and Series match the index of the Series on the DataFrame's columns:
companies3 + series
DataFrame.add() with axis=0 does the addition column by column, matching the index of the Series on the DataFrame's row index.
series2 = companies3["price"]
companies3.add(series2, axis=0)
Arithmetic between two DataFrames aligns on both the row index and the columns; labels that are not present in both are filled with NaN:
df1 = pd.DataFrame(np.arange(9.).reshape((3, 3)), columns=list("bcd"), index = ["Ohio", "Texas", "Colorado"])
df2 = pd.DataFrame(np.arange(12.).reshape((4, 3)), columns=list("bde"), index = ["Utah", "Ohio", "Texas", "Oregon"])
df1
df2
df1 + df2
DataFrame.T transposes the DataFrame.
companies3.T
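To make the two matching directions concrete, here is a small sketch with invented numbers. It uses sub() rather than add() only because the zeros make the result easy to read; the axis argument works the same way for both.

```python
import pandas as pd

df = pd.DataFrame({"price": [10.0, 20.0], "volume": [1.0, 2.0]}, index=["x", "y"])

row = df.iloc[0]     # Series indexed by the column names
col = df["price"]    # Series indexed by the row labels

# Default: the Series index is matched on the columns, so `row` is
# subtracted from every row of df.
by_columns = df - row
print(by_columns)

# axis=0: the Series index is matched on the row labels, so `col` is
# subtracted from every column of df.
by_rows = df.sub(col, axis=0)
print(by_rows)
```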
NumPy functions on DataFrame
DataFrame.apply(np.function, axis) applies a NumPy function along the given axis of the DataFrame.
companies3.apply(np.mean)
companies3.apply(np.sqrt)
companies3.apply(np.sqrt)[:2]
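The effect of the axis argument is easiest to see on a tiny table. The values below are chosen (as an assumption for the example) so the results are round numbers.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, 4.0, 9.0], "b": [16.0, 25.0, 36.0]})

col_means = df.apply(np.mean)            # axis=0 (default): one value per column
row_means = df.apply(np.mean, axis=1)    # axis=1: one value per row
roots = df.apply(np.sqrt)                # element-wise function keeps the shape

print(row_means.tolist())                # [8.5, 14.5, 22.5]
print(roots["b"].tolist())               # [4.0, 5.0, 6.0]
```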
Import/Export data
pd.read_csv("PATH_NAME_OF_*.csv") reads the csv file into a DataFrame.
header=None does not read the top row of the csv file as column names.
We can set column names with names, for example, names=["a", "b", "c", "d", "e"].
DataFrame.head() and DataFrame.tail() show the first and last five rows on the Console, respectively.
nbc_show = pd.read_csv("https://bcdanl.github.io/data/nbc_show_na.csv")
# `GRP`: audience size; `PE`: audience engagement.
nbc_show.head()  # showing the first five rows
nbc_show.tail()  # showing the last five rows
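The header and names options can be tried without any file on disk. In this sketch, io.StringIO stands in for a file path and the CSV text is made up.

```python
import io
import pandas as pd

# Hypothetical CSV text without a header row.
raw = "1,2,3\n4,5,6\n7,8,9\n"

# header=None: the first line is treated as data, and columns are labeled 0, 1, 2.
no_header = pd.read_csv(io.StringIO(raw), header=None)

# names=...: supply the column names ourselves.
named = pd.read_csv(io.StringIO(raw), header=None, names=["a", "b", "c"])

print(no_header.shape)         # (3, 3)
print(list(named.columns))     # ['a', 'b', 'c']
print(named.head(2))           # first two rows only
```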
Export data
DataFrame.to_csv("filename") writes the DataFrame to a csv file.
index=False and header=False do not write the row index and column names in the csv file.
We can set column names with header, for example, header=["a", "b", "c", "d", "e"].
nbc_show.to_csv("PATH_NAME_OF_THE_csv_FILE")
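A quick way to inspect what these options write is to call to_csv() without a path, in which case it returns the CSV text. The DataFrame below is hypothetical.

```python
import io
import pandas as pd

df = pd.DataFrame({"x": [1, 2], "y": [3, 4]})

# With no path, to_csv() returns the CSV text instead of writing a file.
with_index = df.to_csv()
without_index = df.to_csv(index=False)
renamed = df.to_csv(index=False, header=["a", "b"])

print(with_index.splitlines()[0])      # ,x,y   (empty name for the index column)
print(without_index.splitlines()[0])   # x,y
print(renamed.splitlines()[0])         # a,b

# Reading the index-free text back recovers the original table.
round_trip = pd.read_csv(io.StringIO(without_index))
```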
Summarizing DataFrame
DataFrame.count() returns a Series containing the number of non-missing values for each column.
DataFrame.sum() returns a Series containing the sum of values for each column.
DataFrame.mean() returns a Series containing the mean of values for each column.
Passing axis="columns" or axis=1 sums across the columns instead:
nbc_count = nbc_show.count()
# numeric_only=True skips the non-numeric columns
nbc_sum = nbc_show.sum(numeric_only=True)
nbc_sum_c = nbc_show.sum(axis="columns", numeric_only=True)
nbc_mean = nbc_show.mean(numeric_only=True)
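How missing values enter these summaries is worth a small check. The sketch below borrows the column names GRP and PE but uses invented numbers, not the course data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"GRP": [10.0, np.nan, 30.0], "PE": [1.0, 2.0, 3.0]})

counts = df.count()                    # non-missing values per column: GRP 2, PE 3
totals = df.sum()                      # NaN is skipped: GRP 40.0, PE 6.0
means = df.mean()                      # mean over non-missing values: GRP 20.0, PE 2.0
row_totals = df.sum(axis="columns")    # one total per row: 11.0, 2.0, 33.0

print(counts, totals, means, row_totals, sep="\n")
```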
Grouping DataFrame
DataFrame.groupby([col1, col2]) groups the DataFrame by the listed columns (grouping by one column or by more than two columns is also possible!).
Applying count(), sum(), or mean() to groupby() returns the count, the sum, or the mean of the other columns within each group.
nbc_genre_count = nbc_show.groupby(["Genre"]).count()
# numeric_only=True skips the non-numeric columns
nbc_genre_sum = nbc_show.groupby(["Genre"]).sum(numeric_only=True)
nbc_network_genre_mean = nbc_show.groupby(["Network", "Genre"]).mean(numeric_only=True)
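On a small, invented table the grouped results can be verified by hand. The column names mirror the NBC data, but the rows here are hypothetical.

```python
import pandas as pd

shows = pd.DataFrame({
    "Network": ["A", "A", "B", "B"],
    "Genre": ["Drama", "Drama", "Drama", "News"],
    "GRP": [100.0, 300.0, 50.0, 10.0],
})

# One group per Genre; mean of GRP inside each group.
genre_mean = shows.groupby("Genre")["GRP"].mean()

# One group per (Network, Genre) pair; the result has a two-level index.
net_genre_sum = shows.groupby(["Network", "Genre"])["GRP"].sum()

print(genre_mean)       # Drama 150.0, News 10.0
print(net_genre_sum)    # (A, Drama) 400.0, (B, Drama) 50.0, (B, News) 10.0
```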
Sorting DataFrame
DataFrame.sort_index() sorts the DataFrame by index on either axis.
DataFrame.sort_index(axis="columns") sorts the DataFrame by column index.
DataFrame.sort_index(ascending=False) sorts the DataFrame by the index in descending order.
nbc_show.sort_index()
nbc_show.sort_index(ascending = False)
nbc_show.sort_index(axis = "columns")
DataFrame.sort_values("SOME_VARIABLE") sorts the DataFrame by values of SOME_VARIABLE.
With Series.sort_values(), we do not need to provide "SOME_VARIABLE" in the sort_values() function.
DataFrame.sort_values("SOME_VARIABLE", ascending = False) sorts the DataFrame by values of SOME_VARIABLE in descending order.
nbc_show.sort_values("GRP")
nbc_show.sort_values("GRP", ascending = False)
obj = pd.Series([4, np.nan, 7, np.nan, -3, 2])
obj.sort_values()
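A short sketch, on made-up rows, of sorting by index versus sorting by a column. Note that missing values are placed last by default in either sort direction.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"show": ["x", "y", "z"], "GRP": [20.0, np.nan, 50.0]},
                  index=[2, 0, 1])

by_index = df.sort_index()                             # rows ordered by label 0, 1, 2
by_grp_desc = df.sort_values("GRP", ascending=False)   # largest GRP first, NaN last

print(list(by_index.index))        # [0, 1, 2]
print(list(by_grp_desc["show"]))   # ['z', 'x', 'y']
```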
Class Exercise
Use the nbc_show_na.csv file to answer the following questions:
Find the top show in terms of the value of PE for each Genre.
Find the top show in terms of the value of GRP for each Network.
Which genre has the largest GRP on average?
Installing Python modules
To use seaborn, first install it from a terminal:
conda install seaborn
or
pip install seaborn
seaborn
Graphs and charts let us explore and learn about the structure of the information we have in a DataFrame.
Good data visualizations make it easier to communicate our ideas and findings to other people.
We use visualization and summary statistics (e.g., mean, median, minimum, maximum) to explore our data in a systematic way.
Exploratory data analysis (EDA) is an iterative cycle. We:
Generate questions about our data.
Search for answers by visualizing, transforming, and modelling our data.
Use what we learn to refine our questions and/or generate new questions.
seaborn is a Python data visualization library based on matplotlib.
Its plots generally look better than default matplotlib-produced plots, and so I recommend using it by default.
import seaborn as sns
seaborn
Types of plots
We will consider the following types of visualization:
Bar chart
Histogram
Scatter plot
Scatter plot with Fitted line
Line chart
What is a tidy DataFrame?
There are three rules which make a dataset tidy:
Each variable has its own column.
Each observation has its own row.
Each value has its own cell.
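In a tidy table, each variable is a column and each observation is a row. As one possible illustration (the table and the choice of melt() are assumptions for this sketch, not part of the slides), a "wide" table with years spread across column names can be reshaped like this:

```python
import pandas as pd

# Hypothetical "wide" table: the years are spread across column names,
# so the variable `year` does not have its own column.
wide = pd.DataFrame({"state": ["Ohio", "Nevada"],
                     "2001": [1.7, 2.4],
                     "2002": [3.6, 2.9]})

# melt() stacks the year columns into (year, population) pairs,
# giving one row per state-year observation.
tidy = wide.melt(id_vars="state", var_name="year", value_name="population")

print(tidy)
```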
Getting started with seaborn
Let's get the names of the DataFrames provided by the seaborn library:
import seaborn as sns
print(sns.get_dataset_names())
Let's use the titanic and tips DataFrames:
df_titanic = sns.load_dataset('titanic')
df_titanic.head()
df_tips = sns.load_dataset('tips')
df_tips.head()
Bar Chart
We use the sns.countplot() function to plot a bar chart:
sns.countplot(x = 'sex', data = df_titanic)
data: DataFrame.
x: Name of a categorical variable (column) in the DataFrame.
Bar Chart
We can further break up the bars in the bar chart based on another categorical variable.
sns.countplot(x = 'sex', hue = 'survived', data = df_titanic)
hue: Name of a categorical variable
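The bar heights drawn by countplot() are simply group counts, which can be cross-checked with groupby(). The sketch below uses a handful of invented passengers instead of the titanic DataFrame so that it runs without downloading data; the Agg backend is selected only so it also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no window is opened
import pandas as pd
import seaborn as sns

# Hypothetical passengers, standing in for the titanic DataFrame.
df = pd.DataFrame({
    "sex": ["male", "male", "female", "female", "female"],
    "survived": [0, 1, 1, 1, 0],
})

# countplot() draws one bar per (x, hue) combination.
ax = sns.countplot(x="sex", hue="survived", data=df)

# The same counts, computed directly.
counts = df.groupby(["sex", "survived"]).size()
print(counts)
```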
Histogram
We use the sns.displot() function to plot a histogram:
sns.displot(x = 'age', bins = 5, data = df_titanic)
bins: Number of bins
Scatter plot
A scatter plot is used to display the relationship between two continuous variables.
We use the sns.scatterplot() function to plot a scatter plot:
sns.scatterplot(x = 'total_bill', y = 'tip', data = df_tips)
x: Name of a continuous variable on the horizontal axis
y: Name of a continuous variable on the vertical axis
Scatter plot
To the scatter plot, we can add a hue-VARIABLE mapping to display how the relationship between two continuous variables varies by VARIABLE.
Suppose we are interested in the following question:
sns.scatterplot(x = 'total_bill', y = 'tip', hue = 'smoker', data = df_tips)
seaborn
Fitted line
From the scatter plot, it is often difficult to clearly see the relationship between two continuous variables.
sns.lmplot() adds a line that fits well into the scattered points.
On average, the fitted line describes the relationship between two continuous variables.
sns.lmplot(x = 'total_bill', y = 'tip', data = df_tips)
Fitted line
To the scatter plot, we can add a hue-VARIABLE mapping to display how the relationship between two continuous variables varies by VARIABLE.
Using the fitted lines, let's answer the following question:
sns.lmplot(x = 'total_bill', y = 'tip', hue = 'smoker', data = df_tips)
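To compare fitted lines across groups numerically, one option is to compute a least-squares line per group alongside the plot. This is only a sketch: the data are invented, np.polyfit is a stand-in chosen for the example (lmplot() draws the lines but does not return their coefficients), and ci=None is passed to skip the confidence band for such a tiny sample.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no window is opened
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical bills and tips, standing in for the tips DataFrame.
df = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 10.0, 20.0, 30.0],
    "tip":        [1.0, 2.0, 3.0, 2.0, 3.0, 4.0],
    "smoker":     ["No", "No", "No", "Yes", "Yes", "Yes"],
})

# One scatter and one fitted line per value of `smoker`.
grid = sns.lmplot(x="total_bill", y="tip", hue="smoker", data=df, ci=None)

# Slope and intercept of a straight-line fit within each group.
fits = {
    name: np.polyfit(group["total_bill"], group["tip"], 1)
    for name, group in df.groupby("smoker")
}
for name, (slope, intercept) in fits.items():
    print(name, round(slope, 3), round(intercept, 3))
```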
Line chart
We use the sns.lineplot() function to plot a line chart:
path_csv = '/Users/byeong-hakchoe/Google Drive/suny-geneseo/teaching-materials/lecture-data/dji.csv'
dow = pd.read_csv(path_csv, index_col=0, parse_dates=True)
sns.lineplot(x = 'Date', y = 'Close', data = dow)
x: Name of a continuous variable (often a time variable) on the horizontal axis
y: Name of a continuous variable on the vertical axis