pandas
pandas is a Python library including the following features:
pandas provides high-performance data structures and data analysis tools.
    import pandas as pd
    import numpy as np  # needed for the NumPy examples that follow
pandas Create Series
pd.Series() creates a one-dimensional array-like object that holds values and an index.
    obj = pd.Series([4, 7, -5, 3])
    obj
A Series formed only from a list is given a default integer index (0, 1, 2, ...).
pandas Create Series
NumPy arrays can only be indexed by integer position, while a Series can be indexed by its manually set index.
    obj2 = pd.Series([2, -5, 9, 4], index=["a", "b", "c", "d"])
    npobj = np.array([2, -5, 9, 4])
    obj2
    obj2["b"]
    npobj[1]
pandas Create Series
Series.values returns the values of a Series.
Series.index returns the index of a Series.
    obj.values
    obj.index
    obj2.index
The default index of a Series is a RangeIndex.
pandas Create Series
Series can be created from NumPy arrays.
    npobj = np.array([2, -5, 9, 4])
    obj2 = pd.Series(npobj, index=["a", "b", "c", "d"])
    obj2
    obj2.index
    obj2["a"]
    obj2["d"] = 6
    obj2[["c", "a", "d"]]
["c", "a", "d"] is interpreted as a list of indices.
Using NumPy functions or NumPy-like operations will preserve the index-value link.
Another way to think about a Series is as a fixed-length, ordered dictionary, as it is a mapping of index values to data values.
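The two views of a Series, array-like and dictionary-like, can be sketched together. This is only an illustration: the Series `s` and its labels and values are made up and are not part of the lecture data.

```python
import numpy as np
import pandas as pd

# Hypothetical example data; labels and values are illustrative only.
s = pd.Series([2, -5, 9, 4], index=["a", "b", "c", "d"])

# Array-like view: filtering and arithmetic keep each label attached to its value.
positive = s[s > 0]
doubled = s * 2
roots = np.sqrt(positive)

# Dict-like view: membership tests look at the index labels, not the values.
has_b = "b" in s
has_9 = 9 in s

print(positive.index.tolist())  # ['a', 'c', 'd']
print(doubled["b"])             # -10
print(has_b, has_9)             # True False
```

Note that `9 in s` is False even though 9 is one of the values, because `in` checks the index, just as it checks the keys of a dict.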
    obj2[obj2 > 0]
    obj2 * 2
    np.exp(obj2)
    "b" in obj2
    "e" in obj2
pandas Create Series
Series can be created from dictionaries as well. The index of the resulting Series consists of the dict's keys.
    dictdata = {"Rochester": 210_606, "Buffalo": 276_807, "Syracuse": 146_103}
    obj3 = pd.Series(dictdata)
    obj3
An explicit index can also be passed together with the dict:
    cities = ["Niagara", "Buffalo", "Syracuse"]
    obj4 = pd.Series(dictdata, index=cities)
    obj4
NaN (not a number) marks missing values where the index and the dict do not match.
pandas Series properties
Series.name returns the name of the Series.
Series.index.name returns the name of the Series's index.
    obj4.name = "population"
    obj4.index.name = "cities"
    obj4
Assigning to name changes the name of the existing Series or of its index.
pandas pd.Series vs. np.array
The isna and notna functions are used to detect missing data:
    pd.isna(obj4)
    pd.notna(obj4)
The same checks are available as Series methods:
    obj4.isna()
    obj4.notna()
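As a minimal sketch of how missing values arise and how they can be counted or filtered out, the example below builds a Series from a dict with an index label that the dict does not contain. The labels "x", "y", "z" and the values are hypothetical.

```python
import pandas as pd

# Hypothetical data: "x" and "y" are keys of the dict, "z" is not.
data = {"x": 1.0, "y": 2.0}
s = pd.Series(data, index=["x", "z", "y"])

missing_mask = s.isna()              # True where the value is NaN
n_missing = int(missing_mask.sum())  # each True counts as 1
present = s[s.notna()]               # keep only the non-missing entries

print(missing_mask.tolist())   # [False, True, False]
print(n_missing)               # 1
print(present.index.tolist())  # ['x', 'y']
```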
pandas pd.DataFrame
DataFrame is the primary structure of pandas.
DataFrame represents a table of data with an ordered collection of columns.
Each column can have a different data type.
DataFrame can be thought of as a dictionary of Series sharing the same index.
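The "dictionary of Series" view can be sketched as follows. The two Series and their labels are made up for illustration; the point is only that the columns end up sharing one index.

```python
import pandas as pd

# Two hypothetical Series whose indexes only partly overlap.
price = pd.Series({"A": 10.0, "B": 20.0})
volume = pd.Series({"B": 300, "C": 400})

# Each dict key becomes a column; rows are aligned on the union of the indexes.
df = pd.DataFrame({"price": price, "volume": volume})

print(df.index.tolist())             # ['A', 'B', 'C']
print(df.loc["B", "price"])          # 20.0
print(df["volume"].isna().tolist())  # [True, False, False]
print(type(df["price"]).__name__)    # Series
```

Labels that appear in only one of the Series are kept, and the other column holds NaN there.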
pandas Create DataFrame
pd.DataFrame() creates a DataFrame, a two-dimensional tabular structure with labeled axes (rows and columns).
    data = {"state": ["Ohio", "Ohio", "Ohio", "Nevada", "Nevada", "Nevada"],
            "year": [2000, 2001, 2002, 2001, 2002, 2003],
            "population": [1.5, 1.7, 3.6, 2.4, 2.9, 3.2]}
    frame = pd.DataFrame(data)
In this example the DataFrame is constructed by passing a dictionary of equal-length lists.
It is also possible to pass a dictionary of NumPy arrays.
A column listed in columns that is not contained in the dictionary is filled with NaN:
    frame2 = pd.DataFrame(data, columns=["state", "year", "population", "income"])
    frame2
The default index is assigned automatically, as with Series. The columns argument also sets the order of the columns:
    frame2 = pd.DataFrame(data, columns=["year", "state", "population"])
    frame2
We can pass the following types of objects to pd.DataFrame():
2D NumPy arrays
Dict of lists, tuples, dicts, arrays, or Series
List of lists, tuples, dicts, or Series
Another DataFrame
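A few of the input types listed above can be sketched like this. All names and values are hypothetical and chosen only to keep the example small.

```python
import numpy as np
import pandas as pd

# 2D NumPy array: column names are supplied explicitly here.
from_array = pd.DataFrame(np.arange(6).reshape((3, 2)), columns=["u", "v"])

# List of dicts: each dict is one row; a key missing from a row becomes NaN.
from_records = pd.DataFrame([{"u": 1, "v": 2}, {"u": 3}])

# Another DataFrame: the constructor can also select a subset of its columns.
from_frame = pd.DataFrame(from_array, columns=["v"])

print(from_array.shape)                   # (3, 2)
print(from_records["v"].isna().tolist())  # [False, True]
print(from_frame["v"].tolist())           # [1, 3, 5]
```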
pandas Indexing DataFrame
Columns can be added to an existing DataFrame as follows:
    frame2["change"] = [1.2, -3.2, 0.4, -0.12, 2.4, 0.3]
    frame2["change"]
When a single column is selected from a DataFrame, a Series is returned. Attribute-style access, frame2.change, is also possible. The returned Series has the same index as the initial DataFrame.
Selecting a list of columns returns a DataFrame:
    frame2[["state", "population"]]
The index and the columns can be named through index.name and columns.name respectively:
    frame2.index.name = "number:"
    frame2.columns.name = "variable:"
    frame2
In DataFrames, there is no default name for the index or the columns.
DataFrame.reindex() creates a new DataFrame with data conformed to a new index, while the initial DataFrame is not changed:
    frame3 = frame.reindex([0, 2, 3, 4])
    frame3
    data = {"company": ["Daimler", "E.ON", "Siemens", "BASF", "BMW"],
            "price": [69.2, 8.11, 110.92, 87.28, 87.81],
            "volume": [4456290, 3667975, 3669487, 1778058, 1824582]}
    companies = pd.DataFrame(data)
    companies
    companies[2:]
Index values that are not already present will be filled with NaN by default.
The pd.isna() and pd.notna() functions detect missing data:
    companies3 = companies.reindex(index=[0, 2, 3, 4, 5], columns=["company", "price", "market cap"])
    companies3
    pd.isna(companies3)
    pd.notna(companies3)
Calling drop with a label or a sequence of labels removes those entries from the row labels (axis 0) and returns a new object:
    obj = pd.Series(np.arange(5.), index=["a", "b", "c", "d", "e"])
    obj
    new_obj = obj.drop("c")
    new_obj
    obj.drop(["d", "c"])
pandas Dropping columns
With a DataFrame, index values can be deleted from either axis. To illustrate this, we first create an example DataFrame:
    data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                        index=["Ohio", "Colorado", "Utah", "New York"],
                        columns=["one", "two", "three", "four"])
    data
    data.drop(index=["Colorado", "Ohio"])
To drop labels from the columns, use the columns keyword:
    data.drop(columns=["two"])
Columns can also be dropped by passing axis=1 or axis="columns":
    data.drop("two", axis=1)
    data.drop(["two", "four"], axis="columns")
del DataFrame[column] deletes the column from the DataFrame in place.
    del data["two"]
    data
pandas Indexing, selecting and filtering
Slicing the rows of a DataFrame works much like slicing an np.array. With a manually set index, label slices can be used as well.
    data = {"company": ["Daimler", "E.ON", "Siemens", "BASF", "BMW"],
            "price": [69.2, 8.11, 110.92, 87.28, 87.81],
            "volume": [4456290, 3667975, 3669487, 1778058, 1824582]}
    companies = pd.DataFrame(data)
    companies
    companies[2:]
    companies2 = pd.DataFrame(data, index=["a", "b", "c", "d", "e"])
    companies2
    companies2["b":"d"]
DataFrame.loc[] selects a subset of rows and columns from a DataFrame using axis labels.
DataFrame.iloc[] selects a subset of rows and columns from a DataFrame using integer positions.
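One difference between label-based and position-based selection is worth a small sketch: a label slice includes its end label, while an integer slice excludes its end position. The table below is hypothetical.

```python
import pandas as pd

# Hypothetical table with string row labels.
df = pd.DataFrame(
    {"price": [10.0, 20.0, 30.0, 40.0], "volume": [1, 2, 3, 4]},
    index=["a", "b", "c", "d"],
)

by_label = df.loc["b":"c", "price"]  # label slice: both "b" and "c" are included
by_position = df.iloc[1:3, 0]        # integer slice: position 3 is excluded

print(by_label.tolist())                       # [20.0, 30.0]
print(by_position.tolist())                    # [20.0, 30.0]
print(df.loc["d", "volume"] == df.iloc[3, 1])  # True
```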
    companies2.loc["c", ["company", "price"]]
    companies2.iloc[2, [0, 1]]
    companies2.loc[["c", "d", "e"], ["volume", "price", "company"]]
    companies2.iloc[2:, ::-1]
df[val] selects a single column or set of columns;
df.loc[val] selects a single row or set of rows;
df.loc[:, val] selects a single column or set of columns;
df.loc[val1, val2] selects row and column by label;
df.iloc[where] selects a row or set of rows by integer position;
df.iloc[:, where] selects a column or set of columns by integer position;
df.iloc[w1, w2] selects row and column by integer position.
pandas Operations between DataFrames and Series
Here series is generated from the third row (integer position 2) of the DataFrame:
    companies3 = companies[["price", "volume"]]
    companies3.index = ["Daimler", "E.ON", "Siemens", "BASF", "BMW"]
    series = companies3.iloc[2]
    companies3
    series
By default, operations between DataFrames and Series match the index of the Series on the DataFrame's columns:
    companies3 + series
DataFrame.add() with axis=0 instead matches the Series on the DataFrame's row index and adds it to each column.
    series2 = companies3["price"]
    companies3.add(series2, axis=0)
Operations between two DataFrames align on both the row and the column labels; labels that are not found in both are filled with NaN:
    df1 = pd.DataFrame(np.arange(9.).reshape((3, 3)), columns=list("bcd"),
                       index=["Ohio", "Texas", "Colorado"])
    df2 = pd.DataFrame(np.arange(12.).reshape((4, 3)), columns=list("bde"),
                       index=["Utah", "Ohio", "Texas", "Oregon"])
    df1
    df2
    df1 + df2
DataFrame.T transposes the DataFrame.
    companies3.T
pandas NumPy functions on DataFrame
DataFrame.apply(np.function, axis) applies a NumPy function along the given axis of the DataFrame.
    companies3.apply(np.mean)
    companies3.apply(np.sqrt)
    companies3.apply(np.sqrt)[:2]
pandas Import/Export data
pd.read_csv("PATH_NAME_OF_*.csv") reads the csv file into a DataFrame.
header=None does not read the top row of the csv file as column names.
Column names can be set with names, for example, names=["a", "b", "c", "d", "e"].
DataFrame.head() and DataFrame.tail() show the first and last five rows, respectively.
    nbc_show = pd.read_csv("https://bcdanl.github.io/data/nbc_show_na.csv")
    # `GRP`: audience size; `PE`: audience engagement.
    nbc_show.head()  # showing the first five rows
    nbc_show.tail()  # showing the last five rows
pandas Export data
DataFrame.to_csv("filename") writes the DataFrame to a csv file.
index=False and header=False do not write the row index and the column names to the csv file.
Column names can be set with header, for example, header=["a", "b", "c", "d", "e"].
    nbc_show.to_csv("PATH_NAME_OF_THE_csv_FILE")
pandas Summarizing DataFrame
DataFrame.count() returns a Series containing the number of non-missing values for each column.
DataFrame.sum() returns a Series containing the sum of values for each column.
DataFrame.mean() returns a Series containing the mean of values for each column.
Passing axis="columns" or axis=1 sums across the columns instead:
    nbc_count = nbc_show.count()
    # numeric_only=True skips text columns such as Genre; recent pandas versions
    # raise an error when asked for the mean of a text column.
    nbc_sum = nbc_show.sum(numeric_only=True)
    nbc_sum_c = nbc_show.sum(axis="columns", numeric_only=True)
    nbc_mean = nbc_show.mean(numeric_only=True)
pandas Grouping DataFrame
DataFrame.groupby([col1, col2]) groups the DataFrame by the given columns (grouping by one column or by more than two columns is also possible).
Applying count(), sum(), or mean() to the result of groupby() returns the count, the sum, or the mean of the remaining columns within each group.
    nbc_genre_count = nbc_show.groupby(["Genre"]).count()
    # numeric_only=True restricts the aggregation to the numeric columns.
    nbc_genre_sum = nbc_show.groupby(["Genre"]).sum(numeric_only=True)
    nbc_network_genre_mean = nbc_show.groupby(["Network", "Genre"]).mean(numeric_only=True)
pandas Sorting DataFrame
DataFrame.sort_index() sorts DataFrame by index on either axis.
DataFrame.sort_index(axis="columns") sorts DataFrame by column index.
DataFrame.sort_index(ascending=False) sorts DataFrame by either index in descending order.
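The three sort_index() variants can be sketched on a small hypothetical DataFrame whose row and column labels start out unsorted:

```python
import pandas as pd

# Hypothetical data with unsorted row labels (2, 0, 1) and column labels (b, a).
df = pd.DataFrame({"b": [1, 2, 3], "a": [4, 5, 6]}, index=[2, 0, 1])

rows_sorted = df.sort_index()                # row labels in ascending order
rows_desc = df.sort_index(ascending=False)   # row labels in descending order
cols_sorted = df.sort_index(axis="columns")  # column labels in ascending order

print(rows_sorted.index.tolist())    # [0, 1, 2]
print(rows_desc.index.tolist())      # [2, 1, 0]
print(cols_sorted.columns.tolist())  # ['a', 'b']
print(df.index.tolist())             # [2, 0, 1]
```

Each call returns a new DataFrame; the original `df` keeps its order unless the result is assigned back.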
    nbc_show.sort_index()
    nbc_show.sort_index(ascending=False)
    nbc_show.sort_index(axis="columns")
pandas Sorting DataFrame
DataFrame.sort_values("SOME_VARIABLE") sorts the DataFrame by the values of SOME_VARIABLE.
For Series.sort_values(), we do not need to provide "SOME_VARIABLE" in the sort_values() function.
DataFrame.sort_values("SOME_VARIABLE", ascending=False) sorts the DataFrame by the values of SOME_VARIABLE in descending order.
    nbc_show.sort_values("GRP")
    nbc_show.sort_values("GRP", ascending=False)
    obj = pd.Series([4, np.nan, 7, np.nan, -3, 2])
    obj.sort_values()
pandas Class Exercise
Use the nbc_show_na.csv file to answer the following questions:
Find the top show in terms of the value of PE for each Genre.
Find the top show in terms of the value of GRP for each Network.
Which genre has the largest GRP on average?
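One possible way to approach questions of this kind is to combine groupby() with idxmax(). The sketch below runs on a made-up toy table, not on nbc_show_na.csv: the "Show" column name and all values are assumptions, and the real file contains missing values that may need handling first.

```python
import pandas as pd

# Made-up toy rows; the "Show" column name and every value are assumptions.
toy = pd.DataFrame({
    "Show": ["s1", "s2", "s3", "s4"],
    "Genre": ["Drama", "Drama", "Reality", "Reality"],
    "GRP": [100.0, 300.0, 50.0, 70.0],
    "PE": [0.9, 0.6, 0.5, 0.8],
})

# Row label of the largest PE within each Genre, then look those rows up.
top_rows = toy.loc[toy.groupby("Genre")["PE"].idxmax()]

# Mean GRP per Genre, and the Genre whose mean is largest.
mean_grp = toy.groupby("Genre")["GRP"].mean()
best_genre = mean_grp.idxmax()

print(top_rows["Show"].tolist())  # ['s1', 's4']
print(best_genre)                 # Drama
```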
seaborn
seaborn is a Python data visualization library based on matplotlib. Its plots generally look better than plain matplotlib-produced plots, and so I recommend using it by default.
    import seaborn as sns
In exploratory data analysis (EDA), we use visualization and summary statistics (e.g., mean, standard deviation, minimum, maximum, median) to explore our data in a systematic way.
EDA is an iterative cycle. We:
Generate questions about our data.
Search for answers by visualizing, transforming, and modelling our data.
Use what we learn to refine our questions and/or generate new questions.
seaborn Types of plots
We will consider the following types of visualization:
Bar chart
Histogram
Scatter plot
Line chart
pandas What is a tidy DataFrame?
There are three rules which make a dataset tidy:
Each variable has its own column.
Each observation has its own row.
Each value has its own cell.
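As a small sketch of what tidying can look like in pandas, the example below reshapes a wide table, where one variable (the year) is spread across column names, into a long table with one column per variable. The table is made up for illustration, and melt() is just one of several reshaping tools.

```python
import pandas as pd

# Made-up wide table: the year is encoded in the column names.
wide = pd.DataFrame({
    "state": ["Ohio", "Nevada"],
    "2001": [1.7, 2.4],
    "2002": [3.6, 2.9],
})

# melt() stacks the year columns into a "year" column and a "population" column.
tidy = wide.melt(id_vars="state", var_name="year", value_name="population")

print(tidy.shape)             # (4, 3)
print(tidy.columns.tolist())  # ['state', 'year', 'population']
```

In the reshaped table each row is one state-year observation, which is the layout that functions such as groupby() and the seaborn plotting functions work with most directly.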
seaborn Getting started with seaborn
We can list the example DataFrames provided by the seaborn library:
    import seaborn as sns
    print(sns.get_dataset_names())
Let's load the titanic DataFrame:
    df = sns.load_dataset('titanic')
    df.head()
seaborn Bar Chart
We use the sns.countplot() function to plot a bar chart:
    sns.countplot(x='sex', data=df)
data: DataFrame.
x: Name of a categorical variable (column) in the DataFrame.
seaborn Bar Chart
We can further break up the bars in the bar chart based on another categorical variable.
    sns.countplot(x='sex', hue='survived', data=df)
hue: Name of a categorical variable.
seaborn Histogram
We use the sns.displot() function to plot a histogram:
    sns.displot(x='age', bins=5, data=df)
bins: Number of bins.
seaborn Scatter plot
A scatter plot is used to display the relationship between two continuous variables.
We use the sns.scatterplot() function to plot a scatter plot:
    df = sns.load_dataset('tips')
    sns.scatterplot(x='total_bill', y='tip', data=df)
x: Name of a continuous variable on the horizontal axis.
y: Name of a continuous variable on the vertical axis.
seaborn Line chart
We use the sns.lineplot() function to plot a line chart:
    path_csv = '/Users/byeong-hakchoe/Google Drive/suny-geneseo/teaching-materials/lecture-data/dji.csv'
    dow = pd.read_csv(path_csv, index_col=0, parse_dates=True)
    sns.lineplot(x='Date', y='Close', data=dow)
x: Name of a continuous variable (often a time variable) on the horizontal axis.
y: Name of a continuous variable on the vertical axis.