
Lecture 1


DANL 100: Programming for Data Analytics

Byeong-Hak Choe

August 30, 2022

1 / 47

Instructor


2 / 47

Instructor

Current Appointment & Education

  • Name: Byeong-Hak Choe.
  • Lecturer in Data Analytics at School of Business at SUNY Geneseo.

  • Ph.D. in Economics from University of Wyoming.

  • M.S. in Economics from Arizona State University.
  • M.A. in Economics from SUNY Stony Brook.
  • B.A. in Economics & B.S. in Applied Mathematics from Hanyang University at Ansan, South Korea
    • Minor in Business Administration.
    • Concentration in Finance.
3 / 47

Instructor

Data Science and Climate Change

  • Choe, B.H., 2021. "Social Media Campaigns, Lobbying and Legislation: Evidence from #climatechange/#globalwarming and Energy Lobbies."

  • Question: To what extent do social media campaigns compete with fossil fuel lobbying on climate change legislation?

  • Data include:

    • 5.0 million tweets with #climatechange/#globalwarming around the globe;
    • 12.0 million retweets/likes to those tweets;
    • 0.8 million Twitter users who wrote those tweets;
    • 1.4 million Twitter users who retweeted or liked those tweets;
    • 0.3 million US Twitter users with their location at a city level;
    • Firm-level lobbying data (expenses, targeted bills, etc.).
4 / 47

Syllabus


5 / 47

Syllabus

Email, Class & Office Hours

6 / 47

Syllabus

Required Textbooks

7 / 47

Syllabus

Required Textbooks

  • R for Data Science by Hadley Wickham & Garrett Grolemund (Henceforth, r4s).

    • A free HTML version of this book is available at https://r4ds.had.co.nz.
    • ISBN-13: 978-1491910399; ISBN-10: 1491910399
  • Strategic Analytics: The Insights You Need from Harvard Business Review by Harvard Business Review, Eric Siegel, Edward L. Glaeser, Cassie Kozyrkov, and Thomas H. Davenport (Henceforth, HBR).

    • An eTextbook is available at Amazon.com.
    • ISBN-13: 978-1633698987; ISBN-10: 163369898X
8 / 47

Syllabus

Course Description

  • This course introduces the essential general programming concepts and techniques to a data analytics audience without prior programming experience.

  • Topics covered include

    1. introduction to Data Analytics thinking,
    2. data types such as numbers, characters, and Boolean (or logical values),
    3. control and programming structures such as loops, conditionals (e.g., if-else), and functions.
  • During the course, you will work hands-on with the Python and R programming languages and their associated data analysis libraries, such as pandas and tidyverse.
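
As a small preview of the topics listed above, here is a minimal Python sketch of data types, a loop, a conditional, and a function. The variable and function names are made up for illustration; only the numbers 28 (class sessions) and 0.33 (homework weight) come from this syllabus.

```python
# Data types: numbers, characters (strings), and Booleans.
n_sessions = 28          # int
hw_weight = 0.33         # float
course = "DANL 100"      # str
is_required = True       # bool


def describe(value):
    """Return the name of a value's type, e.g. 'int' or 'str'."""
    return type(value).__name__


def count_even(numbers):
    """Count the even numbers in a list using a loop and a conditional."""
    count = 0
    for x in numbers:        # loop over every element
        if x % 2 == 0:       # conditional: is x even?
            count += 1
    return count


print([describe(v) for v in (n_sessions, hw_weight, course, is_required)])
# ['int', 'float', 'str', 'bool']
print(count_even([1, 2, 3, 4, 5, 6]))
# 3
```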
9 / 47

Syllabus

Course Requirements

  • Laptop: You should bring your own laptop (Mac or Windows) to the classroom.

    • A laptop with a 2+ core CPU, 4+ GB of RAM, and 500+ GB of disk storage is recommended for this course.
  • Homework: There will be six homework assignments.

  • Exams: There will be midterm and final exams.

    • The final exam is comprehensive.
10 / 47

Syllabus

Course Contents

  • There will be tentatively 28 class sessions:
    • 27 lectures;
    • 1 midterm exam.
Week | Python Programming | HBR | HW
1 | Setting up Python, R, & Excel | Intro |
2 | py4e Ch.1-2 | Ch.1 | 1
3 | py4e Ch.3-4 | Ch.2 | 1
4 | py4e Ch.5-6 | Ch.3 | 1
5 | py4e Ch.7-8 | Ch.4 | 2
6 | py4e Ch.9-10 | Ch.5 | 2
11 / 47

Syllabus

Course Contents

Week | Python | HBR | HW
7 | mckinney Ch.5 | Ch.6 | 3
8 | mckinney Ch.6 | Ch.7 | 3
9 | Midterm Exam | |
10 | mckinney Ch.9 | Ch.8 | 4
12 / 47

Syllabus

Course Contents

Week | R | HBR | HW
11 | Starting with R | Ch.9 | 4
12 | r4s Ch.3 | Ch.10 | 5
13 | r4s Ch.3 | Ch.11 | 5
14 | r4s Ch.5 | Ch.12 | 6
15 | r4s Ch.5 | Ch.13 | 6
16 | Final Exam | |
13 / 47

Syllabus

Grading

  • Homework assignments account for 33% of the total percentage grade.

  • Exams account for 67% of the total percentage grade.

(Total Percentage Grade)=0.33×(Total Homework Score)+0.67×(Total Exam Score).
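
The formula can be sketched in Python as follows. The function name is hypothetical, and both inputs are assumed to be on a 0-100 scale.

```python
def total_percentage_grade(homework_score, exam_score):
    """Weighted total: 33% homework and 67% exams (both on a 0-100 scale)."""
    return 0.33 * homework_score + 0.67 * exam_score


# Example: a homework score of 90 and an exam score of 80.
print(round(total_percentage_grade(90, 80), 1))
# 83.3
```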

14 / 47

Syllabus

Grading

  • The lowest homework score will be dropped when calculating the total homework score.

  • Each of the five remaining homework assignments accounts for 20% of the total homework score.
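
A minimal sketch of this rule, assuming each homework is scored on a 0-100 scale. The function name and the example scores are made up for illustration.

```python
def total_homework_score(scores):
    """Drop the single lowest score, then average the rest with equal weights."""
    if len(scores) < 2:
        raise ValueError("need at least two homework scores")
    kept = sorted(scores)[1:]       # drop the lowest score
    return sum(kept) / len(kept)    # with six homeworks, five remain at 20% each


# Example: the lowest score (60) is dropped; the other five average to 88.
print(total_homework_score([80, 90, 100, 70, 60, 100]))
# 88.0
```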

15 / 47

Syllabus

Grading

  • The total exam score is the maximum between the following two average scores:
    1. the simple average of two exam scores;
    2. the weighted average of the two, with one-fourth weight on the midterm exam score and three-fourths weight on the final exam score.

(Total Exam Score)=max{0.5×(Midterm Exam Score)+0.5×(Final Exam Score),0.25×(Midterm Exam Score)+0.75×(Final Exam Score)}.
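
The same rule as a short Python sketch (the function name is hypothetical). The weighted average is the larger of the two whenever the final exam score is higher than the midterm score, so a strong final can offset a weak midterm.

```python
def total_exam_score(midterm, final):
    """Return the larger of the simple and the final-weighted exam averages."""
    simple = 0.5 * midterm + 0.5 * final
    weighted = 0.25 * midterm + 0.75 * final
    return max(simple, weighted)


print(total_exam_score(60, 80))  # final is higher, so the weighted average wins
# 75.0
print(total_exam_score(90, 70))  # midterm is higher, so the simple average wins
# 80.0
```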

16 / 47

Syllabus

Grading

  • Letter grades will be determined by the total percentage grade:

100 ≥ A ≥ 93 > A− ≥ 90; 90 > B+ ≥ 87 > B ≥ 83 > B− ≥ 80; 80 > C+ ≥ 77 > C ≥ 73 > C− ≥ 70; 70 > D ≥ 60 > E.
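
A lookup like this can be sketched in Python. The sketch assumes the cutoffs are read as A from 93, A- from 90, B+ from 87, B from 83, B- from 80, C+ from 77, C from 73, C- from 70, D from 60, and E below 60; the names are hypothetical.

```python
# (lower bound, letter) pairs, ordered from the highest cutoff to the lowest.
CUTOFFS = [
    (93, "A"), (90, "A-"),
    (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"),
    (60, "D"),
]


def letter_grade(total_percentage):
    """Return the first letter whose lower bound the grade meets, else 'E'."""
    for cutoff, letter in CUTOFFS:
        if total_percentage >= cutoff:
            return letter
    return "E"


print(letter_grade(83.3))
# B
print(letter_grade(59))
# E
```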

17 / 47

Syllabus

Make-up exams

  • Make-up exams will not be given unless you have either a medically verified excuse or an absence excused by the University.

  • If you cannot take exams because of religious obligations, notify me by email at least two weeks in advance so that an alternative exam time may be set.

  • A missed exam without an excused absence earns a grade of zero.

18 / 47

Syllabus

Academic Integrity and Plagiarism

  • All homework assignments and exams must be your own original work.

  • Examples of academic dishonesty include:

    • representing the work, thoughts, and ideas of another person as your own,
    • allowing others to represent your work, thoughts, or ideas as theirs, and
    • being complicit in academic dishonesty by suspecting or knowing of it and not taking action.
19 / 47

Syllabus

Accessibility

  • The Office of Accessibility will coordinate reasonable accommodations for persons with physical, emotional, or cognitive disabilities to ensure equal access to academic programs, activities, and services at Geneseo.

  • Please contact me and the Office of Accessibility Services for questions related to access and accommodations.

20 / 47

Syllabus

Well-being

  • You are strongly encouraged to communicate your needs to faculty and staff and seek support if you are experiencing unmanageable stress or are having difficulties with daily functioning.

  • Liz Felski, the School of Business Student Advocate (felski@geneseo.edu, South Hall 303), or the Dean of Students (585-245-5706) can assist and provide direction to appropriate campus resources.

  • For more information, see https://www.geneseo.edu/dean_students.

21 / 47

Syllabus

Career Design

  • To get information about career development, you can visit the Career Development Events Calendar (https://www.geneseo.edu/career_development/events/calendar).

  • You can stop by South 112 to get assistance in completing your Handshake profile (https://app.joinhandshake.com/login).

    • Handshake is ranked #1 by students as the best place to find full-time jobs.
    • 50% of the 2018-2020 graduates received a job or internship offer on Handshake.
    • Handshake is trusted by all 500 of the Fortune 500.
22 / 47

Data Science Process


23 / 47

Data Science Process

Data Science Process

  • Data science is a cross-disciplinary practice that draws on methods from data cleaning, exploratory data analysis, and machine learning analysis.

  • Data science focuses on implementing data-driven decisions and managing their consequences.

24 / 47

Data Science Process

Data science project roles and responsibilities

  • Project sponsor: represents the business interests; champions the project.
  • Client: represents end users' interests.
  • Data scientist: sets and executes analytic strategy; communicates with sponsor and client.
  • Data architect: manages data and data storage; sometimes manages data collection.
  • Operations: manages infrastructure and deploys final project results.
25 / 47

Data Science Process

Stages of a data science project

26 / 47

Data Science Process

Motivational example of data science project

  • Suppose you're interested in how effective social media campaigns are at influencing climate change legislation.

  • The fossil fuel industry may feel that it's losing too much money because of regulations related to climate change and wants to reduce its losses via lobbying.

  • To what extent do social media campaigns compete with fossil fuel lobbying on climate change legislation?

27 / 47

Social Media Campaigns, Lobbying and Legislation: Evidence from #climatechange/#globalwarming and Energy Lobbies


28 / 47

Social Media

The Rise of Social Media

29 / 47

Social Media

Climate Change Campaigns in Social Media

Trend in the number of tweets with #climatechange/#globalwarming and retweets/likes to those tweets

30 / 47

Social Media

Climate Change Campaigns in Social Media

  • Climate change campaigns in social media are most likely to be driven by a small group of active campaigners.
  • During the years 2012-2017,
    • 753,853 users wrote 5.0 million tweets with #climatechange/#globalwarming.
    • 26,000 users, approximately 3% of those 753,853 users, wrote more than 3.0 million tweets, 60% of those 5.0 million tweets.
    • 1,403,639 users retweeted or liked those tweets 12.1 million times in total.
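
The shares above follow from simple arithmetic on the reported counts, as this short Python check shows. The variable names are made up, and because the slide says "more than 3.0 million", the tweet share is a lower bound.

```python
total_users = 753_853       # users who wrote the tweets
active_users = 26_000       # the small group of very active users
total_tweets = 5.0e6        # all tweets with the hashtags
active_tweets = 3.0e6       # tweets by the active users (a lower bound)

user_share = active_users / total_users
tweet_share = active_tweets / total_tweets

print(f"{user_share:.1%} of users wrote at least {tweet_share:.0%} of the tweets")
# 3.4% of users wrote at least 60% of the tweets
```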
31 / 47

Social Media

Climate Change Campaigns in Social Media

Per-capita number of tweets, retweets and likes with #climatechange/#globalwarming (2012 and 2013)

32 / 47

Social Media

Climate Change Campaigns in Social Media

Per-capita number of tweets, retweets and likes with #climatechange/#globalwarming (2014 and 2015)

33 / 47

Social Media

Climate Change Campaigns in Social Media

Per-capita number of tweets, retweets and likes with #climatechange/#globalwarming (2016 and 2017)

34 / 47

Social Media

Narratives in Social Media Campaigns

  • A topic modeling method clusters groups of words that best characterize a document.

  • The idea behind topic modeling is that each document is a mixture of latent topics, where each topic is characterized by a probability distribution over words.
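
To make the "mixture of topics" idea concrete, here is a toy Python sketch of the generative story: each word in a document is drawn by first picking a topic from the document's topic mix and then picking a word from that topic's word distribution. The topics, words, and probabilities are invented for illustration and are not the topics estimated from the tweets.

```python
import random

# Hypothetical topics: each one is a probability distribution over words.
TOPICS = {
    "policy": {"bill": 0.4, "congress": 0.3, "tax": 0.2, "climate": 0.1},
    "science": {"climate": 0.4, "carbon": 0.3, "warming": 0.2, "data": 0.1},
}


def generate_document(topic_mix, n_words, rng):
    """Sample words: pick a topic from the mix, then a word from that topic."""
    topic_names = list(topic_mix)
    topic_weights = [topic_mix[name] for name in topic_names]
    words = []
    for _ in range(n_words):
        topic = rng.choices(topic_names, weights=topic_weights)[0]
        dist = TOPICS[topic]
        word = rng.choices(list(dist), weights=list(dist.values()))[0]
        words.append(word)
    return words


rng = random.Random(0)  # fixed seed so the toy example is reproducible
doc = generate_document({"policy": 0.7, "science": 0.3}, n_words=8, rng=rng)
print(doc)
```

A topic modeling method works in the opposite direction: it starts from observed documents and estimates the topics and each document's topic mix.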

35 / 47

Social Media

Narratives in Social Media Campaigns

Topics 1, 2, & 3 from 2016 US tweets with #climatechange/#globalwarming

36 / 47

Social Media

Narratives in Social Media Campaigns

Topics 4, 5, & 6 from 2016 US tweets with #climatechange/#globalwarming

37 / 47

Social Media

Narratives in Social Media Campaigns

  • Social media campaigns have become increasingly politically influential.

    • My text analysis indicates that at least 20% of US tweets with #climatechange/#globalwarming during 2012-2017 contain some political words.
  • The two most frequent words in US tweets with #climatechange/#globalwarming during 2012-2017 are:

    • UniteBlue: an organization or slogan for a wide range of left-leaning activists.
    • Tcot: an acronym for "Top Conservatives on Twitter."
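
One simple way to measure a share like the 20% figure is to flag each tweet that contains at least one word from a list of political words. The Python sketch below assumes a two-word list and four made-up tweets; it is not the text analysis behind the slide.

```python
import re

# Hypothetical word list; a real analysis would use a much longer one.
POLITICAL_WORDS = {"uniteblue", "tcot"}


def contains_political_word(tweet, words=POLITICAL_WORDS):
    """Return True if any lowercase token of the tweet is in the word list."""
    tokens = re.findall(r"[a-z0-9_]+", tweet.lower())
    return any(token in words for token in tokens)


def political_share(tweets):
    """Fraction of tweets that contain at least one political word."""
    if not tweets:
        return 0.0
    return sum(contains_political_word(t) for t in tweets) / len(tweets)


# Made-up tweets for illustration only.
toy_tweets = [
    "Act now on #climatechange #UniteBlue",
    "Record heat again #globalwarming",
    "#tcot says no to a carbon tax #climatechange",
    "Sea ice update #climatechange",
]
print(political_share(toy_tweets))
# 0.5
```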
38 / 47

Social Media

Sentiment in Social Media Campaigns

  • Sentiment in social media campaigns may play an important role in forming public opinion.

  • Some research finds that ...

    • Sentiment of tweets can contribute to the vote outcome.
    • Information with negative sentiment is transmitted faster than information with positive sentiment.
39 / 47

Social Media

Sentiment in Social Media Campaigns, Neutral

Word clouds from US tweets with #climatechange/#globalwarming, Neutral

40 / 47

Social Media

Sentiment in Social Media Campaigns, Negative

Word clouds from US tweets with #climatechange/#globalwarming, Weakly and Strongly Negative

41 / 47

Social Media

Sentiment in Social Media Campaigns, Positive

Word clouds from US tweets with #climatechange/#globalwarming, Weakly and Strongly Positive

42 / 47

Climate-unfriendly Bills

Climate-unfriendly legislation

Bills that include sections unfavorable to action on climate change, 113th US Congress (2013-2014)

  • H.R.4923-113th prohibits funds made available by this Act from being used to design, implement, administer or carry out several programs, reports, and technical updates related to global climate change and the social cost of carbon.
43 / 47

Climate-unfriendly Bills

Climate-unfriendly legislation

Bills that include sections unfavorable to action on climate change, 114th US Congress (2015-2016)

  • H.R.2822-114th prohibits funds from being used to incorporate the social cost of carbon into any rule-making or guidance document until a new Interagency Working Group makes specified revisions to the estimates.
44 / 47

Climate-unfriendly Bills

Climate-unfriendly legislation

Bills that include sections unfavorable to action on climate change, 115th US Congress (2017-2018)

  • H.CON.RES.119-115th expresses the sense of Congress that a carbon tax would be detrimental to American families and businesses and is not in the best interest of the United States.
45 / 47

Climate-unfriendly Bills

Climate-unfriendly legislation

Bills that include sections unfavorable to action on climate change, 115th US Congress (2017-2018)

46 / 47

Social Media Campaigns vs. Lobbying on Legislation

Why Does It Matter?

  • I provide empirical evidence on ...
    • competition between NGO activism and industry lobbying for political influence.
    • the effects of social media campaigns on legislation.
47 / 47
