About me:
5th Year PhD student in Statistics at UC Irvine
Areas of specialization: Bayesian Statistics, Network data analysis, Probabilistic clustering, Data science education
About you:
One person per team: introduce your team members (names and institutions)
Another person per team: introduce your research topic (1-2 sentences)
Another person per team: tell us about your progress (challenges addressed and to be addressed next)
writing a paper (data analysis report)
structure of a paper
strategies to get started writing
strategies for organizing the work
useful software
what is a good title/abstract? what is a good introduction?
look at examples of what’s good or less-good
USRESP competition
It’s usually the culmination of a research project.
It is a document to describe to someone:
an interesting problem (that they might not know)
ways to approach the problem (that they might not know!)
what results one gets when they approach the problem in these ways
what insights these results give us (and how they relate to other insights that people can find elsewhere)
What are the common sections of a data analysis (DA) report?
Title
Abstract
Introduction (Background)
Methods
Results
Discussion/Conclusion
References
Personal satisfaction
Gain experience with scientific writing (essential in every data science job, not only in academia)
Can talk about this experience in your cover letter for grad school
Motivation
Get feedback
Your CV
Learn some concrete guidelines and judging criteria
This workshop could be a semester-long class
We are going to condense some things
You will be given material and links to come back to later
The elements of a good paper
Writing a paper (hands-on work in teams)
Final remarks and links to resources
Previous winners and honorable mentions at USRESP include:
Topic | Title and Link |
---|---|
Drug treatment dropout | Psychiatric Comorbidity in Opioid Use Treatment Outcomes, Linda Tang (Fall 2021) |
Air pollution over time | Behind The Smoke: An Extreme Value Analysis Of Air Pollution In Minnesota, Yicheng Shen, Jacob Flignor, Libby Nachreiner, & Karen Wang (Spring 2022) |
Prediction and management of flash floods | Storm Chasers: Synthesizing New England Weather Data On A Dashboard For Emergency Response Workers, Irene Foster, Sunshine Schneider, Caitlin Timmons, & Katelyn Diaz (Fall 2022) |
Question
Discuss in teams and identify one to three elements you think should characterize a good data analysis paper (2 minutes)
It is clear and easy to read
It tells an interesting story
It makes a good statistical analysis and explains the results well
It is clear and easy to read (judging criterion: Overall clarity and presentation)
It tells an interesting story (judging criterion: Originality, creativity, and significance of the study)
It makes a good statistical analysis and explains the results well (judging criterion: Accuracy of data analysis, conclusions, and discussion)
What content is included/not included
Good principle: every element in a story must be necessary, and irrelevant elements should be removed
Examples of common problems:
Not understandable to a reader with little knowledge of the applied domains
Define something that you are never going to use
e.g. Define HIV, describe its implications and prevalence, but then your analysis does not use information on HIV and HIV is never mentioned again (e.g. in the discussion)
Use terminology, acronyms or metrics that you have not defined
Where I was wrong: undefined jargon does not make a paper read as ‘advanced’
Every acronym that you are going to use needs to be defined the first time you use it - example (OS and PFS in Introduction)
How content is organized in your paper
If you read your paper: can you tell what the story is?
Common problems:
Tip
In the introduction, explicitly state the hypothesis/goals of the paper.
A good example: Psychiatric Comorbidity in Opioid Use Treatment Outcomes by Linda Tang (last paragraph of Introduction)
Tips
Always try to work on an interesting application
What do you find particularly exciting about the topic? Make sure you communicate that excitement!
Tip
Can you add an effective and eye-catching visualization to represent your topic?
A good example: Behind The Smoke: An Extreme Value Analysis Of Air Pollution In Minnesota by Yicheng Shen, Jacob Flignor, Libby Nachreiner, & Karen Wang (Figure 1, Introduction)
Tips
Tip
Wait, so what is the answer to the research questions?
Avoid overgeneralizing or overstating your results
A good example
From Behind The Smoke: An Extreme Value Analysis Of Air Pollution In Minnesota by Yicheng Shen, Jacob Flignor, Libby Nachreiner, & Karen Wang
After the project is concluded: Project -> Writing
Pros: Content is decided
Cons: Content is decided
During project development: Project -> Writing -> Project -> …
Pros: Writing about the project can improve the way you think about it and the choices you make
Cons: Content might change
You can even start thinking about the paper before you start working on the project: Writing -> Project -> …
Abstract - overview of the project
Introduction - what is your project about?
Methods - what did you do and how?
Results - what did you find out?
Discussion - what do your findings mean?
Strategy 1 - High level to low level
1. Create an empty document with empty sections
2. In each section, outline the points to make
3. Visualize the resulting paper in your head, and repeat step 2 until it feels like the paper would tell the story well
4. In each section, list the paragraphs you need to write (i.e. the goal of each paragraph and their sequence)
5. Fill in the content of each paragraph accordingly
Strategy 2 - From results backward
Start from thinking how you would visualize the results: what are your main figures (tables)?
Working backward, plan the Methods and Introduction so that they tell the reader just what they need to know to fully understand and appreciate the results.
Write the Discussion section. Edit the Introduction if needed, to provide necessary context to appreciate/anticipate discussion.
For the remainder of the workshop we will use a blend of these two strategies.
In the isi-buds organization on GitHub, find the repository writing-workshop-2024
One person for each team: clone the workshop’s repository (or download it); add a copy of the folder paper to your team’s repository; push this change
All team members: pull this update
Together: look at what is in the folder
What did your study find out?
How to frame the results section:
Typically, results sections start with descriptive statistics; inferential statistics (e.g. hypothesis tests) come next.
Information presented must be relevant in helping to answer the research question(s) of interest
Tables and figures are useful in this section and should be labeled, embedded in the text, and referenced appropriately.
Tips
Focus on a few, well explained points (e.g. 1, 2 or 3 main results)
Think how you would visualize these results best
Make these visualizations and write “around” the visualizations
Reference and make use of your Tables and Figures in the text. Do not abandon them
Define a metric well - example (highlighted text in Sec. 4.1)
Open the file paper/exercises/1-results.md and follow the instructions:
Work in sub-teams (5 min)
Talk within each team (3 min)
One team shares with everyone (random pick).
Example
From Psychiatric Comorbidity In Opioid Use Treatment Outcomes (Linda Tang, winner at 2021 Fall USRESP)
Example
From An Evaluation Of Regularization Methods: When There Are More Predictors Than Observations (Kenny Chen, honorable mention at 2021 Fall USRESP)
Note: in this paper, this excerpt was from the Introduction section
Example
Design visualizations to help digest complicated methods
Write clear figure captions
From Storm Chasers: Synthesizing New England Weather Data On A Dashboard For Emergency Response Workers (Irene Foster, Sunshine Schneider, Caitlin Timmons, Katelyn Diaz, winner at 2022 Fall USRESP)
Open the file paper/exercises/2-methodology.md and follow the instructions:
Work in sub-teams (5 min)
Talk within each team (3 min)
One team shares with everyone (random pick).
Open the file paper/exercises/1-results.md and follow the instructions:
Work in sub-teams (10 min)
Talk within each team (5 min)
One team shares with everyone (random pick).
Logical organization, moving from the general to the specific
Provide sufficient background to understand the paper
Relate this paper to other work in the scientific literature
Provide explanation for why this work is important
End with statements about the hypothesis/goals of the paper
Tip
As you write, think that every undergraduate student should be able to follow your introduction.
You can be specific, but be gentle, e.g. remind the reader of definitions.
You can write obvious things to establish the grounds, e.g. "Alzheimer's disease is one of the main challenges faced by modern medicine." A pro move is to back it up with figures or statements from a reputable source, e.g. "Indeed, an estimated %% of adults will develop this disease as they age [citation]."
Example
From Exploring Missingness and its Implications on Traffic Stop Data (Amber Lee, winner at 2020 Fall USRESP)
Example
From Psychiatric Comorbidity in Opioid Use Treatment Outcomes (Linda Tang, winner at 2021 Fall USRESP)
Open the file paper/exercises/3-introduction.md and follow the instructions:
Work in sub-teams (10 min)
Talk within each team (5 min)
One team shares with everyone (random pick).
Clearly state whether the results answer the question (support or disprove the hypothesis)
Cite specific data from the results to support each interpretation. Articulate the basis for supporting or rejecting each hypothesis
Relate the results of the current work to previous research
This depends on the exact results you get, so there is not much to plan ahead! (So no workout.)
Why do we write it?
How can we achieve that? A simple way:
XXX is important because of YYY. Previous studies demonstrated / have neglected ZZZ. In this work we do QQQ. We find that BBB. Our study suggests that KKK.
From Behind The Smoke: An Extreme Value Analysis Of Air Pollution In Minnesota by Yicheng Shen, Jacob Flignor, Libby Nachreiner, & Karen Wang (USRESP 2022 Spring)
Example
Poor air quality is a major environmental health threat. Even short-term exposure to poor air quality— such as during extreme pollution events—can cause severe respiratory distress. While there have been significant decreases in Minnesota air pollution levels over the past 40 years, the summer of 2021 upset this trend with Hennepin County reporting the highest particulate measure in the past 20 years. This study focuses on analyzing the extreme values of pollutant concentration levels of sulfur dioxide (SO2) and fine inhalable particles (PM2.5) across three Minnesota counties as collected by the Environmental Protection Agency from 1980 to 2021. We employ extreme value analysis methods to fit the pollutant data. The models find that SO2 levels have fallen substantially since 1980 in accordance with EPA policies regulating diesel fuel and coal power plants. This dramatic decrease has made the magnitude of severe pollution incidents appear far more extreme than in earlier decades, with typical events in the 1980-1990s often equating to one in a hundred year events today. By contrast, no downward trend in PM2.5 levels was observed over the past 20 years, an expected result given that PM2.5 has more varied sources and is therefore harder to regulate than SO2. However, models show a significant seasonal trend with peaks during winter months, revealing this past ‘summer of smoke’ as particularly extreme.
Can you see what summarizes the introduction? What relates to the background? What relates to the methods? What relates to the results? What relates to the discussion?
Everything from USRESP shown today is on their website!
Discuss with your team:
Who in your team wants to participate in the writing?
Who could be your advisor(s) [that is, who could give you feedback and supervise your paper]? When will you ask them?
In the paper folder, the skeleton uses Quarto for writing a paper. Look at the use of:
child-documents
section planning
cross-references to Sections, Equations, Figures and Tables
references
quick demo of adding new references
The file 03-results.qmd has an example table created with kable and kableExtra
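As a rough illustration of cross-references, references, and a kable table in Quarto, here is a minimal sketch. It is not the workshop skeleton itself: the labels (`sec-methods`, `sec-results`, `eq-model`, `tbl-cars`, `fig-scatter`), the citation key `example2024`, and the file `references.bib` are hypothetical names chosen for this example, and `mtcars` (a dataset built into R) only stands in for real data. Child documents are not shown.

````markdown
---
title: "A hypothetical paper"
format: html
number-sections: true          # needed for @sec- cross-references
bibliography: references.bib   # hypothetical file; must contain the key example2024
---

## Methods {#sec-methods}

We fit a simple linear model [@example2024]:

$$
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i
$$ {#eq-model}

## Results {#sec-results}

@tbl-cars shows the data used to fit @eq-model (see @sec-methods),
and @fig-scatter plots the two variables of interest.

```{r}
#| label: tbl-cars
#| tbl-cap: "First rows of the mtcars data (placeholder for real results)."
library(knitr)
library(kableExtra)

# kable() builds the table; kable_styling() from kableExtra adjusts its appearance
kable_styling(kable(head(mtcars[, 1:4]), digits = 1), full_width = FALSE)
```

```{r}
#| label: fig-scatter
#| fig-cap: "Fuel efficiency against weight (placeholder figure)."
plot(mtcars$wt, mtcars$mpg, xlab = "Weight", ylab = "Miles per gallon")
```
````

Rendering a file like this with `quarto render` should number the sections, equation, table, and figure, and resolve each `@label` to the matching number. The prefixes `sec-`, `eq-`, `tbl-`, and `fig-` are what Quarto uses to tell the kinds of cross-reference apart.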
Please take a moment to provide feedback for future editions of this workshop!
Reach out to me for any questions
fzricci@uci.edu
https://federicazoe.github.io/