Reproducible Computing
@ JSM 2019

Colin Rundel

July 27, 2019

1 / 21

Reproducible Computing

2 / 21

Schedule

Time Activity
08:30 - 10:15 Literate programming and organization
10:15 - 10:30 Coffee break
10:30 - 12:30 Version control with Git and GitHub
12:30 - 14:00 Lunch
14:00 - 15:15 Scaling reproducible projects, make
15:15 - 15:30 Coffee break
15:30 - 17:00 More make, wrap-up
3 / 21

Reproducibility:
Who cares?

4 / 21

Science retracts gay marriage paper without agreement of lead author

  • In May 2015, Science retracted a study, published just 5 months earlier, of how canvassers can sway people's opinions about gay marriage.

  • Science Editor-in-Chief Marcia McNutt:

    • Original survey data not made available for independent reproduction of results.
    • Survey incentives misrepresented.
    • Sponsorship statement false.

  • Two Berkeley grad students who attempted to replicate the study quickly discovered that the data must have been faked.

  • The methods we'll discuss today can't prevent this, but they can make such issues easier to discover.

Source: http://news.sciencemag.org/policy/2015/05/science-retracts-gay-marriage-paper-without-lead-author-s-consent
5 / 21

Bad spreadsheet merge kills depression paper, quick fix resurrects it

  • Original conclusion: Lower levels of CSF IL-6 were associated with current depression and with future depression [...].

  • Revised conclusion: Higher levels of CSF IL-6 and IL-8 were associated with current depression [...].






Source: http://retractionwatch.com/2014/07/01/bad-spreadsheet-merge-kills-depression-paper-quick-fix-resurrects-it/
6 / 21

Divorce study felled by a coding error gets a second chance

  • Original conclusion: The risk of divorce in a heterosexual marriage increases when the wife falls ill, but not the husband.

  • Corrected conclusion: Based on the corrected analysis, we conclude that there are not gender differences in the relationship between gender, pooled illness onset, and divorce.







Source: http://retractionwatch.com/2015/09/10/divorce-study-felled-by-a-coding-error-gets-a-second-chance/#more-32151
7 / 21

Divorce study retraction: Editor's note

  • "The research environment is fast-paced given the ethos to “publish or perish.”"

  • "[...] research is becoming increasingly complex, with greater calls for transdisciplinary collaborations, “big data,” and more sophisticated research questions and methods [...] data sets often have multiple files that require merging, change the wording of questions over time, provide incomplete codebooks, and have unclear and sometimes duplicative variables."

  • "Given these issues, I would not be surprised if coding errors were fairly common [...]"


Source: http://retractionwatch.com/2015/09/10/divorce-study-felled-by-a-coding-error-gets-a-second-chance/#more-32151
8 / 21

One in five genetics papers contains errors thanks to Microsoft Excel

  • "Autoformatting in Microsoft Excel has caused many a headache—but now, a new study shows that one in five genetics papers in top scientific journals contains errors from the program, The Washington Post reports. The errors often arose when gene names in a spreadsheet were automatically changed to calendar dates or numerical values."

  • "For example, one gene called Septin-2 is commonly shortened to SEPT2, but is changed to 2-SEP and stored as the date 2 September 2016 by Excel."


Source: https://www.sciencemag.org/news/2016/08/one-five-genetics-papers-contains-errors-thanks-microsoft-excel
9 / 21

Reproducibility:
Why should you care?

10 / 21

Reproducible vs Replicable

Source: Patil, Peng, Leek (2019) A visual tool for defining reproducibility and replicability. Nature Human Behaviour
11 / 21

Reproducibility as a trust scale






Source: Gabriel Becker - Keynote - Advanced R Course - May Institute for Computational Proteomics 2019
12 / 21

Think back to every time...

  • The results in Table 1 don't seem to correspond to those in Figure 2.
  • In what order do I run these scripts?
  • Where did we get this data file?
  • Why did I omit those samples?
  • How did I make that figure?
  • "Your script is now giving an error."
  • "The attached is similar to the code we used."






Source: Karl Broman - Steps to reproducible research: https://kbroman.org/steps2rr/
13 / 21

No collaborators?





Your closest collaborator is you six months ago,
but you don’t reply to emails.

  • Mark Holder




14 / 21

Reproducibility:
How?

15 / 21

Reproducibility checklist

  • Are the tables and figures reproducible from the code and data?
  • Does the code actually do what you think it does?
  • In addition to what was done, is it clear why it was done? (e.g., how were hyperparameters / tuning parameters chosen?)
  • Can the code be used for other data?
  • Can you extend the code to do other things?
16 / 21

17 / 21

Ambitious goal + other concerns

We need an environment where

  • data, analysis, and results are tightly connected, or better yet, inseparable

  • reproducibility is built in

    • the original data remains untouched
    • all data manipulations and analyses are inherently documented
  • documentation is human readable and syntax is minimal

18 / 21

Toolkit






19 / 21

Roadmap



Scriptability: R

Literate programming: R Markdown

Version control: git / GitHub

Scaling and automation: make

20 / 21

Computing access

  • Go to http://bit.ly/jsm2019-repro-comp

  • Log in by creating an account or using your Google / GitHub credentials.

  • Click the Start button next to the Workshop Materials project.

  • You should now be inside an RStudio Cloud session that contains all of the workshop files.
21 / 21
