“Demystifying Data Science” remote notes

Oct 24, 2018 12 min read rstats

Carrying on the momentum from our useR!2018 remote notes blog post a few weeks ago, this time we are summarizing the Demystifying Data Science 2018 conference, for which registration is free. We are just following David Robinson’s advice to blog all the time!

Conference overview

We got interested in this conference¹ thanks to tweets like these ones that highlight that:

data scientists are young!
specialists are more in demand!

Hopefully you find these tweets interesting as well. You can find more about the conference on Twitter using the DemystifyDS hashtag, which also covers previous conferences. The official event account, thisismetis, really went all out with branded summary tweets! Recordings of all the talks are available, and there were many interesting titles, so we spent two sessions watching four full talks:

Navigating the Maze of the Data Science Job Hunt by Mark Meloon, Data Scientist, ServiceNow.
How to Get a Foothold in the Field of Data Science by Brandon Rohrer, Data Scientist, Facebook.
Data Visualization: How to Overcome Common Challenges by Kate Strachnyi, Manager and Data Visualization Specialist, Deloitte.
The Art & Science of Creating an Actionable Data Story by Mico Yuk, Chief Executive Officer, BI-Brainz Group | Author, Data Visualization for Dummies & More.

Talk summaries

Mark Meloon `MarkMeloon`

In the first talk, by Mark Meloon, we learned about the power of LinkedIn for networking and finding your next job. He suggested posting regularly on LinkedIn, since your posts will then show up more often in other people’s feeds, allowing you to connect with more people. If you write about content from someone you especially admire or hope to work for, you are more likely to catch their attention. It’s best not to ask people directly for a job, but to contact them first to discuss their work or to ask for advice. He also suggested adding key data analysis techniques to your profile, describing them specifically rather than in vague terms.

Brandon Rohrer `_brohrer_`

In the second talk, by Brandon Rohrer, we learned about the different data science careers that are possible.

The major fields are:

Data Analysis - statistics and interpretation
Data Modeling - machine learning, prediction
Data Engineering - automation, databases, programming

The major roles/archetypes are:

Generalist - decent at all three fields
Detective - master of analysis
Oracle - master of modeling
Maker - master of engineering
Unicorn - master of all!!!

He ended by mentioning that job postings using the term “data science” often vary widely. He recommends ignoring the posted job titles, de-emphasizing the specific tools listed, and instead focusing on the skills being asked for to get a real sense of the job and how you would perform.

Kate Strachnyi `StorybyData`

In the third talk by Kate Strachnyi we learned about how to overcome challenges in data visualization. She described data visualizations as “Information Maps” that should ideally be:

Informative
Efficient
Appealing

Common issues were:

Wrong chart choice - some charts will be much more effective
Improper use of color - use to tell a story in a useful way - not just to be decorative
Information overload - don’t try to do too much at once - loses impact
Clutter - leave out the nonessential
Not speaking the same language - know your audience (jargon/lingo)

She also noted that we should be careful about color schemes. She suggested that there are websites to check how your figures would appear to others with colorblindness.

She mostly uses Tableau in her work and suggested that it makes a nice free option for data visualization.

Mico Yuk `micoyuk`

In the fourth and last talk, by Mico Yuk, we learned about storyboards and were reminded that our data analyses should always try to tell a story about the data. She pointed out that the human mind is wired visually: according to her, we retain about 80% of what we see, 20% of what we read, and 10% of what we hear. She suggested that we create SMART goals (she credits Peter Drucker) to make sure that our work is driven efficiently in the correct direction. She also suggested that communicating our work in a SMART goal-based framework would concisely and clearly convey the purpose and results of the work.

Our impressions

Given the diversity of our impressions, we thought it would be more useful to share them individually. Without further ado, here they are.

P1

I found Mark Meloon’s talk very useful. I have actually started posting more regularly on my own LinkedIn account and it has indeed captured more attention from others. In fact, I have even received emails from companies interested in hiring someone with my expertise. Brandon Rohrer clarified some trends that I had noticed about data science. I identify with the “Detective” role and I see that while I may aspire at times (unsuccessfully) to be a “Generalist”, or someday a “Unicorn”, my experience as a Detective is very worthwhile as well. I love data visualization and I loved Kate Strachnyi’s talk. I found her tips to be very clear reminders for how to continue with my own visualizations. The talk by Mico Yuk was a good reminder to keep overall goals in mind as you work and to regularly take a step back and assess whether your work is really proceeding in the direction and at the rate that you planned.

P2

Mark Meloon’s talk emphasized the use of LinkedIn for networking and job hunting. He interviews job candidates for his company so his viewpoint was a direct reflection of someone who uses the website to find and/or assess job applicants. I liked that he gave both good and bad examples of actual profiles and messages he’s seen on LinkedIn. He also noted that, to get a foot in the door of a job posting, you don’t need to directly know the hiring manager, but reaching out to anyone you know in the company, even if it’s a second- or third-level connection (i.e. friend of a friend), is better than nothing, as long as you do it right. I do wish he had spoken about other social media platforms, such as Twitter, and how they compare to LinkedIn for networking.

I found the breakdown of skills and job types by Brandon Rohrer to be really instructive. It made me reflect on my own interests and skills in a more granular way, and I think it will help to have this framework for both future job hunts and interviews. I particularly like that he emphasized it’s okay/normal to not be great at everything related to data science - it’s a broad field - and that people with a narrower set of expertise are still needed and valuable for specific jobs. His talk also gave me some ideas of skills I may be able to work on and add to my portfolio to round out my skill set. I would recommend this talk to anyone in the data science or analysis fields who is looking for clarity or definition in their current job or career path!

P3

Kate Strachnyi’s talk was a great reminder of the importance of keeping your audience in mind when presenting information and making sure that visualizations are not just accurate but also easily understood. Her list of common issues was a helpful summary of guidelines I’ve heard before, and I appreciated the examples she used. In particular, I think I often run into the challenge of “information overload” when I present informally to others: I need to remember that it’s not enough for the information to be there, it also needs to be arranged in a way that lets people understand it quickly.

Mico Yuk’s talk was probably more applicable to someone working in a corporate field rather than an academic one, but the main idea of framing data as a story and keeping the goal in mind was still relevant to me. Some of the suggestions, like asking the “right questions” of your user, could easily be reworked for research (even if the user is just me). I haven’t worked with a storyboard before, but it would be interesting to see if that approach could also apply to planning out analyses for a research paper: the goal might be the question we’re asking, the KPIs the metrics we’re using to answer that question, the trends the conclusions we can draw, and the actions the next direction of analysis. The translation from business to academic research probably needs some tweaking, but I might try this approach on a future project to help with organization and keeping the bigger story in mind.

P4

Mark Meloon’s talk reminded me that many people use LinkedIn for networking, which hasn’t been that common in my experience in academia. This is something I would need to keep in mind when advising future students who are either unsure about staying in academia or want to go to industry. I do brush up my profile once in a while, and parts of Mark’s advice also apply to CVs (writing them and sending them via email): basically, be genuine and respectful of others.

Brandon Rohrer verbalized distinctions in data science roles that I had either heard of before or had some intuition about, but I hadn’t actually spent the time to see them as clearly defined as Brandon did. I was also quite curious about everyone’s reaction to his talk and how each of us labelled ourselves. For example, maybe X thought Z was a unicorn, but Z perceived themselves as a beginner. In my case, I think that it’s probably too ambitious to get to the unicorn level. I’m simply aiming to get to (or am at) a level where I can understand most of the terms and conversations, and then go back and research a bit more if I need to as preparation for a follow-up meeting. I guess that I’m a generalist.

Kate Strachnyi’s key points are I guess topics that I’ve heard before and loosely follow. I think that her audience is different from mine as she seems to create visualizations that are used in many company presentations. I’m frequently under pressure to get a simple version of a plot done where we can see the trend in the data and only work on polishing a few selected plots that get highlighted in a research paper. Though I guess that I could/should spend a bit more time thinking about the plot design and colors before I make the next one. For that, I would like to learn more about the paletteer R package:

ICYMI, 🎨 With more palettes than a tweet could possibly contain…
"paletteer: Collection of most color palettes in a single R 📦" 👨‍🎨 @Emil_Hvitfeldt https://t.co/7kKSyohQN4 #rstats pic.twitter.com/zibFhW03EU
— Mara Averick (@dataandme) July 24, 2018

Mico Yuk talked about SMART goals. Hmm… I don’t remember what that stood for, so I clearly would need to re-watch her talk. After skimming through it again I guess that I can only say that it was hard for me to relate to her talk because I haven’t been in a project that involved all planning steps that she talked about. While it wasn’t for me, it might be useful to you, so give it a try!

Wrapping up

Thanks for getting this far. We are curious to hear what your own impressions were of these and other talks from Demystifying Data Science 2018: they have 28 recorded talks in total! We also hope that you enjoyed reading about our different perspectives.

Acknowledgments

We are grateful to everyone who tweeted about the conference and shared their materials online! We are also happy that Metis got interested in our summary blog post.

This blog post was made possible thanks to:

References

[1] C. Boettiger. knitcitations: Citations for ‘Knitr’ Markdown Files. R package version 1.0.8. 2017. URL: https://CRAN.R-project.org/package=knitcitations.

[2] G. Csárdi, R. core, H. Wickham, W. Chang, et al. sessioninfo: R Session Information. R package version 1.1.0.9000. 2018. URL: https://github.com/r-lib/sessioninfo#readme.

[3] Y. Xie, A. P. Hill and A. Thomas. blogdown: Creating Websites with R Markdown. ISBN 978-0815363729. Boca Raton, Florida: Chapman and Hall/CRC, 2017. URL: https://github.com/rstudio/blogdown.

Reproducibility

## ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
##  setting  value                                      
##  version  R version 3.5.1 Patched (2018-10-14 r75439)
##  os       macOS High Sierra 10.13.6                  
##  system   x86_64, darwin15.6.0                       
##  ui       X11                                        
##  language (EN)                                       
##  collate  en_US.UTF-8                                
##  ctype    en_US.UTF-8                                
##  tz       America/New_York                           
##  date     2018-10-24                                 
## 
## ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
##  package       * version    date       lib source                            
##  assertthat      0.2.0      2017-04-11 [1] CRAN (R 3.5.0)                    
##  backports       1.1.2      2017-12-13 [1] CRAN (R 3.5.0)                    
##  bibtex          0.4.2      2017-06-30 [1] CRAN (R 3.5.0)                    
##  BiocStyle     * 2.8.2      2018-05-30 [1] Bioconductor                      
##  blogdown        0.8        2018-07-15 [1] CRAN (R 3.5.0)                    
##  bookdown        0.7        2018-02-18 [1] CRAN (R 3.5.0)                    
##  cli             1.0.1      2018-09-25 [1] CRAN (R 3.5.0)                    
##  colorout      * 1.2-0      2018-05-03 [1] Github (jalvesaq/colorout@c42088d)
##  crayon          1.3.4      2017-09-16 [1] CRAN (R 3.5.0)                    
##  digest          0.6.18     2018-10-10 [1] CRAN (R 3.5.0)                    
##  evaluate        0.12       2018-10-09 [1] CRAN (R 3.5.0)                    
##  htmltools       0.3.6      2017-04-28 [1] CRAN (R 3.5.0)                    
##  httr            1.3.1      2017-08-20 [1] CRAN (R 3.5.0)                    
##  jsonlite        1.5        2017-06-01 [1] CRAN (R 3.5.0)                    
##  knitcitations * 1.0.8      2017-07-04 [1] CRAN (R 3.5.0)                    
##  knitr           1.20       2018-02-20 [1] CRAN (R 3.5.0)                    
##  lubridate       1.7.4      2018-04-11 [1] CRAN (R 3.5.0)                    
##  magrittr        1.5        2014-11-22 [1] CRAN (R 3.5.0)                    
##  plyr            1.8.4      2016-06-08 [1] CRAN (R 3.5.0)                    
##  R6              2.3.0      2018-10-04 [1] CRAN (R 3.5.0)                    
##  Rcpp            0.12.19    2018-10-01 [1] CRAN (R 3.5.1)                    
##  RefManageR      1.2.0      2018-04-25 [1] CRAN (R 3.5.0)                    
##  rmarkdown       1.10       2018-06-11 [1] CRAN (R 3.5.0)                    
##  rprojroot       1.3-2      2018-01-03 [1] CRAN (R 3.5.0)                    
##  sessioninfo   * 1.1.0.9000 2018-10-02 [1] Github (r-lib/sessioninfo@4f91fad)
##  stringi         1.2.4      2018-07-20 [1] CRAN (R 3.5.0)                    
##  stringr         1.3.1      2018-05-10 [1] CRAN (R 3.5.0)                    
##  withr           2.1.2      2018-03-15 [1] CRAN (R 3.5.0)                    
##  xfun            0.3        2018-07-06 [1] CRAN (R 3.5.0)                    
##  xml2            1.2.0      2018-01-24 [1] CRAN (R 3.5.0)                    
##  yaml            2.2.0      2018-07-25 [1] CRAN (R 3.5.0)                    
## 
## [1] /Library/Frameworks/R.framework/Versions/3.5/Resources/library

¹ This conference covered a large spectrum of data science topics, hence the picture for the post!


We are researchers at the @LieberInstitute, blogging about R packages, how-to guides & occasionally our own open-source software (opinions r our own) #rstats