Archive for the ‘university’ Category


Introduction to Data Science Tutorial

December 27, 2015

Over the next two days, 28th and 29th December, I’ll be giving a tutorial at KAIST, hosted by Alice Oh.  The tutorial is based on the Introduction to Data Science unit, FIT5145, at Monash.  We just flew in last night from visiting Chengdu and Xi’an in China.

 


Github activity

November 24, 2015

I spent most of this year working on the Introduction to Data Science (an introductory unit at Monash) and getting the Grad. Dip. of Data Science and the Master of Data Science up and running (some background here).

As a result, you can see the disastrous impact it has had on my Github activity, which is a measure of my coding productivity!

Wray’s activity on Github for 2015


Introduction to Data Science

November 7, 2015

On 11th-14th January 2016 I’ll be visiting the School of IT at Monash University Malaysia, which is located within the Bandar Sunway township just outside Kuala Lumpur city.  My talk should be on the Monday (11th).  The slides are here (available temporarily).

Title:  Introduction to Data Science

This 2-hour seminar works through some of the emerging highlights of Data Science, reviewing major videos, blogs and articles that helped mold the field.  It looks at processes and case studies to understand the many facets of working with data, and the significant effort in Data Science over and above the core task of Data Analysis.  So the seminar is a broad introduction to working with data rather than a deep dive into the world of statistics.  It is aimed at those with an IT background who either want to start in Data Science or work with it, for instance in management or as a data engineer.  Attendees should have a knowledge of information technology and computer science.

The talk will be extracted from our FIT5145 unit given in the Master of Data Science.


Data Science Resources

October 26, 2015

For my main job, I am Director of the Master of Data Science.  This is a fast-paced field that is just as much industry as academia, and a lot of the really exciting stuff is applications.  To keep up you need to monitor the media.  There are too many resources to list them all, or to attempt any kind of thorough tracking.  I do, however, recommend that students install a news aggregator on their tablet/smart-phone/laptop and subscribe to some of the better and more relevant RSS feeds, to keep track.
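For students who would rather script this than install an app, here is a minimal sketch of what a news aggregator does with an RSS feed: parse the XML and keep the items whose titles mention topics of interest.  The sample feed, its URLs, the keyword list and the function name `relevant_items` are all made up for illustration; a real feed would be downloaded from a site of your choosing and may need more careful handling.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 snippet standing in for a real feed; in practice
# you would download the XML from a feed URL of your choosing.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Data Blog</title>
    <item>
      <title>Five Big Data myths</title>
      <link>http://example.com/myths</link>
    </item>
    <item>
      <title>Office party photos</title>
      <link>http://example.com/party</link>
    </item>
    <item>
      <title>Getting started in data science</title>
      <link>http://example.com/start</link>
    </item>
  </channel>
</rss>"""

# Topics to keep, chosen purely for illustration.
KEYWORDS = ("data science", "big data")


def relevant_items(feed_xml, keywords=KEYWORDS):
    """Return (title, link) pairs whose title mentions any keyword."""
    root = ET.fromstring(feed_xml)
    matches = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        # Case-insensitive match of any keyword against the item title.
        if any(k in title.lower() for k in keywords):
            matches.append((title, link))
    return matches


for title, link in relevant_items(SAMPLE_FEED):
    print(title, "->", link)
```

Filtering on titles alone is crude, but it gives a feel for how quickly a handful of feeds can be cut down to the relevant items.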

All the big business and technology magazines have relevant sections on Data Science or Big Data:  Forbes, Harvard Business Review, O’Reilly, ZDNet, MIT Sloan Management Review, InformationWeek, Wired, InfoWorld, TechCrunch (big data) and TechCrunch (data science), … Each of these has a particular perspective, which is useful in understanding their contributions.  For instance, TechCrunch is a technology startup magazine whereas Forbes targets Fortune 500 companies.  The articles in this class of magazines are usually of good quality, although there is sometimes “commissioned” journalism or press releases for marketing.

Many technology blogs focus on Data Science.  The following are listed as most popular first:  KDNuggets.com, DataScienceCentral.com and its offshoot AnalyticBridge.com, Datafloq.com, AllAnalytics.com, PredictiveAnalyticsToday.com, Dataconomy.com, 101.DataScience.community, DataScienceWeekly.org.  The first, KDNuggets, has been in the business for almost two decades.  Many of these have email and RSS subscription services and Twitter feeds.  Some of these have a low signal-to-noise ratio so it is easy to get drowned in content.  See also Quora’s “What are the best blogs for data scientists to read?” for more discussion.

There are two weekly newsletters that you should sign up to for great content in your email. The Data Science Weekly Newsletter has more of a technology orientation with, for instance, some popular machine learning content.  The O’Reilly Data Newsletter is more about industry and is essential reading for anyone who wants to remain current.

Most of the blogs are also coupled with curated information sources.  Other sites with curated information are Resources to Learn Data Science Online and Big Data and Applications Knowledge Repository.  This second one also has a good list of conferences.

A related category is the question-answering sites: Quora has Data Science and Big Data channels, though many other discussions are useful too.  A site more in the Slashdot style is Datatau.com.

Pinterest.com is a site that collects infographics, e.g., queries for “data science” and “big data”.  These are seductive, and some are certainly informative.  Datafloq.com also has an infographics section.  Some notables here that go way beyond infographics are cheat sheets: Machine Learning Cheat Sheet and the Probability Cheat Sheet.  These are handy academic references, and also a nice way to find out what you do not know.

Many sites give collections of data sets; perhaps the most notable here are: aws.amazon.com data sets, KDNuggets.com awesome public datasets, Google’s public data directory, Quora.com large data sets, …  The Internet Archive is a long running source of free digital content (books, etc.).  There are many, many more such sites, especially as governments now support open data.

Finally, most terms and concepts are well explained in Wikipedia, often with good diagrams and related discussions.  As one delves into the more esoteric aspects of statistics or computer science, the quality of Wikipedia’s entries drops off.  Wikipedia’s definition of Data Science, for instance, as “a continuation of the field data mining and predictive analytics” would be hotly contested by some, but others would find the distinctions not that important.

WikiBooks has now produced Data Science: An Introduction, which I haven’t looked at properly yet but the outline seems OK.  I am skeptical of such efforts because the typical academic author has a focused speciality and a list of axes to grind … not me of course, oh no, not me 😉


A tutorial at the ML Bootcamp

August 17, 2015

The ML Bootcamp is a joint University of Warwick and Monash University programme organised by PhD students.  Really great programme with all sorts of cool stuff in data science.  My tutorial is Introducing Document Analysis (pdf slides).  This is a “grand tour” tutorial, giving lots of examples rather than properly covering any particular theories or algorithms.

An earlier talk I gave, on a related topic, is Introduction to Text Mining (PDF slides), originally given to a business-technical audience in 2014.  So this is more of a motivational talk on text mining: why it is useful and why it is difficult.


Information for candidate research students

November 2, 2014

I wrote a page for candidate research students here.  Always happy to hear from you folks.


On topic models

February 3, 2014

Wow, I’ve had this blog page sitting around for over a year and done nothing!  Now that I’m a professor at Monash University, with a dry, content-controlled university website, I think I need a better website!