
To good health!

January 10, 2018

So enrolment sessions start soon for our incoming Master of Data Science students.  I know it's a stressful time for some students in terms of “life”.  I usually talk briefly about staying healthy, and Monash offers various services to support this.  But for PhD students I think it's important to take this on as a lifestyle objective.  They are undertaking a knowledge-intensive career path, and brain health will be critical for their future careers.

Disclaimer:  Now, this page is full of opinions and pointers to, in some cases, controversial material.  I’m just a little old computer science professor, so my opinions have no real backing, and I have no recognised expertise.  All care but no responsibility for what I say!

The fact is, health in the modern world is an issue fraught with challenges.  To understand this, consider the following:

  1. The official Australian government position on colds and flu prevention, and the official USA government position:  hygiene and vaccines.  What’s missing:  discussion of healthy diet and exercise to strengthen and repair the immune system.
  2. Time magazine reports that extensive research shows vitamin D helps prevent colds and flu, so some sunshine is also important.  No mention of this in the official government positions above!
  3. Believe it or not, in the USA prescription drugs are the third leading cause of death!  There is a larger issue here in that most published medical research findings are false.
  4. Tobacco science is a term used to describe fake science protecting an industry.  Read about the Tobacco Institute and see the movie The Insider.  How much of this goes on in the food industry?
  5. Sugar is now known to be very damaging to the health.  Here is a hard hitting discussion about it, though note quite a few of these claims are considered controversial.  But it is known that sugar suppresses the immune system.  Figuring out your sugar consumption is challenging.   There are rumors (in movie form) of tobacco science going on here too.
  6. Artificial sweeteners are not a substitute; in fact, evidence suggests they have poor health impacts, and they mess up the brain’s analysis of your food intake.
  7. Fats are the subject of a massive onslaught from advertisers.  For years we were told to avoid butter and use margarine instead, but now it seems butter is good.  Current conflicting advice is being broadcast about the humble coconut.
  8. The healthiness of organic produce is currently a propaganda battleground.  Hint:  organics are also lower in toxic pesticide residue, but no mention of that.
  9. The commercial world has taken on healthy eating big time, and it is the fastest growing segment of the food industry.  Monash University has done a wonderful job of getting really good fast food vendors at the Caulfield campus food court.

Summary:  There is lots of conflicting and bad advice out there!  Heck, even the government websites seem to have errors of omission.

Now, if we consider the specific position of someone who wants their brain to function well, then consider the following:

  1. Short term exercise is known to boost mental performance.
  2. Meditation and mindfulness are also known to boost performance in exams.
  3. Long term sitting is considered to be as bad for health as smoking!  Here is a poster of the dangers.
  4. There are also lifestyle recommendations about studying from scientists:  don’t cram for subjects, learn slowly over the semester.
  5. Recent studies show the brain can be encouraged to grow new cells.
  6. The brain is mostly fat, so we need healthy fats to work well.  Don’t believe a lot of what you read about fats!  Cholesterol is also important for the brain.
  7. Sugar consumption (e.g., soft drinks, commercial juices, commercial cereals, flavoured yogurts, etc. etc. etc.) is bad for the brain, as well as the immune system.
  8. Energy drinks rot the teeth, like soft drinks.  It’s due to the high acid content.
  9. Canola oil is bad for the brain.  This one is important because most cheap salad oils, margarines and many food products are loaded with it.
  10. Deep sleep is the basis for memory, learning and health.   In particular, without deep sleep, your brain will not be functioning properly and your memory will be impaired.  See this TED talk on sleep (which is a sales pitch … sorry), but there are many articles on this.

Note, for each of these, there are tens to hundreds of good articles and scientific literature to back it up, though oftentimes conflicting scientific literature as well.  I’m just giving generally readable and somewhat respectable accounts.  A lot of these issues remain controversial, and possibly there is some tobacco science going on, but it’s hard for us non-experts to really know.

Anyway, I hope from this you understand the complexity of trying to stay healthy, and of trying to keep your brain functioning well in the modern world.

I’m probably a bit extreme but I say,

About eating and food:

  • Try and cook your own meals from real ingredients.  After a while, it becomes easy, and it’s a great way to wind down with friends.
  • If someone’s great grandmother (anyone’s, Fiji, Vietnam, Sweden, …) didn’t make the food 100 years ago, its probably not good for you.
  • Don’t take dietary or health advice from Big Food.  In fact, looking at the government advice (listed above) on the flu, I’d say theirs is missing the major points too!
  • Try and avoid packaged meals, fast food, and canned and bottled drinks.
  • Go low sugar and healthy fats.  It’s a lifestyle thing, not a diet.  Once you do, you’ll discover all the amazing subtle flavours you’ve missed from traditional foods and realise how horrible standard breads, sweet desserts, snack bars and cakes really are:  the sugar masks the real flavour, and it gives you a longer-term bad aftertaste.
    • Healthy fats are challenging to maintain because Big Food likes to put unhealthy canola oil in everything:  most salad oils, hummus and deli mixes are mostly canola oil, as is margarine.
  • Just avoid artificial sweeteners.  Once you’ve gone cold turkey and got off the sugar addiction, you won’t be craving it and you’ll feel better for it.
  • Health slogans on food products, like “low fat” or “low cholesterol”, often mean it’s bad for you!  Low fat usually means high sugar, for instance.

About other aspects of health:

  • Get exercise, and make it a lifestyle thing.  When you’re older, you’ll discover you cannot function well as a knowledge worker without it.
  • Don’t sit at your desk for long hours.  You need to get up and move around every hour!  Also, become aware of your posture.   Don’t become a hunchback!  Some 2nd years are already heading that way.
  • When you’re mentally worn out, a quick nap or a brisk walk does wonders, and both have scientific backing.
  • Make sure you are getting proper sleep.  That can mean organising your assignments and study properly so you don’t need to do a bunch of all-nighters to get through.  But it also means setting up the right environment at home for sleep.
  • I know of few cases where drugs and alcohol support good health or brain functioning, including so-called smart drugs or nootropics.  Most are dangerous to the liver, as are many medicines.  Headache and pain medicine is far more dangerous and damaging than many other things!
  • Routine … that’s what the body needs.  For sleep, for eating, for study, for exercise, routine is a critical part of making it all function well.

Anyway, I have said too much already.  In case you’re wondering, I am now on holidays.  No time for a Data Science professor to talk about this stuff during semester!


Picking Conferences

January 7, 2018

As a PhD student starting out, you do have some career options.  Likewise, as a typical junior academic, with limited budget and research time, you have similar career options.  A main one, which I’ll discuss here, is:  which conference(s) should I go to?  This question is peculiar to computer science, where conferences are competitive (say 20-25% acceptance rates) and count as publications.

So you only get time to attend a few conferences.  Likewise, you only get time to write papers for a few.  So you want them to count.  Conferences each have their own style.  The best way to think of it is that a conference is a tribe where membership is part-time.  You have to take time to learn about the habits and preferences of the tribe, i.e., in terms of paper content.  If the tribe always starts off with 20% of detailed theoretical definitions, then you have to as well.  If they do certain kinds of experiments, then so should you.  Think of these sorts of things as tribal markings.  To be innovative, you generally need to do so from inside the system.  I know this sounds conformist, and believe me, I am completely non-conformist myself, but generally it’s how conferences work, largely as a result of the reviewing system.  If a trusted member of the tribe starts quoting classical, venerated philosophers, so will the others.  If a complete unknown submits a paper quoting venerated philosophers, then it’ll be viewed as weird unless they have enough other tribal markings on their work to be accepted.

I have a number of conferences I really like where I understand the general tribal markings and am happy to live with them.  So SIGKDD has solid experimental work, ICML has innovative new methods, ACL has applications of machine learning to real linguistic problems.  They sometimes have additional tribal markings that can be more or less problematic.

Anyway, as a junior academic, you have to target a few conferences and learn to become a reliable tribal member.  You might want to pick a few authors and build on their work.  Or you might want to pick a specialised problem.  Regardless, to publish in particular venues you’ll have to get to know the tribal preferences and adhere to them.  Doing good research is one thing, and really good research will usually speak for itself, but if your contribution is not outstanding, say “merely” at the top 25 percentile of work, then you have to follow the tribe to be accepted into the tribe.  That takes time.

Moreover, the vibe at the conference is always much, much more than the printed proceedings.  You need to be there:  hear the questions, watch the audience, chat to others in the breaks, see the quality of the presenters.  What is important and influential?  What is losing out, perhaps because it was trendy rather than productive?  All this happens at the conference.  You need to be there to see it.  Otherwise, you’ll be a year behind the others … new ideas for next year’s conference are often the germ of an idea at this year’s conference.  Moreover, it always helps to see the movers and shakers in action.  What sort of people are they?  How do they present their work?

So what does this mean to the junior academic?  You need early on to target a particular conference, subject or influential author’s/group’s body of work, and learn what it is they do.  That’ll take time.  So if you don’t see yourself as being involved in that community 5 years down the track, you probably shouldn’t be making that effort.  If you think their research doesn’t have a good future, then again, you probably shouldn’t be making that effort.  Pick some conferences with this in mind, and try and go along semi-regularly to keep track of things and pick up the vibe.


Asian Conference on Machine Learning

November 11, 2017

Heading off to ACML in Seoul to present the paper “A Word Embeddings Informed Focused Topic Model” for PhD student He Zhao.  He is off elsewhere, at ICDM in New Orleans presenting another paper, “MetaLDA: a Topic Model that Efficiently Incorporates Meta Information”.  The MetaLDA algorithm incorporates Boolean side information, beating all others, and the newer WEI-FTM algorithm incorporates general side information but as a focused topic model.  He is a prolific coder, with some of his work on GitHub.

ACML is getting to be a great conference.  Always great invited talks and tutorials.  A worthy end of semester break for me.

Abstract
In natural language processing and related fields, it has been shown that word embeddings can successfully capture both the semantic and syntactic features of words. They can serve as complementary information to topic models, especially for cases where word co-occurrence data is insufficient, such as with short texts. In this paper, we propose a focused topic model where how a topic focuses on words is informed by word embeddings.  Our model is able to discover more informed and focused topics with more representative words, leading to better modelling accuracy and topic quality. With a data augmentation technique, we can derive an efficient Gibbs sampling algorithm that benefits from the fully local conjugacy of the model.  We conduct extensive experiments on several real world datasets, which demonstrate that our model achieves comparable or improved performance in terms of both perplexity and topic coherence, particularly in handling short text data.
Keywords: Topic Models, Word Embeddings, Short Texts, Data Augmentation


Notes on Determinantal Point Processes

September 11, 2017

I’m giving a tutorial on these amazing processes while in Moscow.  The source “book” for this is of course Alex Kulesza and Ben Taskar’s, “Determinantal Point Processes for Machine Learning”, Foundations and Trends® in Machine Learning: Vol. 5: No. 2–3, pp 123-286, 2012.

If you have an undergraduate degree in mathematics with loads of multi-linear algebra and real analysis, this stuff really is music for the mind.  The connections and results are very cool.  In my view these guys don’t spend enough time in their intro. on Gram matrices, which really are the starting point for everything.  In their online video tutorials they got this right, and led with these results.

There are also a few interesting connections they didn’t mention.  Anyway, I did some additional lecture notes to give some of the key results mentioned in the long article and elsewhere that didn’t make their tutorial slides.
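To make the Gram-matrix starting point concrete, here is a minimal numpy sketch (my own toy example, not from the Kulesza and Taskar article) of an L-ensemble DPP: the kernel L = BᵀB is a Gram matrix of item feature vectors, and a subset S of items has probability det(L_S) / det(L + I).

```python
import numpy as np
from itertools import combinations

# Toy feature vectors; L = B.T @ B is a Gram matrix, hence positive semi-definite
rng = np.random.default_rng(0)
B = rng.normal(size=(3, 5))       # 3-dimensional features for 5 items
L = B.T @ B                       # 5x5 L-ensemble kernel

def dpp_prob(L, S):
    """P(S) = det(L_S) / det(L + I) for an L-ensemble DPP."""
    S = list(S)
    L_S = L[np.ix_(S, S)]         # principal submatrix indexed by S
    return np.linalg.det(L_S) / np.linalg.det(L + np.eye(L.shape[0]))

# The defining identity sum_S det(L_S) = det(L + I) means these sum to 1
total = sum(dpp_prob(L, S) for r in range(6) for S in combinations(range(5), r))
```

Subsets of similar (nearly parallel) feature vectors get near-singular submatrices and hence near-zero probability, which is exactly the diversity-promoting behaviour DPPs are known for.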


Advanced Methodologies for Bayesian Networks

August 22, 2017

The 3rd Workshop on Advanced Methodologies for Bayesian Networks was run in Kyoto September 20-22, 2017. The workshop was well organised, and the talks were great. Really good invited talks by great speakers!

I’ll be talking about our (with François Petitjean, Nayyar Zaidi and Geoff Webb) recent work with Bayesian Network Classifiers:

Backoff methods for estimating parameters of a Bayesian network

Various authors have highlighted inadequacies of BDeu type scores, and this problem is shared in parameter estimation. Basically, Laplace estimates work poorly, not least because setting the prior concentration is challenging. In 1997, Friedman et al. suggested a simple backoff approach for Bayesian network classifiers (BNCs). Backoff methods dominate in n-gram language models, with modified Kneser-Ney smoothing being the best known, and a Bayesian variant exists in the form of Pitman-Yor process language models from Teh in 2006. In this talk we will present some results on using backoff methods for Bayesian network classifiers and Bayesian networks generally. For BNCs at least, the improvements are dramatic and alleviate some of the issues of choosing too dense a network.
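The general backoff idea — smoothing a sparse conditional estimate toward its parent (marginal) distribution — can be sketched in a few lines. This is a toy hierarchical-Dirichlet-style estimator of my own for illustration, not the actual estimator from the talk:

```python
from collections import Counter

def backoff_estimate(counts_given_parent, marginal_counts, alpha=1.0):
    """Toy backoff: smooth P(x | parent config) toward the marginal P(x).

    counts_given_parent: Counter of x within one parent configuration
    marginal_counts:     Counter of x over all data (the backoff distribution)
    alpha:               concentration; larger alpha backs off more strongly
    """
    n = sum(counts_given_parent.values())
    m = sum(marginal_counts.values())
    def p(x):
        p_backoff = marginal_counts[x] / m
        # Mix the local counts with alpha pseudo-counts from the backoff distribution
        return (counts_given_parent[x] + alpha * p_backoff) / (n + alpha)
    return p

# Sparse parent configuration: only 2 observations, so the estimate
# leans on the marginal rather than assigning "b" zero probability
p = backoff_estimate(Counter({"a": 2}), Counter({"a": 50, "b": 50}))
```

The point of the toy is that a dense network gives many sparse parent configurations, and backing off to a coarser conditioning set rescues those estimates.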

Slides are at the AMBN site, here.  Note I spent a bit of time embellishing my slides with some fabulous historical Japanese artwork!

Software for the system is built on the amazing Chordalysis system of François Petitjean, and the code is available as HierarchicalDirichletProcessEstimation.  Boy, Nayyar and François really can do good empirical work!


Visiting and talks at HSE, Moscow

August 20, 2017
Visiting Dmitry Vetrov’s International Lab of Deep Learning and Bayesian Methods at the distinguished Higher School of Economics in Moscow from 11-15th September 2017.  What a great combination, Bayesian methods and deep learning!

The HSE group at Izia’s Grill, 13/09/17

Left to right are our host Prof Dmitry Vetrov, me, Ilya Tolstikhin, Novi Quadrianto, Maurizio Filippone and our coordinator Nadia Chirkova.  We four in the middle were the invited speakers for the workshop, Mini-Workshop: Stochastic Processes and Probabilistic Models in Machine Learning.  The invited talks by these guys were absolutely first class, and the high quality of the Moscow area speakers made for a fascinating afternoon too.
Giving two talks.  A tutorial one at the workshop:  Introduction to Dirichlet Processes and their use.
Assuming the attendee has knowledge of the Poisson, Gamma, multinomial and Dirichlet distributions, this talk will present the basic ideas and theory to understand and use the Dirichlet process and its close relatives, the Pitman-Yor process and the gamma process.  We will first look at some motivating examples.  Then we will look at the non-hierarchical versions of the processes, which are basically infinite parameter vectors.  These have a number of handy properties and have simple, elegant marginal and posterior inference.  Finally, we will look at the hierarchical versions of these processes.  These are fundamentally different.  To understand the hierarchical version we will briefly review some aspects of stochastic process theory and additive distributions.  The hierarchical versions become Dirichlet and Gamma distributions (the process part disappears), but the techniques developed for the non-hierarchical process models can be borrowed to develop good algorithms, since the Dirichlet and Gamma are challenging when placed hierarchically.  Slides are here.
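One concrete instance of the simple marginal inference the abstract alludes to is the Chinese restaurant process, the marginal partition scheme of the Dirichlet process. Here is a toy sampler of my own (not from the slides):

```python
import random

def crp(n, alpha, seed=0):
    """Sample a partition of n customers via the Chinese restaurant process,
    the marginal sampling scheme of a Dirichlet process with concentration alpha."""
    rng = random.Random(seed)
    tables = []        # tables[k] = number of customers seated at table k
    assignments = []
    for i in range(n):
        # Customer i joins table k with prob tables[k] / (i + alpha),
        # or starts a new table with prob alpha / (i + alpha).
        r = rng.uniform(0, i + alpha)
        acc = 0.0
        for k, c in enumerate(tables):
            acc += c
            if r < acc:
                tables[k] += 1
                assignments.append(k)
                break
        else:
            tables.append(1)
            assignments.append(len(tables) - 1)
    return assignments

z = crp(100, alpha=2.0)
```

The rich-get-richer seating rule is why the number of occupied tables grows only logarithmically in n, one of the handy properties of the non-hierarchical process.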
And one to the Faculty of Computer Science:  Learning on networks of distributions for discrete data.  The HSE announcement is here.
I will motivate the talk by reviewing some state of the art models for problems like matrix factorisation models for link prediction and tweet clustering.  Then I will review the classes of distributions that can be strung together in networks to generate discrete data.  This allows a rich class of models that, in its simplest form, covers things like Poisson matrix factorisation, latent Dirichlet allocation, and stochastic block models, but more generally covers complex hierarchical models on network and text data.  The distributions covered include so-called non-parametric distributions such as the Gamma process.  Accompanying these are a set of collapsing and augmentation techniques that are used to generate fast Gibbs samplers for many models in this class. To complete this picture, turning complex network models into fast Gibbs samplers, I will illustrate our recent methods of doing matrix factorisation with side information (e.g., GloVe word embeddings), done for link prediction, for instance, for citation networks.
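As a small illustration, the generative side of the simplest model in this class, Poisson matrix factorisation, fits in a few lines (a minimal sketch with made-up dimensions, not the talk's actual model):

```python
import numpy as np

# Minimal generative sketch of Poisson matrix factorisation:
# a count matrix X is modelled as X_ij ~ Poisson((W @ H)_ij)
# with gamma-distributed non-negative latent factors W and H.
rng = np.random.default_rng(1)
n, m, k = 20, 30, 4                              # rows, columns, latent rank
W = rng.gamma(shape=1.0, scale=1.0, size=(n, k))  # row factors
H = rng.gamma(shape=1.0, scale=1.0, size=(k, m))  # column factors
X = rng.poisson(W @ H)                            # observed counts
```

Inference then works backwards from X to W and H; the gamma-Poisson pairing is what makes the collapsing and augmentation tricks for fast Gibbs sampling possible.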

MDSS Seminar Series: Doing Bayesian Text Analysis

August 4, 2017

Giving a talk to the Monash Data Science Society on August 28th.  Details here.  It’s a historical perspective and motivational talk about doing text and document analysis.  Slides are here.