Thursday, September 26, 2019

GANs and other deep learning models for cooking recipes

I retired this spring after working on artificial intelligence projects since the 1980s. Freedom from having to work on large projects for other people and companies is liberating and frees up time for thinking about new ideas. Currently I am most interested in deep learning models for generating and evaluating recipes - for now I am using a GAN model (which I am calling RecipeGAN).

When I managed a deep learning team at Capital One, I used GANs to synthesize data. During a Saturday morning quiet-time hacking sprint the first month at my new job, I had the idea to take an example program SimpleGAN that generated MNIST digits and instead generate numeric spreadsheet data (using the Wisconsin Cancer Data Set that I had previously used in my books as example machine learning data). I was really surprised how well this worked: I could generate fake Wisconsin cancer data, train a classification model on the fake data, and get classification prediction accuracy on real data samples that was almost as good as a model trained on real data. This took only about 40 lines of code changes to the short SimpleGAN TensorFlow example/demo program. My team took this simple idea and built a robust production system around it that is well described in Austin Walter’s Medium article.

Several years ago, a fan of my web app gave me 100K public domain recipes in digital format so I should have ample training data for RecipeGAN. I will put the code and data on github when I am done with this experiment. If you are not familiar with Generative Adversarial Networks (GANs), in the cooking/recipe context the idea is simple enough: a generator model takes as input a random vector (referred to as Z vector, or latent input) and generates random recipes (for now represented as sparse vectors indicating the use of ingredients). A discriminator model learns to tell the difference between fake ingredient lists generated by the generator and real ingredient list samples. Both models are trained jointly so the generator learns to better fool the discriminator model while the discriminator model learns to not be fooled. When this process is done, the discriminator model is no longer needed. New random latent Z input vectors fed as input to the generator model hopefully generate realistic ingredient lists.
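
As a rough illustration of the data representation described above (not the actual RecipeGAN code, which is not shown here), the following pure-Python sketch encodes ingredient lists as multi-hot vectors and pushes a random latent Z vector through an untrained, randomly initialized stand-in for a generator. The vocabulary, the layer shape, and all function names are hypothetical, and there is no discriminator or training loop here.

```python
import math
import random

# Hypothetical toy vocabulary; a real data set would have thousands of ingredients.
VOCAB = ["flour", "sugar", "butter", "egg", "milk", "salt", "garlic", "onion"]
INDEX = {name: i for i, name in enumerate(VOCAB)}
Z_DIM = 4  # size of the latent Z vector (an arbitrary choice for this sketch)


def encode(ingredients):
    """Multi-hot vector: 1.0 where the recipe uses an ingredient, else 0.0."""
    vec = [0.0] * len(VOCAB)
    for name in ingredients:
        vec[INDEX[name]] = 1.0
    return vec


def decode(vec, threshold=0.5):
    """Turn a vector of per-ingredient scores back into an ingredient list."""
    return [VOCAB[i] for i, score in enumerate(vec) if score > threshold]


def sample_z(rng):
    """Draw a random latent input vector from a standard normal distribution."""
    return [rng.gauss(0.0, 1.0) for _ in range(Z_DIM)]


def make_generator(rng):
    """Untrained stand-in for a generator: one random linear layer + sigmoid."""
    weights = [[rng.gauss(0.0, 1.0) for _ in range(Z_DIM)] for _ in VOCAB]

    def generate(z):
        scores = []
        for row in weights:
            activation = sum(w * x for w, x in zip(row, z))
            scores.append(1.0 / (1.0 + math.exp(-activation)))
        return scores

    return generate


rng = random.Random(42)
real = encode(["flour", "egg", "milk"])
generator = make_generator(rng)
fake = generator(sample_z(rng))
print("real recipe vector:", real)
print("fake ingredient list:", decode(fake))
```

In a real GAN the generator's weights would be learned by training against a discriminator that sees both the real vectors and the generated ones; here the weights are random, so the decoded list is noise.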

I am also interested in language generation and an end goal for my current research is to generate English directions for making the fake recipes (the ingredient lists created by the RecipeGAN generator model). This is a fun project and I also hope that the code and data will be useful to other people, even if I don’t get good results. Indeed, I am writing this blog now to encourage myself to share results no matter how well the system works. Ideas are meant to be shared.

BTW, please don’t take my proclamations of being retired too seriously. I am still helping people, as a consultant, get started on deep learning projects.

Friday, August 09, 2019

Back living in Sedona Arizona and enjoying my retirement

My wife and I returned to our home in Sedona Arizona in June. I had been managing a deep learning team for Capital One in Champaign Illinois (in the research park at UIUC). I am now retired so we moved back into our house in the mountains in Central Arizona.

re: retirement: while I might still do small interesting consulting jobs, I am retired. I am spending my time volunteering at a local food bank, hiking and kayaking with my friends, and I joined a local writers group to give myself a shove to finish a sci-fi book I have been working on for a long time.

I released a second edition to my Haskell book this week and I have edits for a new edition for my Common Lisp book that I will push to current readers soon, but I plan on no longer writing new technical books. I have written 22 technical books - probably sufficient :-)

Personally my passion is still studying artificial intelligence and deep learning but this is now research for my personal pleasure.

Saturday, May 18, 2019

My large Haskell + Python project KGcreator (tool for automating the generation of Knowledge Graphs) and auto code formatting

You might wonder what the topics of my large Haskell + Python project KGcreator and auto code formatting have to do with each other.

In addition to working on two Python books (Python Intelligent Systems and Deep Learning and Graph Databases), my main 'retirement' activity has been writing a lot of Haskell code and a smaller amount of Python code for my KGcreator project. After reading a discussion on Hacker News yesterday about Python code tidy/auto-format tools, I decided to add Makefile targets.

After a 'stack install stylish-haskell hindent' and a 'pip install yapf', I added something like this to my Haskell top level Makefile:

  cd src/fileutils; stylish-haskell -i *.hs; hindent *.hs
  cd src/nlp; stylish-haskell -i *.hs; hindent *.hs
  cd src/sw; stylish-haskell -i *.hs; hindent *.hs
  cd src/webclients; stylish-haskell -i *.hs; hindent *.hs
  cd test; stylish-haskell -i *.hs; hindent *.hs

And something like this to my Python top level Makefile:

  cd botorch_bayesian_optimization; yapf *.py --style='{indent_width: 2}' -i
  cd coref_anaphora_resolution_web_service; yapf *.py --style='{indent_width: 2}' -i
  cd data_fusion; yapf *.py --style='{indent_width: 2}' -i
  cd deep_learning_keras; yapf *.py --style='{indent_width: 2}' -i
  cd deep_learning_pytorch; yapf *.py --style='{indent_width: 2}' -i
  cd discrete_optimization; yapf *.py --style='{indent_width: 2}' -i

Trivial stuff, but I already find my KGcreator and my books' codebases easier to work with. For Common Lisp and Scheme I always just rely on using the tab character to auto-indent and leave it at that. I use VSCode for both Haskell and Python development and after experimenting with a few extensions I decided it was easier to add a make target. Nothing is automated right now: I run 'make tidy', 'make tests', and then 'git commit...' manually. Still to be done is adding git commit hooks. Fortunately I can use notes in one of my old blog posts as a guide :-)
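
A git commit hook for this workflow could be sketched in Python roughly as follows. This is an assumption about how such a hook might look, not the hook from the old blog post: it runs the 'make tidy' and 'make tests' targets in order and reports the first failure. The run_targets and fake_runner names are hypothetical, and the last lines only do a dry run so the sketch can be tried without a Makefile.

```python
import subprocess

# Targets named in the post; adjust to match your own Makefile.
TARGETS = ["tidy", "tests"]


def run_targets(targets, runner=subprocess.call):
    """Run each make target in order; return the first non-zero exit code, else 0."""
    for target in targets:
        code = runner(["make", target])
        if code != 0:
            print(f"pre-commit: 'make {target}' failed with exit code {code}")
            return code
    return 0


def fake_runner(command):
    """Stand-in for subprocess.call that only prints the command it was given."""
    print("would run:", " ".join(command))
    return 0


# Dry run. In a real hook, import sys and end the file with
# sys.exit(run_targets(TARGETS)) so a non-zero status aborts the commit.
print("exit status:", run_targets(TARGETS, runner=fake_runner))
```

To use it as a real hook, the file would be saved as .git/hooks/pre-commit and made executable. One caveat: files reformatted by 'make tidy' during the hook are not automatically re-staged, so the formatting changes would land in the next commit unless the hook also runs 'git add'.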

Saturday, March 23, 2019

I retired from Capital One yesterday

With deep gratitude for a great company and a great job, I retired from my role as manager of the UIUC machine learning team and Master Software Engineer. Capital One has deep machine learning talent so check them out if you are looking for ML work.

Thanks especially to my team for being interesting to work with and for the kind going away gift of a locally made Go Ban board, bowls, and Go stones. A wonderful gift. I will miss you all!

When my family and friends hear me talk about retirement they do so with great skepticism since I have retired several times already! That said, I feel like kicking back and finishing my current book project and perhaps doing limited consulting work after I travel a bit to see family and friends.

Sunday, February 10, 2019

Full circle from one laptop to rule them all to specialized function specific devices

For about 25 years my digital life was wrapped tightly around whatever personal laptop I had. Since for most of that time I worked as a remote consultant (except for gigs at Singapore-based Ola Search, Google in Mountain View, and currently at Capital One in Urbana, Illinois) my personal laptop also covered work activities. There was something close and comforting about having one digital device that I relied on.

Digital life is very different now. Because of concerns about ‘always being online’ and not paying enough attention to the non-digital world, I favor just wearing an Apple Watch and leaving my iPhone at home. The Apple Watch is just adequate enough for phone calls, messaging, and on rare occasions email and is not anything I spend any real time paying attention to. I can spend a good part of a day shopping, walking in a park, eating out, or perusing books in a library and just spend a few minutes paying attention to my watch. A huge improvement over cellphone addiction!

For work, I have dedicated secure devices for getting my work done - the definition of purpose-specific.

For home use, I have a powerful GPU laptop from System76 that I only use for machine learning and experiments I am doing fusing ‘classic’ symbolic AI with functional components that are just wrappers for deep learning models.

Also for home use I have a MacBook that is primarily used for long writing sessions when I am working on a book project. Example code for my books tends to be short and pedantic so that development lives on the MacBook also.

I depend on my iPhone when I travel to stay organized and to have local copies of required digital assets, including on-device cached Netflix movies, Audible audio books, and Kindle books.

Lastly, the device that I spend more time on than any other (except for work devices) is my iPad, on which I do close to 100% of my web browsing and almost all of my reading, enjoy entertainment, and do lots of lightweight writing like this blog post, plus editing and small additions to my current book project.

If I count all cloud-based compute infrastructure for work as one huge virtual device, the count for the digital devices I use every week weighs in at eight devices. When I retire from my job at Capital One later this spring that device count falls to five devices - still really different than the old days of having one laptop for everything.

Looking ahead to the future, perhaps only 5 or 10 years from now, I expect device profiles used by typical consumers to change a lot - mostly being one personal device that is always with you and then many different peripheral and possibly compute devices in your living and working environments that are shared with other people. I think there are three possibilities for what the one personal device may be:

  1. A smartphone 
  2. Something like an Apple Watch
  3. Something like a one-ear only AirPod like device

Whatever the profile is for your personal digital device, it will securely be connected to all shared devices (e.g., smart TVs, shared keyboards and monitors, shared tablets of all sizes, smart cars, home entertainment centers, the cell phone network infrastructure, point of sale devices in stores, etc.).

Tuesday, February 05, 2019

AWS Neptune Graph Database as a service

Fascinating to see Amazon AWS supporting graph databases with their Neptune service - I have been working as a machine learning practitioner at Capital One (I manage a machine learning team there) but in a previous life, I worked with the Knowledge Graph when I was a consultant at Google and I have written a few semantic web/linked data books.

As a side project at home I have been looking into Knowledge Graph building tools so Amazon's new offering looks useful! I like that they support both SPARQL and Gremlin for queries.

Sunday, January 27, 2019

Our humanity vs. technology and corporatism

My wife and I enjoyed a performance of Sleeping Beauty by the Russian National Ballet Theater last Wednesday night at a theater on campus at UIUC. Every time I enjoy art, company of family and friends, reading a good book, cooking and enjoying a meal, etc. I appreciate being a human (i.e., a somewhat evolved great ape) and my physical and social life.

I view technology as a fairly neutral force in our lives. I judge technology on how it improves people's lives, health, the health of our planet, and generally how well it supports civil society. As technologists, we get value from being paid for our work and thus helping to support ourselves and our families and to spend money in our local economies (supporting local businesses and directly or indirectly hiring people working in our communities). We also benefit from any pleasure we get learning new things while working. There are obvious bad aspects of technology and these bad aspects are mostly aligned with corporatism.

Whether or not you believe in big government or small government, I argue that an important function of government is to provide some checks and balances to corporations. Corporations by design are systems for making money for owners/shareholders. While this can positively affect society there are too many cases where things go wrong: lobbying and perverting democratic systems of government, extracting too much value out of local communities, and centralizing economic power and control - killing off smaller and potentially better rivals.

As technologists I think we can find a good balance between earning a good living to support ourselves, family, and community and also as much as possible choosing to work for organizations and projects that have a net benefit to overall society. I look at this in a way that is analogous to a "carbon tax" for technologies we create and use. Let's call it a "technology value tax" where we try to at least make our work "carbon neutral."

Thursday, January 24, 2019

Ocean Protocol Meetup

Originally posted January 9, 2019

I hosted a meeting today to talk about Ocean Protocol and other data sources for machine learning, and led a group discussion of startup business ideas involving curating and selling data. The following is from a handout I created from material on the Ocean Protocol web site and other sources:

Data Trumps Software

Machine learning libraries like TensorFlow, Keras, PyTorch, etc. and people trained to use them have become a commodity. What is not a commodity yet is the availability of high quality application specific data.

Effective machine learning requires quality data

  • Ocean Protocol - is an ecosystem based on blockchain for sharing data that serves needs for both data producers who want to monetize their data assets and for data consumers who need specific data that is affordable. This ecosystem is still under development but there are portions of the infrastructure (which will all be open source) already available. If you have docker installed you can quickly run their data marketplace demonstration system.
  • Common Crawl - is a free source of web crawl data that was previously only available to large search engine companies. There are many open source libraries to access and process crawl data. You can most easily get started by downloading a few WARC data segment files to your laptop. My open source Java and Clojure libraries for processing WARC files are at
  • Amazon Public Dataset Program - is a free service for hosting public datasets. AWS evaluates applications to contribute data quarterly if you have data to share. To access data sources, search using the form at https://registry.opendata.aws to find useful datasets and use the S3 bucket URIs (or ARNs) to access them. Most data sources have documentation pages, client libraries, and examples.

Overview of Ocean Protocol

Ocean Protocol is a decentralized data exchange protocol that lets people share and monetize data while providing control, auditing, transparency and compliance to both data providers and data consumers. The initial Ocean Protocol digital token sale ended March 2018 and raised $22 million. Ocean Protocol tokens will be available by trading Ethereum Ether and can be used by data consumers to purchase access to data. Data providers will be able to trade tokens back to Ethereum Ether.


  • Publisher: is a service that provides access to data from data producers. Data producers will often also act as publishers of their own data.
  • Consumer: any person or organization who needs access to data. Access is via client libraries or web interfaces.
  • Marketplace: a service that lists assets and facilitates access to free datasets and datasets available for purchase.
  • Verifier: a software service that checks and validates steps in transactions for selling and buying data. A verifier is paid for this service.
  • Service Execution Agreement (SEA): a smart contract used by providers, consumers, and verifiers.

Software Components

  • Aquarius: is a service for storing and managing metadata for data assets that uses the off-chain database OceanDB.
  • Brizo: used by publishers for managing interactions with market places and data consumers.
  • Keeper: a service that runs a blockchain client and uses Ocean Protocol to process smart contracts.
  • Pleuston: an example/demo marketplace that you can run locally with Docker on your laptop.
  • Squid Libraries: client libraries to locate and access data (currently Python and JavaScript are supported).

Also of interest: SingularityNET is a decentralized service that supports creating, sharing, and monetizing AI services and hopes to be the world’s decentralized AI network. SingularityNET was started by my friend Ben Goertzel to create a marketplace for AI service APIs.

Internet As Entertainment Vs Information Vs Knowledge

Originally posted December 8, 2018

We can look forward to a future where the economy wrapped around tech advances overshadows conventional industries like agriculture and manufacturing. Given this context I am disappointed but not surprised that on international math tests students in the USA continue to fall behind their counterparts in the rest of the world.
Why is this when there are so many opportunities to learn both in school and as a lifetime pursuit?
Given the transformational effect of the Internet on society in general and in particular the economy, I think we are seeing the effects of different peoples’ perception and use of the Internet as a source of entertainment vs. source of information vs. source of knowledge.

Mark’s Hierarchy of Internet Use

Simplifying this discussion, in increasing order of personal value and value to society, Internet use falls into three broad use cases:
  • Internet as a source of entertainment: there is real value in engaging with friends and family together playing online games and enjoying ‘binging’ on Netflix at any time.
  • Internet as a source of information: this is a real economic driver as we look to web search engines for directions for fixing our cars, finding what a compiler or runtime error message means and how other people have fixed the coding problem, finding a template for giving a presentation at work, finding salary ranges for your job in the city you live in, etc.
  • Internet as a source of knowledge: knowledge is ‘digested’ and processed information usually based on experience and interactions with other people. The highest value sources of knowledge on the web are quality online courses taught by the top people in a field of study, videoed conversations by industry or academic ‘thought leaders’, etc. In other words, using the Internet to access the thoughts and knowledge of the brightest people on the planet.
Let’s go back in time to the 1800s to look at the American essayist, lecturer, philosopher, and poet Ralph Waldo Emerson. I like to think of Emerson as Internet or Web version 0.01. In the 1800s wealthy people had libraries and could tap into current knowledge and philosophies but for the common person, a yearly visit of Emerson to your town gave you an opportunity to hear a well thought out digest of contemporary thought and news. The benefit for Emerson was not only the money he was paid for his lectures: the stories he heard and the conversations he had also gave him new ‘material’ to work with. It was a virtuous circle. The people in a town not only heard news and information but also gained knowledge of how the world worked (at least according to Emerson), which potentially changed the way they looked at their own lives and ambitions.
Fast forward to present times: we have great conversations between thought leaders (a good example being Lex Fridman’s artificial general intelligence course at MIT that is composed of interviews) and often realtime video feeds of talks given at international conferences. While I am an avid reader (I read about one book a week), I like to hear people talk, even better on a video feed.

How can Information Collecting Software Agents Tap into knowledge? A path to AGI?

The state of the art for automatically collecting concrete information from the web continues to improve rapidly. Noteworthy projects use deep learning sequential models for language modeling and question answering systems, and most recently the BERT project solves difficult natural language processing (NLP) tasks like coreference detection (also known as anaphora resolution), predicting the probability of one sentence following another during discourse, predicting missing words in a sentence, text classification, translation between languages, etc.
While current state of the art allows collecting and using information from the Internet, how can information collecting software agents tap into knowledge, like discussions between top technologists? I think this is a step towards developing artificial general intelligence (AGI) with a very incomplete set of sub-goals being:
  • Identify the experts in the world for any specific technology
  • Collect written works, lectures, and debates and conversations with other experts
  • Build models for both raw information experts know and also their analytic and thought processes.
Yeah, it is the third step that might take decades or even a hundred years. While my personal interests are in NLP and knowledge representation, the other crucial part of building AGI technology is physical embodiments. Robots! I think that the deep models for functioning in a physical world can for the foreseeable future be worked on independently from NLP and general knowledge processing, but once these problems are solved, achieving AGI will be a form of transfer learning: building joint models that combine pre-trained ‘living in the physical world’ models with separately trained knowledge models.
It is early days for deep learning but one lesson can be learned by looking at public models in so-called ‘model zoos’ (for example, for TensorFlow, PyTorch, the Julia framework Flux, etc.) and noticing that current advanced capability models are generally no longer simple linear models but rather have complex architectures of sub-models that are trained jointly. Although I expect major “non deep learning” breakthroughs towards AGI using Bayesian Graph Models and other techniques not yet invented, the real lesson from deep learning is that complex predictive and functional behavior is achieved by combining models so I expect AGIs to use many different technologies, probably developed fairly independently, and then trained and tuned jointly.

Using Trained Keras Weights In Racket Scheme Code

Originally posted September 8, 2018

I am interested in using pre-trained deep learning models as functional components in functional languages like Scheme and Haskell.
I want to share a simple experiment that I wrote that uses Keras to train a model on the Wisconsin cancer data set (that I have used in the last three years in two books I have written in example programs), saves the weights in CSV files, and then uses those weights in a Racket Scheme program. There are two github repos:
Sample run:
$ racket neural.rkt 
** weights loaded **
shape of w2:
(number correct: 139)(number wrong: 12)(accuracy: 92.05298013245033)
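
The general idea of reusing trained weights outside the training framework can be sketched in a few lines. The following is a Python stand-in for the Racket side, not the code from the repos: it parses weight matrices from CSV text and runs a forward pass using only plain lists. The weights, the layer sizes, the activation functions, and the absence of bias terms are all made-up assumptions for illustration.

```python
import csv
import io
import math

# Made-up weights for a tiny 3-input, 2-hidden-unit, 1-output network.
# In the real experiment these would come from CSV files written after Keras training.
W1_CSV = "0.5,-0.2\n0.1,0.4\n-0.3,0.8\n"  # shape (3, 2): one row per input
W2_CSV = "1.0\n-1.5\n"                     # shape (2, 1): one row per hidden unit


def load_matrix(csv_text):
    """Parse CSV text into a list of rows of floats."""
    return [[float(cell) for cell in row] for row in csv.reader(io.StringIO(csv_text))]


def matvec(inputs, matrix):
    """Multiply a row vector by a matrix stored as a list of rows."""
    columns = len(matrix[0])
    return [sum(x * row[j] for x, row in zip(inputs, matrix)) for j in range(columns)]


def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def sigmoid(values):
    """Element-wise logistic function."""
    return [1.0 / (1.0 + math.exp(-v)) for v in values]


def predict(inputs, w1, w2):
    """Forward pass without bias terms: inputs -> ReLU hidden layer -> sigmoid output."""
    return sigmoid(matvec(relu(matvec(inputs, w1)), w2))[0]


w1 = load_matrix(W1_CSV)
w2 = load_matrix(W2_CSV)
print("shape of w2:", (len(w2), len(w2[0])))
print("prediction:", predict([1.0, 0.0, 2.0], w1, w2))
```

Because the forward pass is just a few matrix-vector products and element-wise functions, it ports easily to Scheme, Haskell, or any other language that can read a CSV file.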

Centralized Vs Distributed Systems Publishing And Owning Our Own Content

Originally posted August 25, 2018

I read a good article on Centralized Wins. and Decentralized Loses. this morning and I commented on Hacker News:
I wish it weren’t true but for right now I agree with the author, including what it might take to tip the balance: “1. Complete deterioration of trust such that avoiding the centralization of power becomes a necessity. 4. The decentralization community manages to create clearly superior applications as convenient and reliable as centralized providers.”
I was eager to use GNU Social, and after I had used it for a year, my host shut down. I just opened another Mastodon account but haven’t started using it yet. Also, the value of centralized services like Twitter is the really interesting people I follow. Social media is best when used as an advertising and directory service for content we put on our own blogs and web sites, but even that seems to be less common.
I really enjoyed the Decentralized Web Conference June 2016, but it also made me slightly sad that fighting back against web, political, corporate, etc. centralization is an uphill fight.
Sometimes network power laws suck.

Fighting back against centralization: owning and publishing our own content

It is a source of some frustration for me that so many people I know don’t take the effort to own their own domain and maintain their own web site and blog. Owning your own content and “publishing” platform has many long tail advantages. What these advantages are depends a lot on what your business and personal interests are. For me, owning my own content and personal brand has made the following possible:
  • Platform to promote the books I write
  • I have found interesting work (mostly remote, while living in the mountains in Sedona Arizona) when the following companies reached out to me to do work for them: Google, Capital One, CompassLabs, Disney, SAIC, Americast, PacBell, CastTV, Lutris Technology, Arctan Group, and Webmind Corporation.
Ideally kids would learn in school how to have a web presence they control, in addition to tech training. I would say that maintaining a proper web presence is like knowing how to drive a car but I would guess that many kids today will never learn to drive a car. But they should learn to properly use the web for their own benefit.

Hybrid Artificial Intelligence Systems

Originally posted August 19, 2018

Even though I specialize in deep learning at work I am sceptical about achieving AGI (artificial general intelligence) using only neural networks (of any architecture). My friend and two-time colleague Ben Goertzel has been working on a hybrid approach to AGI in the OpenCog organization with a large team for many years.

In my personal-time project, I have been working on my own hybrid general AI framework for a number of years. Common Lisp is still my favorite language for research programming and the latest edition of my book Loving Common Lisp, or the Savvy Programmer’s Secret Weapon contains a section on using Armed Bear Common Lisp With DeepLearning4j. ABCL is implemented in Java and while not as performant as SBCL it offers interop with Java libraries like DeepLearning4j. For now, I am happy enough being able to train specific deep learning models for word2vec, seq2seq, and summarization and write functional wrappers around these models for easy access in Common Lisp code. Probably not a popular opinion but I also believe that symbolic systems (rules) and Bayesian graph representation and inferencing are (probably) necessary parts of an AGI development platform.
I use TensorFlow and Keras at work but I have experimented sufficiently with DeepLearning4j for several years and it is fine for my purposes. I did also experiment using Google’s unofficial Haskell bindings for TensorFlow and seriously considered using Haskell for my side project AIsentience.net but, frankly, I have much more experience using Common Lisp. For people with more Haskell experience, Haskell is a fine language for research programming.

My Summer in Illinois

It has been a year since Carol and I moved to Urbana/Champaign Illinois and I started a job at Capital One managing a machine learning team at the UIUC Research Park. Living in a university town is great - lots to do, but we still look forward to returning to our home in Sedona Arizona in (about) a year. I have spent 90 minutes on the phone this weekend with old friends in Sedona and look forward to seeing them again in person.
In addition to work I have re-started a very old hobby: playing the game of Go. I had not played a serious tournament game since 1980 but after starting to play at a local club at the university I spent a week in Williamsburg Virginia in July playing in the US Go Open Tournament. I am starting study with a professional Go player who runs the American Yunguseng Dojang - The online Go School.

My Emacs setup for Org mode, flyspell, and git

Originally posted June 2, 2018

I do much of my work in SSH shells to remote servers and while I have used Emacs for about 30 years I don’t usually take too much time and effort customizing it. I have recently started using Org mode at work (I am a Master Software Engineer at Capital One and manage a Deep Learning team) and at home. Today after enjoying a Saturday morning gym workout I decided to improve my Emacs setup and this blog article is documentation that I will use in the future for reference. If it saves you some time or gives you useful ideas, even better!
I wanted to enable flyspell (which is installed with any current Emacs distribution) for all text modes, and that includes Org mode files. I added the following to my .emacs file:
(add-hook 'text-mode-hook 'turn-on-visual-line-mode)

;; flyspell                                                                                                   
(dolist (hook '(text-mode-hook))
  (add-hook hook (lambda () (flyspell-mode 1))))
I like to augment the Aspell spelling dictionary with words I commonly use that are not in the default dictionary. As an example, I am going to configure Aspell with a custom dictionary and add the words “Keras” and “TensorFlow” by creating a file ~/.aspell.en.pws that contains:
personal_ws-1.1 en 0
Keras
TensorFlow
Now when your local Emacs runs Aspell, flyspell will pick up the words in your custom dictionary.
I love Org mode, especially with SrSpeedBar, to quickly navigate and use Org to organize my day, take project notes, etc.
I like to use SrSpeedbar when in window mode (i.e., when running Emacs locally in its own window) but not inside a terminal or SSH session. I add the following to my .emacs:
(load "~/.emacs.d/sr-speedbar.el")
(require 'sr-speedbar)
(setq
    speedbar-use-images nil
    speedbar-show-unknown-files t
    sr-speedbar-right-side t)
;; open the speedbar only when running in a window system (assumed call)
(when window-system
    (sr-speedbar-open))
(setq sr-speedbar-width 40)     ;; doesn't work for me
(setq sr-speedbar-max-width 50) ;; doesn't work for me
I usually run package-list-packages to find and install Emacs packages but I load sr-speedbar.el from a downloaded source file.
I sometimes use magit for using git inside my Emacs environment but I usually just run git on the command line.
Magit is probably already installed with your Emacs but if not you can use package-list-packages (or however else you manage Emacs packages) to install it.
So far I have not customized my magit setup at all, not even any key bindings. I use M-x magit- followed by the tab character to see possible completions. Usually I just use M-x magit-status to review version histories. Magit is a deep tool and I am getting used to it slowly, learning features as I need them.

Friday, May 18, 2018

I have removed blog comments to support GDPR compliance

One of the great things about writing a blog is interacting with users through the comments they leave on my blog.

At least for now, I have disabled comments on my blog. Sorry about that!

Saturday, April 28, 2018

Is too much emphasis in Artificial Intelligence given to Deep Learning?

Deep learning has revolutionized the way we do Natural Language Processing, image recognition, translation, and other very specific tasks. I have used deep learning most days at work for about four years. Currently, I do no work in image recognition but I still use convolutional networks for NLP, and in the last year mostly use RNNs and GANs.

While I agree that deep learning is important in many practical use cases, a lot of data science still revolves around simpler models. I feel that other important techniques like probabilistic graph models (PGM) and discrete optimization models (the MiniZinc language is a good place to start) don't get the attention in universities and industry that they deserve.

On a scale of 1 to 10, I estimate the hype level of deep learning to be approximately 15.

I started working in the AI field in 1982 (back then, mostly "symbolic AI" and neural networks) and to me artificial intelligence has always meant a very long term goal of building flexible intelligent systems that can learn on their own, be full partners with human knowledge workers, and reliably take over job functions that can be safely automated.

I don't know what will get us to my (and many other people's) view of truly flexible and general purpose intelligent systems. My gut feeling is that a solution will require many techniques and technologies, perhaps including:
  • Efficiency and scalability of differentiable models running on platforms like TensorFlow
  • Formal coding of "common sense knowledge." There have been good attempts like ConceptNet and Cyc/OpenCyc but these are first steps. A more modern approach is Knowledge Graphs, as used at Google (what I used when I was a contractor at Google), Facebook, and a few other organizations that can afford to build and maintain them.
  • A better understanding of human consciousness and how our brains work. The type of flexible intelligence that is our goal does not have to be engineered anything like us, but brain-inspired models like Hierarchical Temporal Memory are useful in narrow application domains and we have to keep working on new model architectures and new theories of consciousness.
  • Faster "conventional" hardware like many-core CPUs and GPU-like devices. We need to continue to lower hardware prices and energy costs, and to increase memory bandwidth.
  • New hardware solutions like quantum systems and other possibilities likely no one has even imagined yet.
  • Many new ideas, many new theories, most not leading to our goal.

Given my personal goals for flexible and general AI, you can understand how I am not pleased with too much emphasis on deep learning models and not optimizing our engineering efforts enough for our longer term goals.

Sunday, April 01, 2018

Java JDK 10

Java 10 was released for general availability a week ago. I just installed it on macOS: I un-tar'ed the distribution, set JAVA_HOME to the Home directory in the distribution, and put Home/bin first on my PATH. I used Java 10 with the latest version of IntelliJ with no problems: I opened an existing Java project and switched the JDK to the newly installed JDK 10.

There are several nice features but the only ones I am likely to use are:

  1. Local variable type inference
  2. Better Docker container awareness in Linux
  3. Improved REPL support

I have mixed feelings about the rapid new 6 month release cycles, but I understand the desire to compete with other JVM languages like Kotlin, Scala, Clojure, etc.

Updating both of my Java books (Power Java and Practical Artificial Intelligence Programming With Java) is on my schedule for the next 18 months. Java 11 is slated for release in September 2018 and I will probably use Java 11 (whatever it will be!) for the updated books since Java 11 will be a long term support release.

Sunday, February 11, 2018

Trying out the Integrated Gradients library to explain predictions made by a deep learning classification model

"Explainable AI" for deep learning is required for some applications, at least explainable enough to get some intuition for how a model works.

The recent paper "Axiomatic Attribution for Deep Networks" describes how to determine which input features have the most effect on a specific prediction by a deep learning classification model. I used the library IntegratedGradients that works with Keras and another version is available for TensorFlow.

I modified my two year old example model using the University of Wisconsin cancer data set today. If you want to experiment with the ideas in this paper and the IntegratedGradients library, then using my modified example might save you some time and effort.
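
To give some intuition for what the paper computes, here is a minimal sketch of the integrated gradients calculation in plain Python. It does not use the IntegratedGradients library or Keras: the tiny logistic model, its weights, and the function names are all hypothetical stand-ins for a trained classifier, and the path integral is approximated with a midpoint Riemann sum (my choice for the sketch, not necessarily what the library does).

```python
import math

# Hypothetical stand-in for a trained classifier: logistic regression
# with fixed, made-up weights. A real model would come from Keras.
WEIGHTS = [2.0, -1.0, 0.5]
BIAS = -0.25


def model(x):
    """Return the predicted probability for input feature list x."""
    z = sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS
    return 1.0 / (1.0 + math.exp(-z))


def model_gradient(x):
    """Analytic gradient of the model output with respect to each input."""
    p = model(x)
    return [p * (1.0 - p) * w for w in WEIGHTS]


def integrated_gradients(x, baseline, steps=200):
    """Approximate integrated gradients along the straight line from
    baseline to x, using a midpoint Riemann sum with `steps` points."""
    totals = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(model_gradient(point)):
            totals[i] += g
    # Scale the averaged gradient by how far each feature moved.
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, totals)]


x = [1.0, 2.0, -1.0]          # made-up input sample
baseline = [0.0, 0.0, 0.0]    # all-zeros baseline, a common default
attributions = integrated_gradients(x, baseline)

print("attributions:", attributions)
print("sum of attributions:", sum(attributions))
print("model(x) - model(baseline):", model(x) - model(baseline))
```

The last two printed values should agree closely: the attributions for all features add up to the difference between the prediction for the input and the prediction for the baseline, which is the "completeness" property described in the paper.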

Sunday, December 10, 2017

Scheduling machine learning jobs to run in sequence

This might save you a few minutes of research time: I sometimes need to set up a number of Keras (or TensorFlow) runs to occur in sequence to run overnight, while I am away from work, etc. I don't want the processing to stop if any single job fails.

I use Task Spooler, which is available in Ubuntu and other Linux distros and can be installed on macOS using "brew install task-spooler". Note: on Ubuntu, the command is tsp rather than ts.

You can schedule any shell command to run by prepending "ts". Examples:

cd run1
ts python
cd ../run2
ts python
ts # get a list of all queued, running, and finished processes
ts -c 3 # get the stdout for process 3
ts -t 3 # get the tail output for process 3
ts -C # clear the list of finished jobs

This simple setup is not really appropriate for doing hyper parameter tuning but is useful to set up a series of runs.
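
If Task Spooler is not available, the same "keep going when one job fails" behavior can be roughly sketched in Python with the standard subprocess module. This is only an illustration under my own assumptions, not a replacement for ts: the function name run_jobs_in_sequence and the demo jobs are hypothetical, and the jobs run in the foreground rather than in a background queue.

```python
import subprocess
import sys


def run_jobs_in_sequence(jobs):
    """Run each (working_dir, command) job in order.

    A job that exits with an error, or that cannot be started at all,
    is recorded and the remaining jobs still run.
    Returns a list of (command, return_code, output) tuples.
    """
    results = []
    for working_dir, command in jobs:
        try:
            completed = subprocess.run(
                command, cwd=working_dir, capture_output=True, text=True
            )
            results.append((command, completed.returncode, completed.stdout))
        except OSError as err:
            # For example, the executable or the directory does not exist.
            results.append((command, None, str(err)))
    return results


# Hypothetical demo jobs; real jobs would be training scripts in run1, run2, etc.
demo_jobs = [
    (None, [sys.executable, "-c", "print('ok')"]),
    (None, [sys.executable, "-c", "import sys; sys.exit(3)"]),  # simulated failure
    (None, [sys.executable, "-c", "print('after failure')"]),
]

results = run_jobs_in_sequence(demo_jobs)
for command, return_code, output in results:
    print(return_code, output.strip())
```

The second demo job fails on purpose, and the third one still runs, which is the property I care about for overnight runs.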

Sunday, December 03, 2017

Forget the whole world and focus on your world

Some people dream of “making it big,” dreaming of starting the next Facebook or Google. Poor people fantasize about becoming very wealthy. I think this is misplaced focus.

I prefer to live in and think about a much smaller world:

  • Being of service to family and friends and enjoying their company 
  • Being of value to coworkers and customers
  • Providing value and entertainment to people who read my books
  • Getting a lot of exercise and eating great food, in small portions
  • Reading great books and watching great movies

Yes, that is enough for me. I leave changing the world for other people.

Thursday, November 23, 2017

Easy Jupyter notebook setup on AWS GPU EC2 with machine learning AMI

The Amazon machine learning AMI (link may change in the future) is set up for CUDA/GPU support and has the following preinstalled: TensorFlow, Keras, MXNet, Caffe, Caffe2, PyTorch, Theano, CNTK, and Torch.

I chose the least expensive EC2 instance type with a GPU, g2.2xlarge, and used the One Click Launch option (you will need to specify a pem key file for the AWS region where you are starting the instance) to have an instance running and available in about a minute. This GPU instance costs $0.65/hour, so remember to either stop it (if you want to reuse it later and don't mind paying a small cost for persistent local storage) or terminate it if you don't want to be charged for the 60GB of SSD storage space associated with the EC2.

I am very comfortable working in SSH shells using Emacs, screen, etc. When an instance boots up, the Actions -> Connect menu shows you the temporary public address which you can use to SSH in:

ssh -i "~/.ssh/key-pair.pem"

I keep my pem files in ~/.ssh; you might store them in a different place. If you haven't used EC2 instances before and don't already have a pem access file, follow these directions.

Anaconda is installed so jupyter is also pre-installed and can be started from any directory on your EC2 using:

jupyter notebook

After some printout, you will see a local URI to access the Jupyter notebook that will look something like this:

    Copy/paste this URL into your browser when you connect for the first time,
    to login with a token:

In another terminal window start another SSH session but this time map the local port 8888 to port 8888 on the EC2:

ssh -L 8888: -i ".ssh/key-pair.pem"

Now on your laptop you can attach to the remote Jupyter instance using (your token will be different):

Alternative to using SSH tunnel:

A nice alternative is to install (on your laptop - no server side installation is required) and use sshuttle. Assuming I have a domain name attached to the sometimes running EC2, I use the following aliases in my bash configuration file:

alias gpu='ssh -i "~/.ssh/key-pair.pem" ec2-user@MYDOMAIN.COM'
alias tun="sshuttle -r ec2-user@MYDOMAIN.COM 0/0 -e 'ssh -i \".ssh/key-pair.pem\"'"

Note: Keeping an Elastic IP Address attached to an EC2 when the EC2 is usually not running will cost you about $3.40/month, but I find having a "permanent" IP address assigned to a domain name is convenient.

Goodies in the AWS machine learning AMI:

There are many examples installed for each of these frameworks. I usually use Keras and I was pleased to see the following examples ready to run:

There are many other examples for the other frameworks TensorFlow, MXNet, Caffe, Caffe2, PyTorch, Theano, CNTK, and Torch.

Sunday, November 12, 2017

My blockchain side project to 'give something back'

I am very busy with my new machine learning job but I always like to try to split off some of my own free time for occasional side projects that I think will help me learn new technologies. My latest side interest is in blockchain technologies and specifically I am interested in blockchain as a platform and environment for AI agents.

I liked Tim O’Reilly’s call to action for corporations and people to take a longer term view and work on things of long term value to society in his recent keynote speech: Our Skynet Moment

While I consider myself to be a talented practitioner who has been building machine learning and general AI applications since 1982, I don't feel like I work at the level of creating any groundbreaking technologies myself. So, as far as 'giving something back' to society, it seems like my best bet is in putting some energy into distributed systems that push back against centralized control by corporations and governments, things that empower people.

Although it is really early days, I think that the Hyperledger projects look very promising and I like how this organization is organized in a similar fashion as the Apache Foundation.

I would like to start slow (I don't have much free time right now!) and will record any open source experiments I do at my new site. I may or may not finish a book project on any open source software that I write: hyperledgerAI: Writing Combined Blockchain and Artificial Intelligence Applications. I will start with small well documented sample applications built on Hyperledger.

Thursday, September 28, 2017

New job and two deep dives into tech

I haven't written a public blog post in four months because I have been busy moving to another state and starting a new job at Capital One (Master Software Engineer, role is tech lead and manager of a small machine learning team). Life has been really good: excitement of new job challenges and Carol and I have been enjoying the university town of Urbana/Champaign Illinois.

I am also finishing up two course specializations at Coursera: Probabilistic Graphical Models and Deep Learning. The deep learning class series is just a review for me, and in contrast I find the PGM class series very challenging (I am taking these PGM classes at a slow and relaxed pace - Coursera lets you split classes across multiple sessions).

I am reading two books that I can highly recommend: François Chollet's book "Deep Learning with Python" that serves as an idea book for advanced use of deep learning using Keras. François is the creator of the Keras framework and his new book is a treasure store of new ideas and techniques.

I am also (slowly) reading "Deep Learning" by Goodfellow, Bengio, and Courville. This book aims to be a self-contained reference for deep learning, including a very good prelude containing all the math you need to understand the rest of the book.

I have been taking pictures in the area around Urbana/Champaign. Yes, it is flat here but still beautiful and a nice place to live. I will post some pictures soon.

Tuesday, May 23, 2017

I updated my Natural Language Processing (NLP) library for Pharo Smalltalk

I have recently spent some time playing around in Pharo Smalltalk and in the process made some improvements to my NLP library: I changed the license to MIT and added summarization and sentence segmentation. Older code provides functionality for part of speech tagging and categorization.

Code, data, and directions are in my github repository nlp_smalltalk.

My first experience with Smalltalk was early 1983. The year before my company had bought a Xerox 1108 Lisp Machine for me and a Xerox technical sales person offered me a one month trial license for their Smalltalk system.

Pharo Smalltalk very much impresses me both for its interactive programming environment and also for the many fine libraries written for it.

I don't spend much time programming in the Pharo environment so I am very far from being an expert. That said, I find it a great programming environment for getting code working quickly.