CNK's Blog

MySQL Character Set Configuration

I want to set up MySQL server and all to it connections to it to use utf8. For work, we are using MySQL server 5.5 and RHEL 6.5 (or the equivalent CentOS on our Vagrant VMs).

Step 1. What is it currently using?

    mysql> show variables like 'char%';
    +--------------------------+----------------------------+
    | Variable_name            | Value                      |
    +--------------------------+----------------------------+
    | character_set_client     | latin1                     |
    | character_set_connection | latin1                     |
    | character_set_database   | latin1                     |
    | character_set_filesystem | binary                     |
    | character_set_results    | latin1                     |
    | character_set_server     | latin1                     |
    | character_set_system     | utf8                       |
    | character_sets_dir       | /usr/share/mysql/charsets/ |
    +--------------------------+----------------------------+
    8 rows in set (0.00 sec)

    mysql> show variables like '%collation%';
    +----------------------+-------------------+
    | Variable_name        | Value             |
    +----------------------+-------------------+
    | collation_connection | latin1_swedish_ci |
    | collation_database   | latin1_swedish_ci |
    | collation_server     | latin1_swedish_ci |
    +----------------------+-------------------+
    3 rows in set (0.00 sec)

Step 2. Changing the server’s character set.

The my.cnf file has a number of sections:

    mysqld_safe
    mysqld
    mysql
    client

Settings in the [mysqld] section apply to the server. Settings in the [mysql] section apply to the mysql command line client. And settings in the [client] section apply to all clients. Since I am mainly trying to change how the server and our python (or ruby) clients interact with the database, I want to make changes in the [mysqld] and [client] sections.

If I add default-character-set = utf8 to the [mysqld] section, and restart the server, my settings are now:

    mysql> show variables like 'char%';
    +--------------------------+----------------------------+
    | Variable_name            | Value                      |
    +--------------------------+----------------------------+
    | character_set_client     | latin1                     |
    | character_set_connection | latin1                     |
    | character_set_database   | utf8                       |
    | character_set_filesystem | binary                     |
    | character_set_results    | latin1                     |
    | character_set_server     | utf8                       |
    | character_set_system     | utf8                       |
    | character_sets_dir       | /usr/share/mysql/charsets/ |
    +--------------------------+----------------------------+
    8 rows in set (0.00 sec)

    mysql> show variables like '%collation%';
    +----------------------+-------------------+
    | Variable_name        | Value             |
    +----------------------+-------------------+
    | collation_connection | latin1_swedish_ci |
    | collation_database   | utf8_general_ci   |
    | collation_server     | utf8_general_ci   |
    +----------------------+-------------------+
    3 rows in set (0.01 sec)

When I create a new database without specifying a character set, it is created with server’s default, UTF8. That’s good. But the client and connection character set still say latin1.

    mysql> create database project_database;
    Query OK, 1 row affected (0.00 sec)

    mysql> SELECT schema_name, DEFAULT_CHARACTER_SET_NAME,
    DEFAULT_COLLATION_NAME FROM INFORMATION_SCHEMA.SCHEMATA;
    +--------------------+----------------------------+------------------------+
    | schema_name        | DEFAULT_CHARACTER_SET_NAME | DEFAULT_COLLATION_NAME |
    +--------------------+----------------------------+------------------------+
    | information_schema | utf8                       | utf8_general_ci        |
    | mysql              | utf8                       | utf8_general_ci        |
    | project_database   | utf8                       | utf8_general_ci        |
    +--------------------+----------------------------+------------------------+
    3 rows in set (0.00 sec)

Step 3a. Setting the connection characterset explicitly

I can explicitly set the character set for character_set_client, character_set_connection, and character_set_results after I have logged into the database but doing either: mysql> SET NAMES 'utf8'; OR mysql> charset utf8

In either case, all of my character set variables are now utf8. With ‘set names’, the character set reverts to the default ‘latin1’ if the client has to reconnect but with the ‘charset utf8’ command, the change survives needing to reconnect. (Note the syntax differences betweeen the 2 commands - set names requires quoting ‘utf8’ and a semicolon to execute. Charset does not require either.)

    mysql> show variables like 'char%';
    +--------------------------+----------------------------+
    | Variable_name            | Value                      |
    +--------------------------+----------------------------+
    | character_set_client     | utf8                       |
    | character_set_connection | utf8                       |
    | character_set_database   | utf8                       |
    | character_set_filesystem | binary                     |
    | character_set_results    | utf8                       |
    | character_set_server     | utf8                       |
    | character_set_system     | utf8                       |
    | character_sets_dir       | /usr/share/mysql/charsets/ |
    +--------------------------+----------------------------+
    8 rows in set (0.00 sec)

But both of those commands have to be issued after the connection is established - and need to be repeated each time I connect. I want something more automatic.

Step 3b. Changing the client / connection character set.

It looked like I should be able to bet the client settings working by adding the same line, default-character-set = utf8, to the [client] section. But in my initial trials, that didn’t seem to change anything. I also tried adding default-character-set = utf8 to both the [client] and [mysql] sections. But I still get:

    mysql> show variables like 'char%';
    +--------------------------+----------------------------+
    | variable_name            | Value                      |
    +--------------------------+----------------------------+
    | character_set_client     | latin1                     |
    | character_set_connection | latin1                     |
    | character_set_database   | utf8                       |
    | character_set_filesystem | binary                     |
    | character_set_results    | latin1                     |
    | character_set_server     | utf8                       |
    | character_set_system     | utf8                       |
    | character_sets_dir       | /usr/share/mysql/charsets/ |
    +--------------------------+----------------------------+
    8 rows in set (0.01 sec)

    mysql> show variables like '%collation%';
    +----------------------+-------------------+
    | Variable_name        | Value             |
    +----------------------+-------------------+
    | collation_connection | latin1_swedish_ci |
    | collation_database   | utf8_general_ci   |
    | collation_server     | utf8_general_ci   |
    +----------------------+-------------------+
    3 rows in set (0.00 sec)

Some of the things I was reading indicated that my problem might be that connections for superusers (like root) might behave differently than connections for normal users. I tried creating a normal user and then retesting various combinations of parameters - with the same results.

I finally found something that consistently gave me utf8 connection parameters without having to set them expliclty (see digression below). In this article about converting to a better version of MySQL utf8, utf8mb4, I see this directive: character-set-client-handshake = FALSE. Adding that to my [mysqld] section gives me consistent utf8 connection parameters with root and other users. However, I don’t think I really want to do that. I may want to be able to override the character set information for some connections (for example if I am connecting to legacy databases that are not in utf8). I think most of the frameworks I use already pass a character set parameter with the connection - at least Rails has a character set option in it’s database.yml config file.

So I am still looking for the right answer.

StackEdit

Installing StackEdit Locally

I have been looking into different editing and publishing tools for a new project at work. One of the formats of interest in Markdown because it is fairly straightforward to do a number of basic things and the stored format is cleaner than html. But since most people, myself included, don’t have the various tags committed to memory, we are going to need to provide an in browser preview for the Markdown input. So I went looking for libraries that would allow us to add a preview pane to our textareas.

One of the tools I found is an in browser editor that uses Markdown as it’s fundamental input format and can save the markdown to a variety of collaboration and blogging tools or export the text as html or PDF. It will also save your file to a ssh server. I am not sure I want to put my username and password into a web site - any web site. So instead of using the web interface, I think I’ll try to install it on my laptop and use it from there.

So, following the developer instructions at https://github.com/benweet/stackedit/blob/master/doc/developer-guide.md, I did:

  1. Git clone https://github.com/benweet/stackedit.git into ~/Code/javascript

  2. StackEdit needs node and npm. I have those, but I bet my versions are out of date, so let’s update all my homebrew software: brew update brew upgrade

  3. Update my version of npm (gets released more frequently than node.js): npm install npm -g

  4. I need gulp and bower and probably want them global / in my path: npm install -g bower npm install -g gulp

  5. install local dependencies (in ~/Code/javascript/stackedit): npm install

  6. Now back on the developer instructions: bower install

  7. The instructions for running the node server were export PORT=80 && node server.js but that gave me errors because I was trying to bind to a privileged port. So instead I did export PORT=9999 && node server.js and I am now editing this on http://localhost:9999/editor

Unfortunately, I was not able to get the saving to an ssh server to work - at least not when the ssh server is my local Mac. I am pretty sure this is an issue with how my Mac’s ssh daemon is configured. But without that, I don’t know that editing in the browser with side by side preview is all that much more convenient than creating posts in emacs and using Octopress’s rake preview task. So installing StackEdit locally was a fun exercise. But I think I will mostly be using it on the web at http://stackedit.io/

Using IPython

Using IPython as your Python REPL

I had heard about IPython on my Twitter feed from people like Titus Brown but only in the context of IPython Notebooks for sharing data and calculations. I had been meaning to look into it because I figured at some point someone would want our group at work to do something about sharing how to do scientific analyses, etc. and that knowing IPython might come in handy. What I didn’t know is that you can use IPython - without the notebook component - as a replacement for the standard python REPL, kind of like how lots of people use pry instead of Ruby’s built in irb.

I wish I had looked into this sooner. IPython is more like the python REPL I needed when I was starting out in Python. I haven’t had a chance to use it a lot - for one thing I installed IPython in my virtual environment and my current environment doesn’t really have anything in it to play with. But already I am hopeful.

The main thing that makes running Python in a shell inside emacs so painful is that copying code and pasting into the emacs window makes the indenting go all weird. Can ipython fix that? I am not sure. But it does have a %paste macro. Does that help? Perhaps it helps for pasting with the paste buffer. But my usual emacs workflow of copying a chunk of code into my kill ring and then yanking it into the shell is still broken due to whitespace issues (unless of course the code is actually not indented, which rarely happens). Perhaps %run will help - at least to get some code into my REPL so I can play with it.

ipython has a nice readline history feature. One nice option is being able to get the output from the last 3 commands you ran with: _, __, and ___. I am also very impressed with being able to run system commands and assign the output to python variables, e.g. %cd /Users/me myfiles = %ls .bash*

Or something like:

    pattern = "imp"
    !grep -rF $pattern ipython/*

To get specific information on an object, you can use the magic commands %pdoc, %pdef, %psource and ```%pfile`` OMG! This is like pry’s cd into an object and look around. Instructions for using these magic commangs as part of some other python program are in the docs. This seems to me to be a lot like binding.pry to start a REPL somehwhere.

Since %automagic is turned ON by default, you can often just do ls or cd without the leading %. You can toggle the automagic by typing %automagic until you get what you want.

To see what things are included in automagic: %magic -brief

There is also a set of docs on how to use ipython as a systems shell: https://ipython.org/ipython-doc/dev/interactive/shell.html

Reloading files inside the REPL

The IPython.lib.deepreload module allows you to recursively reload a module: changes made to any of its dependencies will be reloaded without having to exit. To start using it, do:

    from IPython.lib.deepreload import reload as dreload

Misc

Automatic quoting

The automatic quoting sounded useful but when I played around with it, it was odd. All I really want is an equivalent of Ruby’s %w(foo bar baz) so I don’t have to type a boatload of “ and , to get ["foo", "bar", "baz"]. Unfortunately this is not that feature.

Spying on IPython

Or “How IPython keeps track of what you have running where”. When the ipython notebook started up, it created the nbserver-###.json file. When I started actually running code in an .ipynb file, then it needed a running kernel - so it spun one up and created the kernel=###.json file

    /Users/cnk/.ipython/profile_default/security:
      total used in directory 64 available 230555563
      drwx------   8 cnk  staff    272 May 14 18:26 .
      drwxr-xr-x  10 cnk  staff    340 May 14 18:26 ..
      -rw-rw-r--   1 cnk  staff    245 May 14 18:26 kernel-fc238266-ac92-423c-99d0-c6faeb911a15.json
      -rw-rw-r--   1 cnk  staff    187 May 14 18:26 nbserver-61240.json

This part of the docs has lots of useful information about how ipython processes communicate with the kernel - message passing, security (only bind to localhost OR use ssh tunnels to encrypt trafic and keep it private).

IPython

Installing IPython

For our new project we are trying to use Python 3, so I want to evaluate IPython on the latest version of Python, which currently is 3.4.3. I am using pyenv and pyvirtualenv to manage my python installs. So with those all set up, I installed python 3.4.3 and created a virtualenv to use for ipython.

    pyenv install 3.4.3
    pyenv virtualenv 3.4.3 ipython-3.4.3
    mkdir ~/Code/ipython
    cd ~/Code/ipython
    echo 'ipython-3.4.3' > .python-version
    pip install --upgrade ipython
    # the pip install above gave me messages about upgrading pip, so I did:
    pip install --upgrade pip

I haven’t used IPython before - or should I say Jupyter since that is the project’s new name. So I watched a couple of the introductory videos I found on YouTube. This video told me I needed to add the python 3 kernel to jupyter, so I tried to do that but kept getting messages about missing pieces every time I tried to run ipython.

    $ ipython3 kernelspec install-self
    # error message about not having pyzmq
    $ pip install pyzmq
    $ ipython3 kernelspec install-self

    $ pip freeze
    gnureadline==6.3.3
    ipython==3.1.0
    pyzmq==14.6.0

    $ ipython
    ImportError: No module named 'jinja2'
    $ pip install jinja2
    ImportError: No module named 'tornado'
    $ pip install tornado

    $ ipython
    IPython notebook format depends on the jsonschema package:
    $ pip install jsonschema

    $ pip freeze
    certifi==2015.4.28
    gnureadline==6.3.3
    ipython==3.1.0
    Jinja2==2.7.3
    jsonschema==2.4.0
    MarkupSafe==0.23
    pyzmq==14.6.0
    tornado==4.1

What I had overlooked was an install target that would install not only ipython but all the things it typically depends on / uses. Once I installed ‘all’ I got a usable ipython.

    $ pip install ipython[all]
    $ pip freeze
    alabaster==0.7.4
    Babel==1.3
    certifi==2015.4.28
    docutils==0.12
    gnureadline==6.3.3
    ipython==3.1.0
    Jinja2==2.7.3
    jsonschema==2.4.0
    MarkupSafe==0.23
    mistune==0.5.1
    nose==1.3.6
    numpydoc==0.5
    ptyprocess==0.4
    Pygments==2.0.2
    pytz==2015.2
    pyzmq==14.6.0
    requests==2.7.0
    six==1.9.0
    snowballstemmer==1.2.0
    Sphinx==1.3.1
    sphinx-rtd-theme==0.1.8
    terminado==0.5
    tornado==4.1

Exporting from IPython to other formats

One of the things we are most interested in for work is the export formats that IPython / Jupyter supports so I tried out all the export formats. The PDF renderer was looking for pandoc. According to the pandoc install instructions, before I can install pandoc, I need a version of LaTeX for the Mac. So, I installed MacTeX from their MacTeX.pkg. And then I installed pandoc with homebrew: brew install pandoc. I still can’t export PDFs from the drop down menu in the iPython web interface:

    nbconvert failed: command could not be found pdflatex

However, from the command line, I can convert my test document to latex:

    $ ipython nbconvert --to latex --template article CNKTest.ipynb

produces the file CNKTest.tex which I can open and view in Latexit (which came from MacTeX).

Matplotlib

Some of the things I am playing aroud with want matplotlib - which does not appear to be installed in my virtualenv. So I ran pip install matplotlib which says it did:

    Installing collected packages: python-dateutil, numpy, pyparsing, matplotlib
    Successfully installed matplotlib-1.4.3 numpy-1.9.2 pyparsing-2.0.3 python-dateutil-2.4.2

And now I can load matplotlib inside a running ipython session with %matplotlib (or I could load it when I start the session using the command line argument: ipython --matplotlib

Installing Discourse

Discourse installed with Vagrant and Docker

I guess I am old school but I prefer mailing lists to web forums - probably because I vastly prefer mutt to any forum web site I have seen. But if I have to use a discussion board, then by far the best one I have seen is Discourse. I first heard about it in a couple of episodes of the Ruby Rogues podcast.1 2 When the Rogues moved their mailing list from Google Groups to Discourse, I was a little annoyed because Discourse didn’t support ‘just email it all to me’ and I just never get around to checking web sites. Fortunately the good folks at Discourse added setting that allow folks like me to get all our Ruby Parley goodness by email. But for those who like web forums, Discourse has some really nice features - keeping track of what is new, likes, bookmarks, follow individual posts, private messages and a really nice reputation system.

At work we are looking for a discussion board for a project we are doing and I suggested Discourse. In addition to the hosted discussions, there is an open source version of the same code available on GitHub. The recommended way to install Discourse is to use the supplied Docker images.

Setting up my Vagrant

To get started I needed an Ubuntu 14 LTS disk image. The one distributed by Hashicorp labeled “lattice/ubuntu-trusty-64” is configured to use 4G memory and isn’t running chef or puppet (so nothing to interfer with the Docker stuff from Discourse).

    vagrant init lattice/ubuntu-trusty-64
    vagrant up
    # first thing it did was download the vbox image
    vagrant ssh
    wget -qO- https://get.docker.com/ | sh

    ... lots of output about updating packages and installing docker

    If you would like to use Docker as a non-root user, you should now consider
    adding your user to the "docker" group with something like:

      sudo usermod -aG docker vagrant

Installing Discourse

    mkdir /var/discourse
    git clone https://github.com/discourse/discourse_docker.git /var/discourse
    cd /var/discourse
    cp samples/standalone.yml containers/app.yml
    # edited email settings in app.yml per instructions

I misread the instructions and instead of typing ./launcher bootstrap app, I did ./launcher start app. That gave some reasonable looking output before it bailed out complaining:

    Unable to find image 'local_discourse/app:latest' locally
    Pulling repository local_discourse/app
    FATA[0002] Error: image local_discourse/app:latest not found

Doing ./launcher rebuild app did reasonable looking things like starting databases, checking out code from git, running database migrations, and compiling assets.

I tried modifying my /etc/hosts file to include discourse.example.com but no dice - I think because my Vagrant isn’t exposing port 80. So I shut down the vagrant box and edited the Vagrant file to uncomment

    config.vm.network "forwarded_port", guest: 80, host: 8080

Now http://discourse.example.com:8080 works. Since I can don’t really need this to answer to a specific name, I took the modification back out of /etc/hosts and will just use http://localhost:8080 to play with configuring and testing Discourse.