26
May 13

Pairwise distances in R

For a recent project I needed to calculate the pairwise distances between a set of observations and a set of cluster centers. In MATLAB you can use the pdist2 function for this. As far as I know, there is no equivalent in the standard R packages, so I looked into writing a fast implementation for R. It turns out that vectorizing makes it about 40x faster, and using Rcpp makes it another 5-6x faster, for a total speed-up of about 225x over the naive implementation.
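The full post has the actual R and Rcpp code. What follows is only a sketch of one common vectorization trick, assuming it is the kind of rewrite meant here: expand ‖x − c‖² = ‖x‖² + ‖c‖² − 2 x·c, so that the cross term for all pairs becomes a single matrix product (`tcrossprod(X, C)` in R). The function names `pairwise_naive` and `pairwise_expanded` are hypothetical, and the identity is spelled out in plain Python only to show that both versions agree.

```python
import math


def pairwise_naive(X, C):
    """Euclidean distance from every row of X to every row of C (double loop)."""
    return [[math.sqrt(sum((xi - ci) ** 2 for xi, ci in zip(x, c))) for c in C] for x in X]


def pairwise_expanded(X, C):
    """Same distances via ||x - c||^2 = ||x||^2 + ||c||^2 - 2 x.c.

    In R the cross term for all pairs would be one call, tcrossprod(X, C),
    which is where the speed-up comes from; here it is written out explicitly.
    """
    x_sq = [sum(v * v for v in x) for x in X]  # squared norms of the observations
    c_sq = [sum(v * v for v in c) for c in C]  # squared norms of the centers
    out = []
    for i, x in enumerate(X):
        row = []
        for j, c in enumerate(C):
            cross = sum(xv * cv for xv, cv in zip(x, c))
            # clamp tiny negative values caused by floating-point cancellation
            row.append(math.sqrt(max(x_sq[i] + c_sq[j] - 2.0 * cross, 0.0)))
        out.append(row)
    return out


X = [[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]]  # observations (one per row)
C = [[0.0, 0.0], [3.0, 0.0]]              # cluster centers (one per row)
print(pairwise_expanded(X, C))  # → [[0.0, 3.0], [5.0, 4.0], [1.414..., 2.236...]]
```

The clamp matters in practice: for points very close to a center the expanded form can produce a slightly negative squared distance, which would otherwise make `sqrt` fail.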



12
May 13

Using C libraries in R with rdyncall

One reason I like using R for data analysis is its great collection of packages that let you easily apply state-of-the-art methods to your problems. But once in a while you find a library you would like to use that does not have an R wrapper yet. While the great Rcpp package provides a convenient way to write R extensions in C++, it obviously requires you to write C++ code and to have a compiler installed.

An alternative I only found out about recently is the rdyncall package. rdyncall provides an improved Foreign Function Interface (FFI), which lets you call functions in C libraries dynamically, at run time, without compiling any glue code.

In this blog post I want to give you an example of how to use rdyncall with the LWPR library for Locally Weighted Projection Regression.
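rdyncall itself is R-only, so to illustrate what a dynamic FFI does (load a shared library at run time, declare a function's C signature, call it) here is the same idea with Python's standard `ctypes` module. It calls `cos()` from the C math library rather than LWPR, and the library lookup assumes a typical Linux or macOS system; `c_cos` is a hypothetical helper name.

```python
import ctypes
import ctypes.util

# Locate the C math library at run time (e.g. "libm.so.6" on Linux).
path = ctypes.util.find_library("m")
# Fallback (POSIX only): search the symbols already loaded into this process.
libm = ctypes.CDLL(path) if path else ctypes.CDLL(None)

# Declare the C signature: double cos(double). Without this, ctypes would
# assume the function takes and returns C ints.
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double


def c_cos(x: float) -> float:
    """Call the C library's cos() through the foreign function interface."""
    return libm.cos(x)


print(c_cos(0.0))  # prints 1.0
```

Declaring `argtypes` and `restype` is the step that replaces a compiled wrapper: the FFI needs to know the C types to convert arguments and return values correctly.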


06
Nov 12

R BLAS: GotoBLAS2 vs OpenBLAS vs MKL

Short update to “Speed up R by using a different BLAS implementation”:

  • MKL is overall the fastest
  • OpenBLAS is faster than its parent GotoBLAS and comes close to MKL

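# operations used in the benchmark
n = 2000  # matrix size: an assumed value, the size used in the benchmark is not stated
A = matrix(rnorm(n*n), n, n)
system.time(A %*% A)   # matrix multiplication
system.time(solve(A))  # matrix inverse
system.time(svd(A))    # singular value decomposition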

01
Nov 12

Ways to a better shell in Windows

The default Windows command prompt (cmd.exe) is terrible compared to what UNIX terminal emulators and shells offer, which is why people keep developing new tools to make it bearable. Here are some links:

cmd.exe replacements

PyCMD

  • http://sourceforge.net/projects/pycmd/
  • CMD replacement implemented in Python
  • provides better TAB completion and command line editing
  • not a full CMD replacement
  • not under active development (?)

NYAOS

  • http://www.nyaos.org/
  • customizable command completion (via Lua scripting)
  • UNIX-shell-like key bindings and command line editing
  • thin documentation
  • user community mostly Japanese

TCC/LE

UNIX shells

  • via Cygwin, or Windows ports of zsh, bash, etc.
  • IMHO better than any of the Windows shells
  • execution speed of scripts is slow due to “fork() emulation”

cmd.exe improvements

ANSICON

  • https://github.com/adoxa/ansicon/
  • adds ANSI colors to cmd.exe
  • injects a DLL into cmd.exe
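Programs produce colored output by writing ANSI escape sequences such as ESC[31m (red foreground) and ESC[0m (reset). The cmd.exe of that era did not interpret them and printed them as garbage, which is what ANSICON fixes. A minimal Python sketch that builds such strings; the `colorize` helper and `COLORS` table are hypothetical names:

```python
ESC = "\x1b"

# SGR ("Select Graphic Rendition") foreground color codes 30-37
COLORS = {"black": 30, "red": 31, "green": 32, "yellow": 33,
          "blue": 34, "magenta": 35, "cyan": 36, "white": 37}


def colorize(text: str, color: str) -> str:
    """Wrap text in an ANSI foreground color code followed by a reset (ESC[0m)."""
    return f"{ESC}[{COLORS[color]}m{text}{ESC}[0m"


print(colorize("error", "red") + ": file not found")
```

In a terminal that understands ANSI codes (or cmd.exe with ANSICON loaded) the word "error" shows up in red; without that support you see the raw `←[31m` sequences instead.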
clink

Alternative terminal emulators

Console 2

mintty 

ConEmu-Maximus5

Conclusion

ConEmu is definitely a must-have. Adding clink to your system does not hurt and makes cmd.exe somewhat usable. If you want to try a different shell, give NYAOS a go, though its documentation is thin and its community is mostly Japanese.


30
Oct 12

Speed up R by using a different BLAS implementation

It is no news that R’s default BLAS is much slower than other available BLAS implementations. In “A trick to speed up R matrix calculation”, Yu-Sung Su recommends using the ATLAS BLAS, which is available on CRAN. When I learned about the possible speed-up a while ago, I tried several BLAS libraries and found that GotoBLAS2 gave me the best performance among the open-source implementations. Today I decided to check again how worthwhile it is to replace R’s default BLAS library.

Here are some results from my Intel i7-620M laptop running Windows 7:

[Figure: Speed-up using MKL or GotoBLAS2 vs. R’s default BLAS]



30
Jun 12

Multi-editing in Vim

I recently looked at Sublime Text and saw a feature that I liked but have not yet seen in my favorite editor, Vim: editing multiple selections at once.

Vim is very powerful and has great multi-line editing features, but as far as I know there is nothing that allows users to edit multiple selections with instant feedback.
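At its core, editing multiple selections means applying the same change to several ranges of the buffer. The plugin's own code is not shown here; this is only a hypothetical Python sketch of that core step (`replace_selections` is an illustrative name). It applies the replacements from the last range to the first, so earlier offsets stay valid.

```python
def replace_selections(text: str, selections: list[tuple[int, int]], new: str) -> str:
    """Replace each [start, end) selection in text with new.

    Selections are assumed not to overlap. Working from right to left means
    a replacement never shifts the offsets of selections still to be processed.
    """
    for start, end in sorted(selections, reverse=True):
        text = text[:start] + new + text[end:]
    return text


line = "foo = foo + bar(foo)"
# select every occurrence of "foo", as a multi-selection feature might
sel = [(i, i + 3) for i in range(len(line)) if line.startswith("foo", i)]
print(replace_selections(line, sel, "total"))  # prints: total = total + bar(total)
```

The "instant feedback" part is what makes it interesting in an editor: the same edit is shown in all selections while you type, rather than applied afterwards as with a substitute command.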

So I took up the challenge and wrote my first Vim plugin. Have a look at the first version at work:


20
May 12

Better R support in pygments by monkey patching SLexer

I started using knitr with reStructuredText today and found that the syntax highlighting from Pygments (used by rst2html.py) was not as nice as pandoc's output. So I ended up doing some monkey patching.

Try adding the following to rst2html.py:

# SLexer is the lexer used for R
try:
    from pygments.lexers.r import SLexer     # Pygments 2.x
except ImportError:
    from pygments.lexers.math import SLexer  # older Pygments releases
from pygments.token import Name

# monkey patching SLexer ...

# add some builtin functions (TODO: add more)
SLexer.tokens['keywords'].append(
   (r'(?<![A-Za-z0-9_-])(c|library)(?=\()', Name.Builtin))

# treat all names in front of a parenthesis as function names
SLexer.tokens['keywords'].append(
   (r'[a-zA-Z][a-zA-Z_0-9]+(?=\s*\()', Name.Function))

# parameter names inside function calls/definitions
SLexer.tokens['root'].insert(0,
   (r'(?<=[\(,])\s*[a-z]+\s*(?==)', Name.Attribute)) 

[Screenshots: R highlighting before and after the patch]

Note: I assume you already added pygments’ rst-directive.py to rst2html.py.
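To see what the three added patterns pick up on their own, here is a small check with Python's `re` module on a made-up line of R. Inside Pygments the rules are tried in state order at each position, so the lexer's final output can differ; this only shows what each regex matches by itself.

```python
import re

# The three patterns added to SLexer above, copied verbatim
BUILTIN = re.compile(r'(?<![A-Za-z0-9_-])(c|library)(?=\()')
FUNCTION = re.compile(r'[a-zA-Z][a-zA-Z_0-9]+(?=\s*\()')
PARAM = re.compile(r'(?<=[\(,])\s*[a-z]+\s*(?==)')

line = "x <- c(1, 2); fit <- lm(y ~ x, data = df)"
print([m.group(1) for m in BUILTIN.finditer(line)])        # ['c']
print([m.group(0) for m in FUNCTION.finditer(line)])       # ['lm']
print([m.group(0).strip() for m in PARAM.finditer(line)])  # ['data']
```

Two quirks show up this way: the function pattern needs at least two characters, so a one-letter call such as `f(x)` is not highlighted, and the parameter pattern also fires on the left-hand side of an `==` comparison inside a call, e.g. `f(a == b)`.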


18
May 12

Package management for your home directory with stow

I sometimes work on a computer (via ssh) that does not have all the tools and libraries I want to use installed. In the past I went ahead, compiled everything I needed manually and installed it into ~/opt.

Problem: you don’t have any kind of package management for what you install into ~/opt, so removing or upgrading a program means hunting down its files by hand.

Solution: GNU stow

GNU Stow is a symlink farm manager which takes distinct packages of software and/or data located in separate directories on the filesystem, and makes them appear to be installed in the same place.

With stow you install each piece of software into its own directory and then let stow create symlinks to it in a common location.
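The core idea can be sketched in a few lines of Python: mirror a package's directory tree in the target directory and symlink each file. This is a hypothetical simplification, not how GNU Stow is implemented: real stow also folds whole directories into single symlinks when it can, handles conflicts more gracefully and can unstow packages. The example layout (`~/opt/stow/<package>` linked into `~/opt`) follows stow's default of using the parent of the stow directory as the target; the function name `stow` and the package `hello-1.0` are illustrative.

```python
import os
import tempfile
from pathlib import Path


def stow(package_dir: Path, target_dir: Path) -> list[Path]:
    """Symlink every file of package_dir to the same relative path under target_dir.

    Simplified: directories are created for real (no tree folding), and the
    function stops with FileExistsError at the first file that already exists.
    """
    created = []
    for src in sorted(package_dir.rglob("*")):
        dst = target_dir / src.relative_to(package_dir)
        if src.is_dir():
            dst.mkdir(parents=True, exist_ok=True)
        elif not dst.exists() and not dst.is_symlink():
            dst.parent.mkdir(parents=True, exist_ok=True)
            # relative links keep the farm valid if the whole tree is moved
            dst.symlink_to(os.path.relpath(src, dst.parent))
            created.append(dst)
        else:
            raise FileExistsError(f"conflict: {dst} already exists")
    return created


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        opt = Path(tmp) / "opt"
        pkg = opt / "stow" / "hello-1.0"  # as if installed with --prefix pointing here
        (pkg / "bin").mkdir(parents=True)
        (pkg / "bin" / "hello").write_text("#!/bin/sh\necho hello\n")
        for link in stow(pkg, opt):
            print(link.relative_to(opt), "->", os.readlink(link))
        # prints: bin/hello -> ../stow/hello-1.0/bin/hello
```

Because every installed file is just a link back into its package directory, removing a package amounts to deleting those links and the package directory, which is exactly the bookkeeping that plain installs into ~/opt lack.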
