For a recent project I needed to calculate the pairwise distances of a set of observations to a set of cluster centers. In MATLAB you can use the pdist function for this. As far as I know, there is no equivalent in the R standard packages, so I looked into writing a fast implementation for R. It turns out that vectorizing makes it about 40x faster, and using Rcpp gives another 5-6x on top of that, for a total speed-up of about 225x over the naive implementation.
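To make the computation concrete, here is a minimal sketch in plain Python (not the R or Rcpp code from the post). The function names `pairwise_naive` and `pairwise_expanded` are hypothetical. The second variant uses the expansion ||x - c||^2 = ||x||^2 + ||c||^2 - 2 x·c, which is one common basis for vectorized implementations; whether the post's vectorized version uses this identity is an assumption.

```python
import math


def pairwise_naive(X, C):
    """Distance from every observation in X to every center in C (double loop)."""
    return [
        [math.sqrt(sum((a - b) ** 2 for a, b in zip(row, center))) for center in C]
        for row in X
    ]


def pairwise_expanded(X, C):
    """Same distances via ||x||^2 + ||c||^2 - 2 x.c (assumed vectorization basis)."""
    x_sq = [sum(a * a for a in row) for row in X]        # squared norms of observations
    c_sq = [sum(b * b for b in center) for center in C]  # squared norms of centers
    result = []
    for i, row in enumerate(X):
        distances = []
        for j, center in enumerate(C):
            dot = sum(a * b for a, b in zip(row, center))
            # Clamp at zero: rounding can make the squared distance slightly negative.
            distances.append(math.sqrt(max(x_sq[i] + c_sq[j] - 2.0 * dot, 0.0)))
        result.append(distances)
    return result


if __name__ == "__main__":
    observations = [[0.0, 0.0], [3.0, 4.0]]
    centers = [[0.0, 0.0], [6.0, 8.0]]
    print(pairwise_naive(observations, centers))
    print(pairwise_expanded(observations, centers))
```

In a matrix language the three terms of the expansion become two vectors of row norms and a single matrix product, which is why this form tends to vectorize well.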
One reason I like using R for data analysis is that R has a great collection of packages that let you easily apply state-of-the-art methods to your problems. But once in a while you find a library that you would like to use that does not have an R wrapper yet. While the great Rcpp package provides a convenient way to write R extensions in C++, it obviously requires you to write C++ code and to have a compiler installed.
An alternative I found out about only recently is the rdyncall package. rdyncall provides an improved Foreign Function Interface (FFI), which allows you to dynamically invoke functions in C libraries.
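As a language-neutral illustration of what an FFI does, here is a minimal sketch using Python's standard `ctypes` module. This is not rdyncall and says nothing about rdyncall's API; it only shows the general idea of loading a C library at run time and calling one of its functions without writing or compiling any wrapper code. The helper name `c_strlen` is hypothetical, and the sketch assumes a POSIX system where the C library can be located.

```python
import ctypes
import ctypes.util

# Load the C standard library at run time. If find_library() returns None,
# CDLL(None) opens the running process itself, which on POSIX systems
# exposes the libc symbols as well.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(text):
    """Call the C function strlen() on the UTF-8 encoding of text."""
    return libc.strlen(text.encode("utf-8"))


if __name__ == "__main__":
    print(c_strlen("hello"))
```

The important part is the explicit declaration of argument and return types: an FFI has no header file to consult, so the caller has to describe the C signature.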
Short update to "Speed up R by using a different BLAS implementation":
- MKL is overall the fastest
- OpenBLAS is faster than its parent GotoBLAS and comes close to MKL
    A = matrix(rnorm(n*n),n,n)
    A %*% A
    solve(A)
    svd(A)
The default Windows command prompt (cmd.exe) is terrible compared to what UNIX terminal emulators and shells offer. That’s why people constantly develop new tools to make the command prompt bearable. Here are some links:
- CMD replacement implemented in Python
- provides better TAB completion and command line editing
- not a full CMD replacement
- not under active development (?)
- customizable command completion (via Lua scripting)
- UNIX shell like key binding and command line editing
- thin documentation
- user community mostly Japanese
- formerly known as 4NT
- full replacement for CMD.exe
- backwards compatible; can execute standard batch files
- improved command line editing
- aka Cygwin or Windows ports of zsh, bash, etc.
- IMHO better than any of the Windows shells
- execution speed of scripts is slow due to “fork() emulation”
- adds ANSI colors to cmd.exe
- injects dll into cmd.exe
- adds command line editing (readline)
- adds bash-like command completion (customizable via Lua scripts)
Alternative terminal emulators
- tabbed interface
- before ConEmu the best alternative cmd interface
- based on the great PuTTY
- excellent for use with Cygwin or MSYS shells
- my new favorite (replacing Console 2)
- full color support
- horizontal and vertical splits
- lots of options
ConEmu is definitely a must have. Adding clink to your system does not hurt and makes cmd.exe somewhat usable. If you want to try another shell I would give NYAOS a try. Unfortunately the documentation is thin and the community is mostly Japanese.
It is no news that R’s default BLAS is much slower than other available BLAS implementations. In "A trick to speed up R matrix calculation", Yu-Sung Su recommends using the ATLAS BLAS, which is available on CRAN. When I learned about the possible speed-up a while ago, I tried several BLAS libraries and found that GotoBLAS2 gave me the best performance among the open-source BLAS implementations. Today I decided to check once again how much sense it makes to replace R’s default BLAS library.
Here are some results from my Intel i7-620M laptop running Windows 7:
I recently looked at Sublime Text and saw a feature that I liked but have not yet seen in my favorite editor, vim: editing multiple selections at once.
Vim is very powerful and has great multi-line editing features, but as far as I know there is nothing that allows users to edit multiple selections with instant feedback.
So I took up the challenge to write my first vim plugin. Have a look at the first version at work:
I started using knitr with reStructuredText today and I found that the syntax highlighting with pygments (used by rst2html.py) was not as nice as the output of pandoc. So I ended up doing some monkeypatching.
Try adding the following to
    # SLexer is the lexer used for R
    from pygments.lexers.math import SLexer
    from pygments.token import Keyword, Name

    # monkey patching SLexer ...
    # add some builtin functions (TODO: add more)
    SLexer.tokens['keywords'].append(
        (r'(?<![A-Za-z0-9_-])(c|library)(?=\()', Name.Builtin))

    # treat all names in front of a parenthesis as function names
    SLexer.tokens['keywords'].append(
        (r'[a-zA-Z][a-zA-Z_0-9]+(?=\s*\()', Name.Function))

    # parameter names inside function calls/definitions
    SLexer.tokens['root'].insert(0,
        (r'(?<=[\(,])\s*[a-z]+\s*(?==)', Name.Attribute))
Note: I assume you already added pygments’ rst-directive.py to
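To see what the three regular expressions from the patch pick up, here is a small sketch that applies them with Python's standard `re` module to a line of R code. It does not need pygments. The helper names and the sample R line are hypothetical, and the results describe the patterns in isolation only: inside pygments the lexer tries its rules in order at the current position, so the actual highlighting can differ.

```python
import re

# The three patterns from the monkeypatch, copied verbatim.
BUILTIN = re.compile(r'(?<![A-Za-z0-9_-])(c|library)(?=\()')
FUNCTION = re.compile(r'[a-zA-Z][a-zA-Z_0-9]+(?=\s*\()')
PARAMETER = re.compile(r'(?<=[\(,])\s*[a-z]+\s*(?==)')


def builtins_in(code):
    """Names matched by the builtin pattern (only c and library so far)."""
    return BUILTIN.findall(code)


def functions_in(code):
    """Names directly followed by an opening parenthesis."""
    return FUNCTION.findall(code)


def parameters_in(code):
    """Lowercase names between '(' or ',' and '=', with whitespace stripped."""
    return [match.strip() for match in PARAMETER.findall(code)]


if __name__ == "__main__":
    sample = "library(ggplot2); fit <- lm(y ~ x, data = df, weights=w)"
    print(builtins_in(sample))    # ['library']
    print(functions_in(sample))   # ['library', 'lm']
    print(parameters_in(sample))  # ['data', 'weights']
    print(functions_in("f(1)"))          # []
    print(functions_in("read.csv(x)"))   # ['csv']
```

The last two calls show two limits of the function-name pattern when used on its own: it needs at least two characters, and it does not treat the dot as part of a name.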
I sometimes run into the problem that I work on a computer (via ssh) which does not have all the tools and libraries installed that I want to use. In the past I went ahead, compiled everything I needed manually, and installed it into
Problem: you don’t have any kind of package management for the stuff you installed into
Solution: GNU stow
GNU Stow is a symlink farm manager which takes distinct packages of software and/or data located in separate directories on the filesystem, and makes them appear to be installed in the same place.
With stow you install each piece of software into a different directory and you use stow to create symlinks.
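The symlink-farm idea can be sketched in a few lines of Python. This is a simplified illustration, not how GNU Stow is implemented: real stow also folds whole directories into single links, uses relative links, and detects conflicts. The function names `stow` and `unstow` and the directory layout in the demo (a `stow/hello-1.0` package directory) are hypothetical.

```python
import tempfile
from pathlib import Path


def stow(stow_dir, package, target):
    """Symlink every file under stow_dir/package into target, mirroring subdirectories."""
    package_root = Path(stow_dir) / package
    created = []
    for source in sorted(package_root.rglob("*")):
        if source.is_dir():
            continue  # directories are recreated in the target, only files are linked
        link = Path(target) / source.relative_to(package_root)
        link.parent.mkdir(parents=True, exist_ok=True)
        link.symlink_to(source)
        created.append(link)
    return created


def unstow(links):
    """Remove the symlinks again; the package directory itself stays untouched."""
    for link in links:
        if link.is_symlink():
            link.unlink()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        prefix = Path(tmp)
        # Pretend a package was installed into its own directory.
        binary = prefix / "stow" / "hello-1.0" / "bin" / "hello"
        binary.parent.mkdir(parents=True)
        binary.write_text("#!/bin/sh\necho hello\n")

        links = stow(prefix / "stow", "hello-1.0", prefix)
        print([str(link.relative_to(prefix)) for link in links])
        unstow(links)
```

Because every package lives in its own directory, removing or upgrading one is just a matter of deleting its links and its directory, which is the kind of package management that manual installs lack.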