##! SRILM -- install on Visual Studio 2010 - srilm v1.6.0
Add files BlockMalloc.h/c to sri_dstruct, MStringTokUtil.h/c to sri_misc, NonZero.h/c to ngram/lattice-tool
http://www.keithv.com/software/srilm/

Add LM_SRI directive, SRI/PhraseBackoff/MultiFactor.cpp/h to moses project
D:\Documents\Academics\RA\srilm\lattice\src;D:\Documents\Academics\RA\srilm\dstruct\src;D:\Documents\Academics\RA\srilm\misc\src;D:\Documents\Academics\RA\srilm\flm\src;D:\Documents\Academics\RA\srilm\lm\src;

Add  to 
##! Perl -- count regex matches

Here's one way to do it:
$count += () = $teststring =~ /(test)/g;

This syntax may be a little cryptic at first, so here's the breakdown:
You can assign the results of a match to an array, like so: @matches = $teststring =~ /(test)/g;
Note the use of a capturing match (we use (test) in parentheses), and the /g modifier to match more than once.
Now if you have @matches, then using it in scalar context gives you how many matches there were, so this would work:

$count += @matches;
The method I'm suggesting simply does this without the temporary variable. You do need to force the match to take place in list context, otherwise it'll just return 1 on successful matches; that's what the () = bit achieves.
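
For comparison, a minimal Python sketch of the same idea (not part of the Perl answer; the input string is a made-up example). re.findall plays the role of the list-context match, and len() plays the role of the scalar-context count:

```python
import re

teststring = "test one, test two, testing three"  # hypothetical input

# findall returns every non-overlapping match as a list,
# so the length of that list is the number of matches.
matches = re.findall(r"test", teststring)
count = len(matches)
print(count)
```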

##! Moses - TypeDef.h
Definitions of different types, e.g.,
enum LMImplementation {
  SRI			= 0
  ,IRST		= 1
//  ,Skip		= 2
  ,Joint		= 3
//  ,Internal	= 4
  ,RandLM 	= 5
  ,Remote 	= 6
  ,ParallelBackoff	= 7
  ,Ken			= 8
  ,LazyKen	= 9
  ,ORLM = 10
  ,DMapLM = 11
};

##! UNIX - compile
./configure --prefix=$PREFIX --libdir=$PREFIX/lib64

##! Visual Studio - debug symbol
'moses-cmd.exe': Loaded 'D:\Documents\Academics\RA\moses\Debug\moses-cmd.exe', Symbols loaded.
'moses-cmd.exe': Loaded 'C:\Windows\System32\ntdll.dll', Cannot find or open the PDB file
'moses-cmd.exe': Loaded 'C:\Windows\System32\kernel32.dll', Cannot find or open the PDB file
'moses-cmd.exe': Loaded 'C:\Windows\System32\KernelBase.dll', Cannot find or open the PDB file
'moses-cmd.exe': Loaded 'C:\Windows\System32\msvcp100d.dll', Symbols loaded.
'moses-cmd.exe': Loaded 'C:\Windows\System32\msvcr100d.dll', Symbols loaded.
'moses-cmd.exe': Loaded 'C:\Program Files\MATLAB\R2009a\bin\win32\zlib1.dll', Binary was not built with debug information.
'moses-cmd.exe': Loaded 'C:\Windows\System32\msvcrt.dll', Cannot find or open the PDB file

Go to Debug/Options and Settings: in Debugging/Symbols, choose Microsoft Symbol Servers

##! Moses - install in Visual Studio 2010
* move moses.sln from contrib/other-builds into the root dir, open it, move the missing vcxproj in contrib/other-builds into the appropriate folder
* create include and lib directories that contain zlib and vld, plus a boost directory
http://www.statmt.org/moses/?n=Moses.LibrariesUsed
Assume they are C:\include, C:\lib, C:\Program Files\boost\boost_1_47

* compile consolidate, consolidate-direct, extract, extract-rules, score
** Add Properties\C/C++\General, Additional Include Directories: C:\include
** Add Properties\Linker\General, Additional Library Directories: C:\lib

* compile OnDiskPt: move vcxproj to OnDiskPt/, edit the file to include source from ./ but not src/
** Add Properties\C/C++\General, Additional Include Directories: C:\include, $(SolutionDir);

* compile moses
** Add Properties\C/C++\General, Additional Include Directories: C:\include; C:\Program Files\boost\boost_1_47
** Add LM/Ken.*, PhraseDictionaryALSuffixArray*, PhraseDictionaryHiero*, RuleTableLoaderHiero* to project

* compile kenlm
** make a symlink kenlm -> lm
** download from https://github.com/kpu/kenlm, and copy contrib/other-builds/kenlm/kenlm.vcxproj into the root dir
** edit kenlm.vcxproj to remove files portability.hh/cc

* compile moses-chart-cmd, moses-cmd, processPhraseTable
** Add Properties\Linker\General, Additional Library Directories: C:\lib

mv *.cpp *.h of OnDiskPt into src/

install include files NJAMD (Not Just Another Malloc Debugger)

##! Moses - install decoder Linux
* install boost
./bootstrap.sh --prefix=/home/lmthang/software
./b2 --prefix=$PREFIX --libdir=$PREFIX/lib64 --layout=tagged link=static,shared threading=multi install
* install giza-pp
  wget https://giza-pp.googlecode.com/files/giza-pp.tgz
  tar xzf giza-pp.tgz
  cd giza-pp
  make
  cp GIZA++-v2/{GIZA++,plain2snt.out,snt2cooc.out,snt2plain.out,trainGIZA++.sh} mkcls-v2/mkcls .
  
* install moses
./bjam -j8 --with-srilm=/home/lmthang/scr/smt/srilm --with-giza=/home/lmthang/scr/smt/giza-pp --with-boost=/home/lmthang/software --install-scripts=/home/lmthang/scr/smt/scripts

##! Moses - install srilm
* find out machine type, e.g. i686
./sbin/machine-type
* if i686, edit the srilm/sbin/machine-type script and change
else if (`uname -m` == x86_64) then
#set MACHINE_TYPE = i686-m64
set MACHINE_TYPE = i686

to

else if (`uname -m` == x86_64) then
set MACHINE_TYPE = i686-m64
#set MACHINE_TYPE = i686

* compile
make MACHINE_TYPE=i686-m64 World

(Note that if the machine type is i686 only, the compiler will expect gnu/stubs-32.h, while your system might only have /usr/include/gnu/stubs-64.h)
* if error "/usr/include/gnu/stubs.h:7:27: error: gnu/stubs-32.h: No such file or directory"
install glibc-devel (http://www.cyberciti.biz/faq/x86_64-linux-error-gnustub-32h-missing-error-and-solution/)
(if necessary edit common/Makefile.machine.i686, to add ADDITIONAL_INCLUDES = -I/home/lmthang/software/include)

##! Scheme - hash table keys
(display (hash-table-keys preterminals))
http://srfi.schemers.org/srfi-69/srfi-69.html

##! Scheme - handle command-line
* procedure command-line
http://www.gnu.org/software/guile/manual/html_node/Runtime-Environment.html

* srfi-1
http://srfi.schemers.org/srfi-1/srfi-1.html

* example: check if there is a 6th argument, if true, set is-debug flag
(display "# Args: ")(disp (command-line))
(when (> (length (command-line)) 5)
  ;; get debug flag
  (let ([debugflag (sixth (command-line))])
    (when (string=? debugflag "true")
      (set! is-debug #t))))

##! Python - string into char array
list(s)
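
A small sketch of the round trip, using a made-up string: list() splits into single characters and "".join() puts them back together.

```python
s = "moses"  # hypothetical example string

chars = list(s)          # one list element per character
print(chars)

joined = "".join(chars)  # inverse operation: back to a string
print(joined)
```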

##! Unix - install glibc
git clone git://sourceware.org/git/glibc.git
cd glibc
git checkout --track -b glibc-2_10-branch origin/release/2.10/master

* install glibc-devel
"configure: error: you must configure in a separate build directory"
call configure script outside the glibc dir: 
mkdir thang-build
cd thang-build
../configure --prefix=/home/lmthang/software

when make install, if it doesn't find ld.so.conf, copy from /etc/ld.so.conf into the local dir/etc

* to compile 32-bit
../configure --prefix=/home/lmthang/software --host=i686-linux-gnu --build=i686-linux-gnu CC="gcc -m32" CXX="g++ -m32"  CFLAGS="-O2 -march=i686" CXXFLAGS="-O2 -march=i686"
 
http://stackoverflow.com/questions/8004241/how-to-compile-glibc-32bit-on-an-x86-64-machine
--host=x86_64-pc-linux-gnu --build=i386-pc-linux-gnu

##! Unix -- wget download with wildcard
wget -r -l1 --no-parent -A "*bg-mk*" http://opus.lingfil.uu.se/experiments/data/mk-en
 
##! Python -- check dir and create
 import os
 import sys

 def check_dir(dir):
   # create the directory if it is not there yet
   if not os.path.isdir(dir):
     sys.stderr.write('Directory %s does not exist, creating ... \n' % dir)
     os.mkdir(dir)
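
A possible variant, assuming Python 3.2 or later: os.makedirs also creates missing parent directories, and exist_ok=True tolerates the directory appearing in the meantime. The name ensure_dir and the temporary paths are only for illustration.

```python
import os
import sys
import tempfile

def ensure_dir(path):
    """Create path (and any missing parents) if it does not exist yet."""
    if not os.path.isdir(path):
        sys.stderr.write('Directory %s does not exist, creating ...\n' % path)
        # exist_ok=True: no error if the directory shows up before we create it
        os.makedirs(path, exist_ok=True)
    return path

# Usage on a throwaway location so the example is safe to run
base = tempfile.mkdtemp()
target = os.path.join(base, "out", "run1")
ensure_dir(target)
```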

 
##! Matlab -- cell array to string
% vocabulary: cell array of strings
sentenceStr = strcat(vocabulary{:});
fprintf('# Parse %s\n', sentenceStr);

##! Matlab -- getting values of multiple keys from container
a = containers.Map({'a', 'b'}, num2cell(1:1:2));
keys = {'a' 'b' 'c'};
indices = find(isKey(a, keys) == 1);
b = cell2mat(values(a, keys(indices)))
results = zeros(1, length(keys));
results(indices) = b;


##! Matlab -- convert to cell
cellstr: for string array
num2cell: for num array

##! Matlab -- check file exists
	if exist(morphemeFile, 'file') ~= 2
		continue;
	end

##! Matlab -- concatenate maps
a = containers.Map({'a', 'b'}, num2cell(1:1:2));
a.keys
a.values
b = containers.Map({'c', 'd'}, num2cell(3:1:4));
b.keys
b.values
c = [a; b];
c.keys
c.values

##! Matlab -- container map
numRefSegments = 3;
refValues = [4 5 6];
newMap = containers.Map(num2cell(refValues), num2cell(1:1:numRefSegments))
	
##! Matlab -- split string
regexp(str, '\t', 'split');

##! Matlab -- check directory exists
% check and create dir
if exist(outDir, 'dir') ~= 7 % directory does not exist
	fprintf(1, '# Creating out dir %s ... \n', outDir);
	mkdir(outDir);
end

##! Matlab -- print cell
sprintf('%s, ', A{:})
fprintf(1, '# Good word=%s: %s\n', params.characterArray(tNum), sprintf('\"%s\" ', treeGood.nodeNames{:}));
	
##! Python - Windows setup
* download from http://www.python.org/download/releases/, and install
* set system environment:
PYTHONPATH: C:\Python32\Lib
PATH: C:\Python32

##! R - check if an attribute exists
http://stackoverflow.com/questions/1177919/does-column-exist-and-how-to-rearrange-columns-in-r-data-frame
"DLTIC" %in% colnames(dundee)

##! bash - append array
http://tldp.org/LDP/abs/html/arrays.html
dest=( ${array1[@]} ${array2[@]} )

##! bash - string tokenization
http://stackoverflow.com/questions/5382712/bash-how-to-tokenize-a-string-variable

$ string="john is 17 years old"
$ for word in $string; do echo "$word"; done
john
is
17
years
old

##! bash - condition if
http://www.linuxtutorialblog.com/post/tutorial-conditions-in-bash-scripting-if-statements

if [ $foo -ge 3 -a $foo -lt 10 ]; then
-a: and 
-o: or

##! Latex - convert powerpoint to eps
http://people.csail.mit.edu/dalleyg/faqs/20010425.html
http://www.iml.ece.mcgill.ca/~stephan/ooeps

* Create new slide, in Design/Layout, set size equals to size of the image, print pdf
** pdftops -eps Presentation1.pdf
** fix bounding box
 cat Presentation1.eps | ~/software/ps2eps/bin/ps2eps > Presentation2.eps

##! Bash --- for through array
a=("c" "b" "a")
size=${#a[@]}
echo "size=$size"
for i in 0 1 2; do
  echo "Thang ${a[$i]}"
done

for i in $(seq 0 $(($size-1))); do
  echo "Thang1 ${a[$i]}"
done


##! Eclipse --- Install Profiling tool
http://www.eclipse.org/tptp/home/downloads/4.7.0/documents/installguide/InstallGuide.html

* Add Helios repository
http://download.eclipse.org/releases/helios/

* Expand the Web, XML, and Java EE Development entry in the Helios update site 
* Select the Eclipse Web Developer Tools option.
* Complete Installation. Restart Eclipse when prompted.

##! Python --- LISP style functions
8.12 List Comprehensions - "Core Python Programming 2nd Edition"

* map
map(lambda x: x ** 2, range(6))
[0, 1, 4, 9, 16, 25]
or 
[x ** 2 for x in range(6)]
[0, 1, 4, 9, 16, 25]

[expr for iter_var in iterable if cond_expr]

* filter
>>> seq = [11, 10, 9, 9, 10, 10, 9, 8, 23, 9, 7, 18, 12, 11, 12]
>>> filter(lambda x: x % 2, seq)
[11, 9, 9, 9, 23, 9, 7, 11]
or	
>>> [x for x in seq if x % 2]
[11, 9, 9, 9, 23, 9, 7, 11]

* matrix
[(x+1,y+1) for x in range(3) for y in range(5)]

* count the number of words
len([word for line in f for word in line.split()])
* count the number of characters
sum([len(word) for line in f for word in line.split()])
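
Note for Python 3 (a sketch, not from the book): map and filter return lazy iterators there, so wrap them in list() to get the lists shown above. Also, if f is an open file, the first counting pass exhausts it, so the second pass would see nothing unless the file is reopened or rewound. The lines list below is a made-up stand-in for f.

```python
seq = [11, 10, 9, 9, 10, 10, 9, 8, 23, 9, 7, 18, 12, 11, 12]

# Python 3: materialize the iterators explicitly
squares = list(map(lambda x: x ** 2, range(6)))
odds = list(filter(lambda x: x % 2, seq))

# Stand-in for a file: a list can be iterated more than once
lines = ["a b c", "dd ee"]
num_words = sum(1 for line in lines for word in line.split())
num_chars = sum(len(word) for line in lines for word in line.split())

print(squares, odds, num_words, num_chars)
```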

##! Python --- sort list
>>> a = {'a':4, 'd':1, 'c':2, 'b':5}
>>> sorted(a.items())
[('a', 4), ('b', 5), ('c', 2), ('d', 1)]
>>> sorted(a.items(), key=lambda x: x[0])
[('a', 4), ('b', 5), ('c', 2), ('d', 1)]
>>> sorted(a.items(), key=lambda x: x[1])
[('d', 1), ('c', 2), ('a', 4), ('b', 5)]
>>> sorted(a.items(), key=lambda x: x[1], reverse=True)
[('b', 5), ('a', 4), ('c', 2), ('d', 1)]
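
An alternative sketch using operator.itemgetter instead of a lambda; sorting the dict itself with key=a.get gives just the keys ordered by value.

```python
from operator import itemgetter

a = {'a': 4, 'd': 1, 'c': 2, 'b': 5}

# itemgetter(1) picks the value out of each (key, value) pair
by_value = sorted(a.items(), key=itemgetter(1))
by_value_desc = sorted(a.items(), key=itemgetter(1), reverse=True)

# Iterating a dict yields its keys; a.get maps each key to its value
keys_by_value = sorted(a, key=a.get)

print(by_value, by_value_desc, keys_by_value)
```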

##! R --- workspace variables
http://ww2.coastal.edu/kingw/statistics/R-tutorials/savingloading.html

> ls()                                 # This is my default workspace from My Documents.
[1] "age.out"       "m.ex"          "m.ex.minussex" "outcome.out"  
[5] "respiratory"   "seizure"       "visit.matrix"
> setwd("Rspace")                      # First change, if you haven't already.
> rm(list=ls())                        # Delete the default workspace.
> load(".RData")                       # Load a previously saved workspace.
> loadhistory()                        # Load a previously saved history file.
> ls()
[1] "my.data"

##! Python - regular expression sub
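import re
import string

# re.escape keeps characters such as ], \, ^ and - literal inside the character class
pattern = re.compile('([' + re.escape(string.punctuation) + ']+)')
ref_line = pattern.sub(r' \1', ref_line)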
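
A worked example of what the substitution does, on a made-up sentence: each run of punctuation gets a space inserted in front of it. The class is built with re.escape here as a precaution so that every punctuation character is taken literally.

```python
import re
import string

# One capture group matching a run of punctuation characters
pattern = re.compile('([' + re.escape(string.punctuation) + ']+)')

ref_line = "Hello, world!! (test)"  # hypothetical input
tokenized = pattern.sub(r' \1', ref_line)
print(tokenized)
```

Existing spaces are kept, so punctuation that already follows a space ends up preceded by two spaces.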
 
##! Java - set path
export JAVA_HOME=/u/nlp/packages/java/jdk1.6.0_27-x86_64
export PATH=${PATH}:${SOFTWARE}/bin:/u/nlp/packages/apache-ant/bin:${JAVA_HOME}/bin

Important: avoid appending to JAVA_HOME as below; it must be a single directory, not a colon-separated list:
export JAVA_HOME=${JAVA_HOME}:/u/nlp/packages/java/jdk1.6.0_27-x86_64

##! R --- install non-root

R CMD INSTALL lme4_0.999375-42.tar.gz -l /home/lmthang/software/R

add R_LIBS environment
R_LIBS=/home/USER/R
export R_LIBS

##! Unix -- find unique
If you want the lines that are in file1 but not in file2 (assuming file1 has no duplicate lines), you can do this:
cat file1 file2 file2 | sort | uniq -c | egrep "^ +1 " | wc
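
The same set difference as a rough Python sketch; the function name and sample lines are made up, and unlike the pipeline it does not care about duplicate lines in file1.

```python
def only_in_first(lines1, lines2):
    """Return the distinct lines of lines1 that never occur in lines2, sorted."""
    seen = set(lines2)
    return sorted({line for line in lines1 if line not in seen})

# Stand-ins for the contents of file1 and file2
file1_lines = ["apple", "banana", "cherry", "banana"]
file2_lines = ["banana", "date"]

unique = only_in_first(file1_lines, file2_lines)
print(unique, len(unique))
```

With real files, the two lists could be read with open(name).read().splitlines().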


##! R --- R scripting
using here document (suggested by François Pinard)

#!/bin/sh
# [environment variables can be set here]
inFile=$1
outFile=$2

R --slave --args $inFile $outFile <<EOF
args <- commandArgs(TRUE)
args
EOF

---
alternatively:
R CMD BATCH --args arg1 arg2 foo.R
Rscript foo.R arg1 arg2

##! Latex --- reduce space
\quad or \qquad for spacing in displayed math material

##! Latex --- use align rather than eqnarray
\begin{align*}
S &= \sum_{w_k:c(w_1^{k})>0} P_{\mbox{\small katz}}(w_k|w_1^{k-1}) + \sum_{w_k:c(w_1^{k})=0} P_{\mbox{\small katz}}(w_k|w_1^{k-1}) \\
&= \sum_{w_k:c(w_1^{k})>0} P^*(w_k|w_1^{k-1}) + \alpha*\left[\sum_{w_k:c(w_1^{k})=0} P_{\mbox{\small katz}}(w_k|w_2^{k-1})\right] \\
&= \sum_{w_k:c(w_1^{k})>0} P^*(w_k|w_1^{k-1}) + \alpha*\left[1-\sum_{w_k:c(w_1^{k})>0} P^*(w_k|w_2^{k-1})\right] \\
\end{align*}

##! Latex --- multiline equations
http://www.math.uiuc.edu/~hildebr/tex/displays.html

\begin{equation}
\label{e:katz_general}
 P_{\mbox{\small katz}}(w_k|w_1^{k-1}) =
  \begin{cases}
   P^*(w_k | w_1^{k-1}) & \text{if } C(w_1^{k})>0 \\
   \alpha(w_1^{k-1}) P_{\mbox{\small katz}}(w_k|w_2^{k-1}) & \text{else if } C(w_1^{k-1})>0 \\
   P_{\mbox{\small katz}}(w_k|w_2^{k-1})       & \text{otherwise }
  \end{cases}
\end{equation}

"align" environment: every line is numbered by default. To suppress the number on a line, put "\notag" right before the linebreak symbol ("\\") on that line. On lines that are to be numbered, you can put the label command, "\label{...}", before the linebreak. 

With "align*", it works the other way: By default, none of the lines gets numbered, so to number a particular line you must put an explicit "\tag{...}" command at the end of that line, before the linebreak symbol.
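
A minimal sketch of both cases, assuming the amsmath package; the equations and the label name e:square are placeholders:

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}

% align: only the last line keeps its number
\begin{align}
(a+b)^2 &= (a+b)(a+b) \notag \\
        &= a^2 + 2ab + b^2 \label{e:square}
\end{align}

% align*: nothing is numbered unless tagged explicitly
\begin{align*}
(a-b)^2 &= (a-b)(a-b) \\
        &= a^2 - 2ab + b^2 \tag{$\ast$}
\end{align*}

\end{document}
```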


##! Vim --- search and replace
http://vim.wikia.com/wiki/Search_and_replace
:%s/foo/bar/g
Find each occurrence of 'foo', and replace it with 'bar'.
:%s/foo/bar/gc
Change each 'foo' to 'bar', but ask for confirmation first.
:%s/\<foo\>/bar/gc
Change only whole words exactly matching 'foo' to 'bar'; ask for confirmation.
:%s/foo/bar/gci
Change each 'foo' (case insensitive) to 'bar'; ask for confirmation.
:%s/foo/bar/gcI
Change each 'foo' (case sensitive) to 'bar'; ask for confirmation.
The g flag means global – each occurrence in the line is changed, rather than just the first.

##! Latex --- include image
\includegraphics[width=1\textwidth, clip=true, trim = 50 0 50 0]{simpleGT.\imgExt}

##! Vim --- indent code block
http://vim.wikia.com/wiki/Indent_a_code_block

=i{ reindents "inner block" (inside the braces).
=a{ reindents "a block" (including the braces).
=2a{ reindents 2 blocks (this block and containing block).

##! R ---  c function (mnemonic for combine)
generate vector x = c(12, 19, 22, 15, 12)

##! R --- gl generate factor levels
http://www.math.montana.edu/Rweb/Rhelp/gl.html

gl(n, k, length = n*k, labels=1:n, ordered=FALSE)
The result has levels from 1 to n with each value replicated in groups of length k

# First control, then treatment:
gl(2, 8, labels=c("Ctrl","Treat"))
# 20 alternating 1s and 2s
gl(2, 1, 20)
# alternating pairs of 1s and 2s
gl(2, 2, 20)


##! Unix --- tcsh command history
http://tomecat.com/jeffy/tttt/cshhistory.html
https://itservices.stanford.edu/service/unixcomputing/unix/unixcomm

history	List all commands typed so far (default maximum number=20)
!!	Repeat the last command
!n	Repeat command n from the history list
!PATTERN	Repeat last command beginning with PATTERN
^PATTERN1^PATTERN2	Repeat last command but replace PATTERN1 (usually a typo) with PATTERN2 (the correction)

##! Unix --- .tcshrc shell environment file
http://members.tripod.com/~Rumman_Gaffur/unix/tcshrc.html	
http://www.sumedh.info/downloads/cshrc-tcshrc-login-alias-file.php

# Get the aliases and functions
alias ls 'ls --color=yes'
set prompt="`whoami`@%S%m%s %~ > "

# User specific environment and startup programs
setenv CS224N /afs/ir/users/l/m/lmthang/cs224n
setenv PA1 ${CS224N}/pa1/java
setenv CLASSPATH ${PA1}/classes

http://howto.unixdev.net/tcshrc.html
set autolist
set color
set colorcat

##! Unix --- check current shell
echo $0

##! Unix --- SSH using public key
ssh-keygen -t dsa
ssh-copy-id -i .ssh/id_dsa.pub user@hostname

##! Vim --- shift block of code
http://vim.wikia.com/wiki/VimTip224

visual mode: select text, then press "<" or ">" (in normal mode, "<<" or ">>" shifts the current line)

##! Unix --- Merge PDF file
gs -sDEVICE=pdfwrite -dNOPAUSE -dQUIET -dBATCH -sOutputFile=out.pdf *.pdf

##! Vim --- keep indentation while pasting code
http://vim.wikia.com/wiki/Toggle_auto-indenting_for_code_paste

* In .vimrc
set pastetoggle=<F11>

* insert mode -> F11 -> paste code -> F11 (turn off paste mode)

##! Vim --- vim settings (tab, space)
http://www.slackorama.com/projects/vim/vimrc.html
http://www.jwz.org/doc/tabs-vs-spaces.html

* In .vimrc
set softtabstop=4 " interpret tab as an ``indent'' command instead of an insert-a-tab command
set shiftwidth=4 " number of spaces to use for auto indent
set tabstop=4 " number of columns a tab character in the file is displayed as
set expandtab " enter spaces when tab is pressed

##! Java --- jar file and compilation
* check jar content
jar tf retroweaver-1.1.jar | grep runtime

* extract jar content
jar xf stanford-corenlp-2011-06-19-sources.jar

* Compile java src
javac -cp "/home/lmthang/RA/java/*" -sourcepath "src" -d prefixparser\
 src/edu/stanford/nlp/*/*.*
 
* build.xml
** Sample build.xml http://ant.apache.org/manual/using.html
** Add classpath (http://www.adp-gmbh.ch/java/ant/build_xml.html)
  <path id="compile.classpath">
	<pathelement location="classes"/>
    <fileset dir="/home/lmthang/RA/java">
      <include name="*.jar"/>
    </fileset>
  </path>
** Copy application resources
    <!-- Copy application resources -->
    <copy todir="${build}/edu/stanford/nlp/parser/lexparser">
        <fileset dir="classes/edu/stanford/nlp/parser/lexparser" excludes="**/*.java"/>
    </copy>

! Aug 2011
##! Python --- Install modules as non-root users
http://packages.python.org/distribute/easy_install.html
http://docs.python.org/install/index.html

* install Virtual python (http://pypi.python.org/pypi/virtualenv)
wget  http://pypi.python.org/packages/source/v/virtualenv/virtualenv-1.6.4.tar.gz#md5=1072b66d53c24e019a8f1304ac9d9fc5

** In virtualenv-1.6.4 directory:
python virtualenv.py /home/lmthang/software/python	

* export path in .bash_profile
export PYTHONDIR=/home/lmthang/software/python
export PATH=${PATH}:${PYTHONDIR}/bin

* set python alias in .bashrc
alias python='$PYTHONDIR/bin/python'

* exit, login again, run easy_install
easy_install argparse


##! Emacs --- Turn off backup files
http://www.rpi.edu/dept/acs/rpinfo/common/Computing/Consulting/Software/Emacs/Hints/backup.html

In .emacs, 
(setq make-backup-files nil) 

##! Vim --- Timestamp
http://vim.wikia.com/wiki/VimTip97
http://stackoverflow.com/questions/56052/best-way-to-insert-timestamp-in-vim

:r! date
produces:
Thu Sep 11 10:47:30 CEST 2008

:r! date "+%Y-%m-%d %H:%M:%S"
produces:
2008-09-11 10:50:56

##! Ubuntu --- Fixing locale Ubuntu perl
http://blog.thinkside.co.uk/?p=231

Edit /etc/default/locale (needs sudo) and add:
LANGUAGE="en_GB.UTF-8"
LC_ALL="en_GB.UTF-8"

sudo dpkg-reconfigure locales