##! SRILM -- install on Visual Studio 2010 - srilm v1.6.0
Add files BlockMalloc.h/c to sri_dstruct, MStringTokUtil.h/c to sri_misc, NonZero.h/c to ngram/lattice-tool
http://www.keithv.com/software/srilm/
Add LM_SRI directive, SRI/PhraseBackoff/MultiFactor.cpp/h to moses project
D:\Documents\Academics\RA\srilm\lattice\src;D:\Documents\Academics\RA\srilm\dstruct\src;D:\Documents\Academics\RA\srilm\misc\src;D:\Documents\Academics\RA\srilm\flm\src;D:\Documents\Academics\RA\srilm\lm\src;
Add to

##! Perl -- count regex matches
Here's one way to do it:
$count += () = $teststring =~ /(test)/g;
This syntax may be a little cryptic at first, so here's the breakdown:
You can assign the results of a match to an array, like so:
@matches = $teststring =~ /(test)/g;
Note the use of a capturing match (we use (test) in parentheses), and the /g modifier to match more than once.
Now if you have @matches, then using it in scalar context gives you how many matches there were, so this would work:
$count += @matches;
The method I'm suggesting simply does this without the temporary variable. You do need to force the match to take place in list context, otherwise it'll just return 1 on successful matches; that's what the () = bit achieves.

##! Moses - TypeDef.h
Definitions of different types, e.g.,
enum LMImplementation
{
  SRI = 0
  ,IRST = 1
//  ,Skip = 2
  ,Joint = 3
//  ,Internal = 4
  ,RandLM = 5
  ,Remote = 6
  ,ParallelBackoff = 7
  ,Ken = 8
  ,LazyKen = 9
  ,ORLM = 10
  ,DMapLM = 11
};

##! UNIX - compile
./configure --prefix=$PREFIX --libdir=$PREFIX/lib64

##! Visual Studio - debug symbol
'moses-cmd.exe': Loaded 'D:\Documents\Academics\RA\moses\Debug\moses-cmd.exe', Symbols loaded.
'moses-cmd.exe': Loaded 'C:\Windows\System32\ntdll.dll', Cannot find or open the PDB file
'moses-cmd.exe': Loaded 'C:\Windows\System32\kernel32.dll', Cannot find or open the PDB file
'moses-cmd.exe': Loaded 'C:\Windows\System32\KernelBase.dll', Cannot find or open the PDB file
'moses-cmd.exe': Loaded 'C:\Windows\System32\msvcp100d.dll', Symbols loaded.
'moses-cmd.exe': Loaded 'C:\Windows\System32\msvcr100d.dll', Symbols loaded.
'moses-cmd.exe': Loaded 'C:\Program Files\MATLAB\R2009a\bin\win32\zlib1.dll', Binary was not built with debug information.
'moses-cmd.exe': Loaded 'C:\Windows\System32\msvcrt.dll', Cannot find or open the PDB file
Go to Debug/Options and Settings; under Debugging/Symbols, choose Microsoft Symbol Servers

##! Moses - install in Visual Studio 2010
* move moses.sln from contrib/other-builds into the root dir, open it, move the missing vcxproj in contrib/other-builds into the appropriate folder
* create include, lib dir that contain vlib, and vld + boost directory
http://www.statmt.org/moses/?n=Moses.LibrariesUsed
Assume they are C:\include, C:\lib, C:\Program Files\boost\boost_1_47
* compile consolidate, consolidate-direct, extract, extract-rules, score
** Add Properties\C/C++\General, Additional Include Directories: C:\include
** Add Properties\Linker\General, Additional Library Directories: C:\lib
* compile OnDiskPt: move vcxproj to OnDiskPt/, edit the file to include source from ./ but not src/
** Add Properties\C/C++\General, Additional Include Directories: C:\include, $(SolutionDir);
* compile moses
** Add Properties\C/C++\General, Additional Include Directories: C:\include; C:\Program Files\boost\boost_1_47
** Add LM/Ken.*, PhraseDictionaryALSuffixArray*, PhraseDictionaryHiero*, RuleTableLoaderHiero* to project
* compile kenlm
** make a symlink kenlm -> lm
** download from https://github.com/kpu/kenlm, and copy contrib/other-builds/kenlm/kenlm.vcxproj into the root dir
** edit kenlm.vcxproj to remove files portability.hh/cc
* compile moses-chart-cmd, moses-cmd, processPhraseTable
** Add Properties\Linker\General, Additional Library Directories: C:\lib
mv *.cpp *.h of OnDiskPt into src/
install include files
NJAMD (Not Just Another Malloc Debugger)

##! Moses - install decoder Linux
* install boost
./bootstrap.sh --prefix=/home/lmthang/software
./b2 --prefix=$PREFIX --libdir=$PREFIX/lib64 --layout=tagged link=static,shared threading=multi install
* install giza-pp
wget https://giza-pp.googlecode.com/files/giza-pp.tgz
tar xzf giza-pp.tgz
cd giza-pp
make
cp GIZA++-v2/{GIZA++,plain2snt.out,snt2cooc.out,snt2plain.out,trainGIZA++.sh} mkcls-v2/mkcls .
* install moses
./bjam -j8 --with-srilm=/home/lmthang/scr/smt/srilm --with-giza=/home/lmthang/scr/smt/giza-pp --with-boost=/home/lmthang/software --install-scripts=/home/lmthang/scr/smt/scripts

##! Moses - install srilm
* find out machine type, e.g. i686
./sbin/machine-type
* if i686, edit the srilm/sbin/machine-type script and change
    else if (`uname -m` == x86_64) then
      #set MACHINE_TYPE = i686-m64
      set MACHINE_TYPE = i686
to
    else if (`uname -m` == x86_64) then
      set MACHINE_TYPE = i686-m64
      #set MACHINE_TYPE = i686
* compile
make MACHINE_TYPE=i686-m64 World
(Note that if the machine type is i686 only, the compiler will expect gnu/stubs-32.h, while your system might only have /usr/include/gnu/stubs-64.h)
* if error "/usr/include/gnu/stubs.h:7:27: error: gnu/stubs-32.h: No such file or directory"
install glibc-devel (http://www.cyberciti.biz/faq/x86_64-linux-error-gnustub-32h-missing-error-and-solution/)
(if necessary edit common/Makefile.machine.i686, to add ADDITIONAL_INCLUDES = -I/home/lmthang/software/include)

##! Scheme - hash table keys
(display (hash-table-keys preterminals))
http://srfi.schemers.org/srfi-69/srfi-69.html

##!
Scheme - handle command-line
* procedure command-line
http://www.gnu.org/software/guile/manual/html_node/Runtime-Environment.html
* srfi-1
http://srfi.schemers.org/srfi-1/srfi-1.html
* example: check if there is a 6th argument, if true, set is-debug flag
(display "# Args: ")(disp (command-line))
(when (> (length (command-line)) 5)
  ;; get debug flag
  (let ([debugflag (sixth (command-line))])
    (when (string=? debugflag "true")
      (set! is-debug #t))))

##! Python - string into char array
list(s)

##! Unix - install glibc
git clone git://sourceware.org/git/glibc.git
cd glibc
git checkout --track -b glibc-2_10-branch origin/release/2.10/master
* install glibc-devel
"configure: error: you must configure in a separate build directory"
call configure script outside the glibc dir:
mkdir thang-build
cd thang-build
../configure --prefix=/home/lmthang/software
when running make install, if it doesn't find ld.so.conf, copy /etc/ld.so.conf into the local dir/etc
* to compile 32-bit
../configure --prefix=/home/lmthang/software --host=i686-linux-gnu --build=i686-linux-gnu CC="gcc -m32" CXX="g++ -m32" CFLAGS="-O2 -march=i686" CXXFLAGS="-O2 -march=i686"
http://stackoverflow.com/questions/8004241/how-to-compile-glibc-32bit-on-an-x86-64-machine
--host=x86_64-pc-linux-gnu --build=i386-pc-linux-gnu

##! Unix -- wget download with wildcard
wget -r -l1 --no-parent -A "*bg-mk*" http://opus.lingfil.uu.se/experiments/data/mk-en

##! Python -- check dir and create
import os
import sys

def check_dir(dir):
    # create the directory if it does not exist yet
    if not os.path.isdir(dir):
        sys.stderr.write('Directory %s does not exist, creating ... \n' % dir)
        os.mkdir(dir)

##! Matlab -- cell array to string
% vocabulary: cell array of strings
sentenceStr = strcat(vocabulary{:});
fprintf('# Parse %s\n', sentenceStr);

##! Matlab -- getting values of multiple keys from container
a = containers.Map({'a', 'b'}, num2cell(1:1:2));
keys = {'a' 'b' 'c'};
indices = find(isKey(a, keys) == 1);      % positions of the keys present in the map
b = cell2mat(values(a, keys(indices)))
results = zeros(1, length(keys));         % missing keys keep the value 0
results(indices) = b;

##! Matlab -- convert to cell
cellstr: for string array
num2cell: for num array

##! Matlab -- check file exists
if exist(morphemeFile, 'file') ~= 2
  continue;
end

##! Matlab -- concatenate maps
a = containers.Map({'a', 'b'}, num2cell(1:1:2));
a.keys
a.values
b = containers.Map({'c', 'd'}, num2cell(3:1:4));
b.keys
b.values
c = [a; b];
c.keys
c.values

##! Matlab -- container map
numRefSegments = 3;
refValues = [4 5 6];
newMap = containers.Map(num2cell(refValues), num2cell(1:1:numRefSegments))

##! Matlab -- split string
regexp(str, '\t', 'split');

##! Matlab -- check directory exists
% check and create dir
if exist(outDir, 'dir') ~= 7 % directory does not exist
  fprintf(1, '# Creating out dir %s ... \n', outDir);
  mkdir(outDir);
end

##! Matlab -- print cell
sprintf('%s, ', A{:})
fprintf(1, '# Good word=%s: %s\n', params.characterArray(tNum), sprintf('\"%s\" ', treeGood.nodeNames{:}));

##! Python - Windows setup
* download from http://www.python.org/download/releases/, and install
* set system environment:
PYTHONPATH: C:\Python32\Lib
PATH: C:\Python32

##! R - check if an attribute exists
http://stackoverflow.com/questions/1177919/does-column-exist-and-how-to-rearrange-columns-in-r-data-frame
"DLTIC" %in% colnames(dundee)

##! bash - append array
http://tldp.org/LDP/abs/html/arrays.html
dest=( ${array1[@]} ${array2[@]} )

##! bash - string tokenization
http://stackoverflow.com/questions/5382712/bash-how-to-tokenize-a-string-variable
$ string="john is 17 years old"
$ for word in $string; do echo "$word"; done
john
is
17
years
old

##! bash - condition if
http://www.linuxtutorialblog.com/post/tutorial-conditions-in-bash-scripting-if-statements
if [ $foo -ge 3 -a $foo -lt 10 ]; then
-a: and
-o: or

##!
Latex - convert powerpoint to eps
http://people.csail.mit.edu/dalleyg/faqs/20010425.html
http://www.iml.ece.mcgill.ca/~stephan/ooeps
* Create a new slide; in Design/Layout, set the slide size equal to the size of the image; print to pdf
** pdftops -eps Presentation1.pdf
** fix bounding box
cat Presentation1.eps | ~/software/ps2eps/bin/ps2eps > Presentation2.eps

##! Bash --- for through array
a=("c" "b" "a")
size=${#a[@]}
echo "size=$size"
for i in 0 1 2; do
  echo "Thang ${a[$i]}"
done
for i in $(seq 0 $(($size-1))); do
  echo "Thang1 ${a[$i]}"
done

##! Eclipse --- Install Profiling tool
http://www.eclipse.org/tptp/home/downloads/4.7.0/documents/installguide/InstallGuide.html
* Add Helios repository http://download.eclipse.org/releases/helios/
* Expand the Web, XML, and Java EE Development entry in the Helios update site
* Select the Eclipse Web Developer Tools option.
* Complete Installation. Restart Eclipse when prompted.

##! Python --- LISP style functions
8.12 List Comprehensions - "Core Python Programming 2nd Edition"
(outputs below are from Python 2; in Python 3, map and filter return iterators, so wrap them in list(...))
* map
map(lambda x: x ** 2, range(6))
[0, 1, 4, 9, 16, 25]
or
[x ** 2 for x in range(6)]
[0, 1, 4, 9, 16, 25]
general form: [expr for iter_var in iterable if cond_expr]
* filter
>>> seq = [11, 10, 9, 9, 10, 10, 9, 8, 23, 9, 7, 18, 12, 11, 12]
>>> filter(lambda x: x % 2, seq)
[11, 9, 9, 9, 23, 9, 7, 11]
or
>>> [x for x in seq if x % 2]
[11, 9, 9, 9, 23, 9, 7, 11]
* matrix
[(x+1,y+1) for x in range(3) for y in range(5)]
* count the number of words
len([word for line in f for word in line.split()])
* count the number of characters
sum([len(word) for line in f for word in line.split()])

##! Python --- sort list
>>> a = {'a':4, 'd':1, 'c':2, 'b':5}
>>> sorted(a.items())
[('a', 4), ('b', 5), ('c', 2), ('d', 1)]
>>> sorted(a.items(), key=lambda x: x[0])
[('a', 4), ('b', 5), ('c', 2), ('d', 1)]
>>> sorted(a.items(), key=lambda x: x[1])
[('d', 1), ('c', 2), ('a', 4), ('b', 5)]
>>> sorted(a.items(), key=lambda x: x[1], reverse=True)
[('b', 5), ('a', 4), ('c', 2), ('d', 1)]

##! R --- workspace variables
http://ww2.coastal.edu/kingw/statistics/R-tutorials/savingloading.html
> ls() # This is my default workspace from My Documents.
[1] "age.out" "m.ex" "m.ex.minussex" "outcome.out"
[5] "respiratory" "seizure" "visit.matrix"
> setwd("Rspace") # First change, if you haven't already.
> rm(list=ls()) # Delete the default workspace.
> load(".RData") # Load a previously saved workspace.
> loadhistory() # Load a previously saved history file.
> ls()
[1] "my.data"

##! Python - regular expression sub
import re
import string
# put a space before each run of punctuation; re.escape keeps ], \, ^ and - literal inside the character class
pattern = re.compile('([' + re.escape(string.punctuation) + ']+)')
ref_line = pattern.sub(r' \1', ref_line)

##! Java - set path
export JAVA_HOME=/u/nlp/packages/java/jdk1.6.0_27-x86_64
export PATH=${PATH}:${SOFTWARE}/bin:/u/nlp/packages/apache-ant/bin:${JAVA_HOME}/bin
Important: JAVA_HOME must be a single directory, so avoid appending to it like a path list:
export JAVA_HOME=${JAVA_HOME}:/u/nlp/packages/java/jdk1.6.0_27-x86_64

##! R --- install non-root
R CMD INSTALL lme4_0.999375-42.tar.gz -l /home/lmthang/software/R
add R_LIBS environment
R_LIBS=/home/USER/R
export R_LIBS

##! Unix -- find unique
if you want the lines that are in file1, but not file2, you can do this (file2 is listed twice so that its lines never have a count of 1):
cat file1 file2 file2 | sort | uniq -c | egrep "^ +1 " | wc

##! R --- R scripting using here document (suggested by François Pinard)
#!/bin/sh
# [environment variables can be set here]
inFile=$1
outFile=$2
R --slave --args $inFile $outFile <<[...]
[...]
\begin{align*}
\sum_{w_k} P_{\mbox{\small katz}}(w_k|w_1^{k-1}) &= \sum_{w_k:c(w_1^{k})>0} P_{\mbox{\small katz}}(w_k|w_1^{k-1}) + \sum_{w_k:c(w_1^{k})=0} P_{\mbox{\small katz}}(w_k|w_1^{k-1}) \\
&= \sum_{w_k:c(w_1^{k})>0} P^*(w_k|w_1^{k-1}) + \alpha*\left[\sum_{w_k:c(w_1^{k})=0} P_{\mbox{\small katz}}(w_k|w_2^{k-1})\right] \\
&= \sum_{w_k:c(w_1^{k})>0} P^*(w_k|w_1^{k-1}) + \alpha*\left[1-\sum_{w_k:c(w_1^{k})>0} P^*(w_k|w_2^{k-1})\right]
\end{align*}

##!
Latex --- multiline equations
http://www.math.uiuc.edu/~hildebr/tex/displays.html
\begin{equation} \label{e:katz_general}
P_{\mbox{\small katz}}(w_k|w_1^{k-1}) =
\begin{cases}
P^*(w_k | w_1^{k-1}) & \text{if } C(w_1^{k})>0 \\
\alpha(w_1^{k-1}) P_{\mbox{\small katz}}(w_k|w_2^{k-1}) & \text{else if } C(w_1^{k-1})>0 \\
P_{\mbox{\small katz}}(w_k|w_2^{k-1}) & \text{otherwise }
\end{cases}
\end{equation}
"align" environment: every line is numbered by default. To number only some lines, put "\notag" right before the linebreak symbol ("\\") on all other lines. On lines that are to be numbered, you can put the label command, "\label{...}", before the linebreak.
With "align*", it works the other way: by default, none of the lines gets numbered, so to number a particular line you must put an explicit "\tag{...}" command at the end of that line, before the linebreak symbol.

##! Vim --- search and replace
http://vim.wikia.com/wiki/Search_and_replace
:%s/foo/bar/g
Find each occurrence of 'foo', and replace it with 'bar'.
:%s/foo/bar/gc
Change each 'foo' to 'bar', but ask for confirmation first.
:%s/\<foo\>/bar/gc
Change only whole words exactly matching 'foo' to 'bar'; ask for confirmation.
:%s/foo/bar/gci
Change each 'foo' (case insensitive) to 'bar'; ask for confirmation.
:%s/foo/bar/gcI
Change each 'foo' (case sensitive) to 'bar'; ask for confirmation.
The g flag means global: each occurrence in the line is changed, rather than just the first.

##! Latex --- include image
\includegraphics[width=1\textwidth, clip=true, trim = 50 0 50 0]{simpleGT.\imgExt}

##! Vim --- indent code block
http://vim.wikia.com/wiki/Indent_a_code_block
=i{ reindents "inner block" (inside the braces).
=a{ reindents "a block" (including the braces).
=2a{ reindents 2 blocks (this block and containing block).

##! R --- c function (mnemonic for combine) generate vector
x = c(12, 19, 22, 15, 12)

##! R --- gl generate factor levels
http://www.math.montana.edu/Rweb/Rhelp/gl.html
gl(n, k, length = n*k, labels=1:n, ordered=FALSE)
The result has levels from 1 to n with each value replicated in groups of length k
# First control, then treatment:
gl(2, 8, labels=c("Control","Treat"))
# 20 alternating 1s and 2s
gl(2, 1, 20)
# alternating pairs of 1s and 2s
gl(2, 2, 20)

##! Unix --- tcsh command history
http://tomecat.com/jeffy/tttt/cshhistory.html
https://itservices.stanford.edu/service/unixcomputing/unix/unixcomm
history              List all commands typed so far (default maximum number=20)
!!                   Repeat the last command
!n                   Repeat command n from the history list
!PATTERN             Repeat last command beginning with PATTERN
^PATTERN1^PATTERN2   Repeat last command but replace PATTERN1 (usually a typo) with PATTERN2 (the correction)

##! Unix --- .tcshrc shell environment file
http://members.tripod.com/~Rumman_Gaffur/unix/tcshrc.html
http://www.sumedh.info/downloads/cshrc-tcshrc-login-alias-file.php
# Get the aliases and functions
alias ls 'ls --color=yes'
set prompt="`whoami`@%S%m%s %~ > "
# User specific environment and startup programs
setenv CS224N /afs/ir/users/l/m/lmthang/cs224n
setenv PA1 ${CS224N}/pa1/java
setenv CLASSPATH ${PA1}/classes
http://howto.unixdev.net/tcshrc.html
set autolist
set color
set colorcat

##! Unix --- check current shell
echo $0

##! Unix --- SSH using public key
ssh-keygen -t dsa
ssh-copy-id -i .ssh/id_dsa.pub user@hostname

##! Vim --- shift block of code
http://vim.wikia.com/wiki/VimTip224
visual mode, select text, then press either "<" or ">" (in normal mode, "<<" and ">>" shift the current line)

##! Unix --- Merge PDF file
gs -sDEVICE=pdfwrite -dNOPAUSE -dQUIET -dBATCH -sOutputFile=out.pdf *.pdf

##! Vim --- keep indentation while pasting code
http://vim.wikia.com/wiki/Toggle_auto-indenting_for_code_paste
* In .vimrc
set pastetoggle=<F11>
* insert mode -> F11 -> paste code -> F11 (turn off paste mode)

##!
Vim --- vim settings (tab, space)
http://www.slackorama.com/projects/vim/vimrc.html
http://www.jwz.org/doc/tabs-vs-spaces.html
* In .vimrc
set softtabstop=4 " interpret tab as an ``indent'' command instead of an insert-a-tab command
set shiftwidth=4 " number of spaces to use for auto indent
set tabstop=4 " width, in spaces, at which a tab character in a file is displayed
set expandtab " enter spaces when tab is pressed

##! Java --- jar file and compilation
* check jar content
jar tf retroweaver-1.1.jar | grep runtime
* extract jar content
jar xf stanford-corenlp-2011-06-19-sources.jar
* Compile java src
javac -cp "/home/lmthang/RA/java/*" -sourcepath "src" -d prefixparser \
  src/edu/stanford/nlp/*/*.*
* build.xml
** Sample build.xml http://ant.apache.org/manual/using.html
** Add classpath (http://www.adp-gmbh.ch/java/ant/build_xml.html)
** Copy application resources

! Aug 2011

##! Python --- Install modules as non-root users
http://packages.python.org/distribute/easy_install.html
http://docs.python.org/install/index.html
* install Virtual python (http://pypi.python.org/pypi/virtualenv)
wget http://pypi.python.org/packages/source/v/virtualenv/virtualenv-1.6.4.tar.gz#md5=1072b66d53c24e019a8f1304ac9d9fc5
** In virtualenv-1.6.4 directory: python virtualenv.py /home/lmthang/software/python
* export path in .bash_profile
export PYTHONDIR=/home/lmthang/software/python
export PATH=${PATH}:${PYTHONDIR}/bin
* set python alias in .bashrc
alias python='$PYTHONDIR/bin/python'
* exit, login again, run easy_install
easy_install argparse

##! Emacs --- Turn off backup files
http://www.rpi.edu/dept/acs/rpinfo/common/Computing/Consulting/Software/Emacs/Hints/backup.html
In .emacs,
(setq make-backup-files nil)

##! Vim --- Timestamp
http://vim.wikia.com/wiki/VimTip97
http://stackoverflow.com/questions/56052/best-way-to-insert-timestamp-in-vim
:r! date
produces: Thu Sep 11 10:47:30 CEST 2008
:r! date "+\%Y-\%m-\%d \%H:\%M:\%S"
produces: 2008-09-11 10:50:56
(the backslashes stop Vim from expanding % to the current file name)

##! Ubuntu --- Fixing locale Ubuntu perl
http://blog.thinkside.co.uk/?p=231
edit /etc/default/locale with sudo, and add
LANGUAGE="en_GB.UTF-8"
LC_ALL="en_GB.UTF-8"
then run
sudo dpkg-reconfigure locales
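
A quick way to check that the two variables from the locale fix above really ended up in the file is a small Python sketch like the one below. The function names parse_locale_file and missing_locale_vars are made up for this note, and the parser only assumes simple KEY="value" lines (no shell expansion, no multi-line values), so treat it as a rough check rather than a full reader of /etc/default/locale.

```python
def parse_locale_file(text):
    """Return a dict of KEY -> value for simple KEY="value" lines.

    Hypothetical helper: blank lines, comments and lines without '='
    are skipped; surrounding double quotes are stripped from values.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"')
    return settings


def missing_locale_vars(text, required=("LANGUAGE", "LC_ALL")):
    """List the required variables that are absent or empty in the text."""
    settings = parse_locale_file(text)
    return [name for name in required if not settings.get(name)]


if __name__ == "__main__":
    # Sample content standing in for /etc/default/locale (an assumption,
    # not read from disk so the sketch runs anywhere).
    sample = 'LANG="en_GB.UTF-8"\nLC_ALL="en_GB.UTF-8"\n'
    print(missing_locale_vars(sample))
```

To run it against the real file, pass open('/etc/default/locale').read() instead of the sample string; an empty list means both variables are set.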