Last update: Thu Jan 19 16:16:08 CET 2017

This is a brief note on how to get mateplus running for FrameNet-style semantic role labeling.

NB: The information below may be incomplete, as it was reconstructed from memory after I got mateplus running. Please let me know if anything is missing so that I can update this note.

Download the models

  • Personally, I put everything into one sister directory of mateplus (which I called “mateplus_companion_software”).
  • Under mateplus, I then created two symlinks, “lib” and “models”, both pointing to the companion software directory so that mateplus can find the files.
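
The layout described above could be created roughly as follows. This is only a sketch: BASE is a hypothetical parent directory that is not part of the original setup, and a throwaway temp directory is used when it is unset, so the commands can be tried safely.

```shell
# BASE is a hypothetical parent directory; a throwaway temp dir is used if it is unset.
BASE="${BASE:-$(mktemp -d)}"
mkdir -p "$BASE/mateplus" "$BASE/mateplus_companion_software"

# Relative link targets stay valid if the parent directory is moved later.
ln -sfn ../mateplus_companion_software "$BASE/mateplus/lib"
ln -sfn ../mateplus_companion_software "$BASE/mateplus/models"

# Show the two links and where they point.
ls -l "$BASE/mateplus"
```

Both links point to the same directory on purpose, since jars and models live side by side in the companion directory.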

Install basic components

Make sure you work on a machine where you have permission to open ports, so that you can install and run the components that mateplus relies on.

Below, I assume that mateplus and the software accessed in server mode run on the same machine, i.e. ‘localhost’.

  • install semafor 3 (alpha version): we need the server mode that this newer version of semafor provides

  • install semafor 2-1: we need the server mode of the MST parser that this version of semafor provides

Set up MST for server mode

To get the MST parser (included in semafor 2-1) ready for the server mode that semafor 2-1 provides, go into the “release” directory of semafor 2-1. In there, modify the “config” file so that it uses MST in server mode:

  • set MST_MACHINE=localhost (or specify IP)
  • set the port number as needed
  • set “MST_MODE=server”
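
If you prefer to script these edits, a small helper along the following lines might do. It is a sketch under assumptions: set_kv is a hypothetical helper, MST_PORT is my guess at the name of the port variable (check your own config file), and 12345 is simply the MST port used in the run script further down. CONFIG defaults to an empty temp file so that nothing in the semafor installation is touched by accident.

```shell
# set_kv FILE KEY VALUE (hypothetical helper): overwrite an existing KEY=... line
# (optionally prefixed by "export "), or append the line if the key is absent.
# Assumes VALUE contains no "|" or "&", which sed would treat specially.
set_kv() {
  if grep -Eq "^(export[[:space:]]+)?$2=" "$1"; then
    sed -i.bak -E "s|^((export[[:space:]]+)?$2=).*|\1$3|" "$1"
  else
    printf '%s=%s\n' "$2" "$3" >> "$1"
  fi
}

# Defaults to an empty temp file; set CONFIG to the real "release/config" to edit it.
CONFIG="${CONFIG:-$(mktemp)}"
set_kv "$CONFIG" MST_MODE server
set_kv "$CONFIG" MST_MACHINE localhost
set_kv "$CONFIG" MST_PORT 12345   # MST_PORT is an assumed variable name
cat "$CONFIG"
```

When an existing line is rewritten, sed leaves a backup copy next to the file with the suffix “.bak”.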

Then, under the “release” directory of semafor 2-1, run the bash script “startMSTServer.sh”.

Start semafor 3

  • edit the file “bin/config.sh” to set paths
  • start the server mode: adjust paths, ports and memory in the command as needed
  • note the port number: you will need it later for the setup of mateplus

java -Xms6g -Xmx6g -cp target/Semafor-3.0-alpha-04.jar edu.cmu.cs.lti.ark.fn.SemaforSocketServer model-dir:/mnt/medium/tools/maltparser/semafor_malt_model port:8043
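
To make the adjustable parts of this command explicit, it can be split into variables. The following is a sketch only: the variable names and the DRY_RUN switch are hypothetical, and the values are copied from the command above. By default the command is printed rather than executed.

```shell
# Values copied from the command above; adjust them to your installation.
# The paths must not contain spaces, because the command is expanded unquoted below.
SEMAFOR_JAR="target/Semafor-3.0-alpha-04.jar"
MODEL_DIR="/mnt/medium/tools/maltparser/semafor_malt_model"
SEMAFOR_PORT=8043   # note this port: mateplus needs it later
HEAP=6g             # used for both -Xms and -Xmx

SEMAFOR_CMD="java -Xms$HEAP -Xmx$HEAP -cp $SEMAFOR_JAR edu.cmu.cs.lti.ark.fn.SemaforSocketServer model-dir:$MODEL_DIR port:$SEMAFOR_PORT"

# DRY_RUN is a hypothetical switch; it defaults to 1, so the command is only printed.
if [ "${DRY_RUN:-1}" = 1 ]; then
  echo "$SEMAFOR_CMD"
else
  $SEMAFOR_CMD
fi
```

Run it with DRY_RUN=0 from the semafor 3 directory once the printed command looks right.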

Stanford CoreNLP

Use Stanford CoreNLP version 3.7: the online mateplus readme mentions “3.x.x”, but at this point no other version >= 3.0.0 works if you want to use the frame-semantic parsing scripts.
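
A quick way to see which CoreNLP version a jar belongs to is to read it off the file name. The helper below is a hypothetical sketch that assumes the jar follows the usual “stanford-corenlp-VERSION.jar” naming; the path lib/stanford-corenlp-3.7.0.jar is just an example.

```shell
# corenlp_version JAR (hypothetical helper): print the version embedded in a file
# name such as stanford-corenlp-3.7.0.jar or stanford-corenlp-3.7.0-models.jar.
corenlp_version() {
  basename "$1" .jar | sed -E 's/^stanford-corenlp-([0-9]+(\.[0-9]+)*).*$/\1/'
}

v=$(corenlp_version lib/stanford-corenlp-3.7.0.jar)
case "$v" in
  3.7|3.7.*) echo "CoreNLP $v: matches the version used in this note" ;;
  *)         echo "CoreNLP $v: not the version used in this note" ;;
esac
```

The check only looks at the file name, so it says nothing about whether the jar itself is intact.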

Adjust “glove”

  • I downloaded and extracted glove into a subdir of “mateplus_companion_software”.

  • I modified the “processtmp.sh” file in the glove directory, adjusting some variables:

GLOVEDIR=/mnt/medium/tools/mateplus_companion_software/glovey
SCALE_FILE=/mnt/medium/tools/mateplus_companion_software/glovey/scaleparams.txt
SAVE_FILE=$1.vectors # in hindsight, not quite sure if this setting is what’s needed
MEMORY=10.0 # set depending on your hardware: the implicit unit assumed here is GB!

NB: a brief caveat: so far I have only run mateplus on mini example files with two sentences or so. However, I’ve noticed that the temp files for cooccurrences (coocs) and shufflings (coocs.shuf) that glove creates in the tmp dir are always empty. I am not sure whether this is suspicious and means that glove is not correctly configured.

-rw-rw-r-- 1 ruppenho ruppenho 0 Jan 19 22:55 glv3829134863266861581.txt.coocs
-rw-rw-r-- 1 ruppenho ruppenho 0 Jan 19 22:55 glv3829134863266861581.txt.coocs.shuf
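
To keep an eye on this, the empty temp files can be listed with find. The following is a sketch: empty_glove_tmp is a hypothetical helper, and the demo directory only imitates the file names shown above. It does not settle whether empty files are actually a problem.

```shell
# empty_glove_tmp DIR (hypothetical helper): list zero-byte .coocs and
# .coocs.shuf files directly inside DIR.
empty_glove_tmp() {
  find "$1" -maxdepth 1 -type f \( -name '*.coocs' -o -name '*.coocs.shuf' \) -size 0 -print
}

# Demo on a throwaway directory; point the helper at the real tmp dir instead.
glove_demo=$(mktemp -d)
: > "$glove_demo/glv1.txt.coocs"          # empty, like the files listed above
echo "x" > "$glove_demo/glv2.txt.coocs"   # non-empty, for contrast
empty_glove_tmp "$glove_demo"
```

Only the empty file is reported; a run on larger input that still yields empty files would be a stronger hint that something is misconfigured.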

Adjust run scripts

Adjust the bash files under “scripts”: fix paths and/or file names, and set the ports so that they match the settings in the semafor 2-1 and semafor 3 config files.

-- contents of my file “parse-framenet.sh” --

JAVA=/etc/alternatives/java

# directory to which fndata-1.5 was extracted
# the dir we mean should have as its immediate subdirs: docs, frame, fulltext, schema, lu
FRAMENETDIR=/mnt/big/lexicons/FN/fn.r1.5

# matetools models downloaded from web
LEMMA_MODEL=models/CoNLL2009-ST-English-ALL.anna-3.3.lemmatizer.model
POS_MODEL=models/CoNLL2009-ST-English-ALL.anna-3.3.postagger.model
PARSER_MODEL=models/CoNLL2009-ST-English-ALL.anna-3.3.parser.model

# directory in which glove10a was extracted
# (please also modify processtmp.sh in glove dir accordingly!)
GLOVEDIR=/mnt/medium/tools/mateplus_companion_software/glovey

$JAVA -Xmx8g -cp lib/*:mateplus.jar se.lth.cs.srl.CompletePipeline eng \
-lemma $LEMMA_MODEL \
-parser $PARSER_MODEL \
-tagger $POS_MODEL \
-srl models/srl-TACL15-eng.model \
-glove $GLOVEDIR \
-reranker -globalFeats -semafor "127.0.0.1 8043" -mst "127.0.0.1 12345" -framenet $FRAMENETDIR -test $1

-- end contents of “parse-framenet.sh” --
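
Before running the script, it may help to verify that FRAMENETDIR really has the expected immediate subdirectories. This is a sketch: check_framenet_dir is a hypothetical helper, and the list of subdirectories is taken from the comment in the script above.

```shell
# check_framenet_dir DIR (hypothetical helper): succeed only if DIR contains the
# subdirectories named in the script comment: docs, frame, fulltext, schema, lu.
check_framenet_dir() {
  missing=0
  for sub in docs frame fulltext schema lu; do
    if [ ! -d "$1/$sub" ]; then
      echo "missing: $1/$sub" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Default taken from the script above; override FRAMENETDIR as needed.
FRAMENETDIR="${FRAMENETDIR:-/mnt/big/lexicons/FN/fn.r1.5}"
if check_framenet_dir "$FRAMENETDIR"; then
  echo "FrameNet directory looks complete"
else
  echo "FrameNet directory is incomplete"
fi
```

The check does not look inside the subdirectories, so it only catches the case where FRAMENETDIR points one level too high or too low.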

Adjust processtmp.sh in glove installation

As a reference, here is my processtmp.sh file from the glove installation:

--- processtmp.sh ---

#!/bin/bash
GLOVEDIR=/mnt/medium/tools/mateplus_companion_software/glovey

CORPUS=$1
VOCAB_FILE=$1.vocab
COOCCURRENCE_FILE=$1.coocs
COOCCURRENCE_SHUF_FILE=$1.coocs.shuf
LOAD_FILE=$1.init
GRADSQ_FILE=$1.grad
SCALE_FILE=/mnt/medium/tools/mateplus_companion_software/glovey/scaleparams.txt
SAVE_FILE=$1.vectors

#SAVE_FILE=tmp/$1
VERBOSE=2
MEMORY=10.0
VOCAB_MIN_COUNT=0
VECTOR_SIZE=50
MAX_ITER=10
WINDOW_SIZE=10
BINARY=0
NUM_THREADS=1
X_MAX=100

echo "Corpus ".$CORPUS
#$GLOVEDIR/vocab_count -min-count 0 -verbose $VERBOSE < $CORPUS > $1.vocab
$GLOVEDIR/cooccur++ -memory $MEMORY -vocab-file $VOCAB_FILE -verbose $VERBOSE -window-size $WINDOW_SIZE < $CORPUS > $COOCCURRENCE_FILE
# ./cooccur -memory $MEMORY -vocab-file $VOCAB_FILE -verbose $VERBOSE -window-size $WINDOW_SIZE < $CORPUS > $COOCCURRENCE_FILE
echo $COOCCURRENCE_FILE

#$GLOVEDIR/cooccur -memory $MEMORY -vocab-file $VOCAB_FILE -verbose $VERBOSE -window-size $WINDOW_SIZE < $CORPUS > $COOCCURRENCE_FILE
# ./cooccur -verbose 2 -symmetric 0 -window-size 10 -vocab-file vocab.txt -memory 8.0 -overflow-file tempoverflow < corpus.txt > cooccurrences.bin

if [[ $? -eq 0 ]]
then
  $GLOVEDIR/shuffle -memory $MEMORY -verbose $VERBOSE < $COOCCURRENCE_FILE > $COOCCURRENCE_SHUF_FILE
  if [[ $? -eq 0 ]]
  then
    #-load-gradsq $GRADSQ_FILE -load-file $LOAD_FILE
    $GLOVEDIR/glove -save-file $SAVE_FILE -threads $NUM_THREADS -input-file $COOCCURRENCE_SHUF_FILE -x-max $X_MAX -iter $MAX_ITER -vector-size $VECTOR_SIZE -binary $BINARY -vocab-file $VOCAB_FILE -verbose $VERBOSE -model 0
    echo "now off to perl"
    perl $GLOVEDIR/removeBiasCenterAndScale.pl $SAVE_FILE.txt $SCALE_FILE $1.vocab > $1.vectors
  fi
fi

--- end contents of processtmp.sh ---
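
The script above calls cooccur++, shuffle and glove from GLOVEDIR and reads removeBiasCenterAndScale.pl and scaleparams.txt from the same place. Because the vocab_count line is commented out, it also expects the file $1.vocab to exist already. A pre-flight check for the GLOVEDIR part could look like this. It is a sketch, and check_glove_dir is a hypothetical helper.

```shell
# check_glove_dir DIR (hypothetical helper): succeed only if DIR holds the
# executables and support files that processtmp.sh refers to.
check_glove_dir() {
  status=0
  for exe in cooccur++ shuffle glove; do
    [ -x "$1/$exe" ] || { echo "not executable: $1/$exe" >&2; status=1; }
  done
  for f in removeBiasCenterAndScale.pl scaleparams.txt; do
    [ -f "$1/$f" ] || { echo "missing: $1/$f" >&2; status=1; }
  done
  return "$status"
}

# Default taken from the script above; override GLOVEDIR as needed.
GLOVEDIR="${GLOVEDIR:-/mnt/medium/tools/mateplus_companion_software/glovey}"
check_glove_dir "$GLOVEDIR" || echo "fix the glove directory before running mateplus"
```

If the executables are missing, running make in the glove directory is the likely remedy, since the makefile and the .c sources ship in the same tarball.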

Companion software

Just for completeness’ sake, here is a list of the downloaded models and libraries in my companion software dir. Note: this dir (accidentally) contains files extracted from the glove tarball. However, I re-extracted that tarball inside the dir “glovey”, and those are the files I used. In particular, I modified the processtmp.sh file contained in that dir.

I am using slightly older versions of opennlp-maxent and opennlp-tools that I already had. I assume the newer versions that Michael Roth specified in his readme would work as well. Some of my files are just symlinked, which is ok.

  • anna-3.3.jar
  • CoNLL2009-ST-English-ALL.anna-3.3.lemmatizer.model*
  • CoNLL2009-ST-English-ALL.anna-3.3.parser.model*
  • CoNLL2009-ST-English-ALL.anna-3.3.postagger.model*
  • CoNLL2009-ST-English-ALL.anna-3.3.srl-4.1.srl.model*
  • cooccur*
  • cooccur++*
  • cooccur.c
  • cooccur++.c
  • demo.sh*
  • de-token.bin
  • eval/
  • glove*
  • glove.c
  • glovey/
  • gradsq.txt
  • lemma-ger-3.6.model*
  • liblinear-1.51-with-deps.jar
  • liblinear-java-1.95.jar
  • LICENSE
  • makefile
  • opennlp-maxent-3.0.2-incubating.jar
  • opennlp-tools-1.5.2-incubating.jar
  • processtmp.sh*
  • README
  • removeBiasCenterAndScale.pl
  • scaleparams.txt
  • shuffle*
  • shuffle.c
  • srl-EMNLP14+fs-eng.model*
  • srl-EMNLP14+fs-extger.model*
  • srl-EMNLP14+fs-ger.model*
  • srl-TACL15-eng.model*
  • stanford-corenlp-3.7.0.jar -> /mnt/medium/tools/stanfordCoreNLP/stanford-corenlp-full-2016-10-31/stanford-corenlp-3.7.0.jar
  • stanford-corenlp-3.7.0-javadoc.jar -> /mnt/medium/tools/stanfordCoreNLP/stanford-corenlp-full-2016-10-31/stanford-corenlp-3.7.0-javadoc.jar
  • stanford-corenlp-3.7.0-models.jar -> /mnt/medium/tools/stanfordCoreNLP/stanford-corenlp-full-2016-10-31/stanford-corenlp-3.7.0-models.jar
  • stanford-corenlp-3.7.0-sources.jar -> /mnt/medium/tools/stanfordCoreNLP/stanford-corenlp-full-2016-10-31/stanford-corenlp-3.7.0-sources.jar
  • transition-1.30.jar
  • vectors.txt
  • vocab_count*
  • vocab_count.c
  • vocab.txt
  • whatswrong-0.2.3.jar
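
Since some of these entries are symlinks, it is worth checking that none of them is dangling, for example after moving the CoreNLP installation. The sketch below relies on the find idiom that, with -L, only unresolvable links still count as type l. dangling_links is a hypothetical helper, and the demo directory is a stand-in for the real companion dir.

```shell
# dangling_links DIR (hypothetical helper): list symlinks directly inside DIR
# whose targets do not exist.
dangling_links() {
  find -L "$1" -maxdepth 1 -type l -print
}

# Demo on a throwaway directory; point the helper at the companion dir instead.
links_demo=$(mktemp -d)
: > "$links_demo/real.jar"
ln -s real.jar "$links_demo/ok-link.jar"                    # resolves
ln -s /nonexistent/target.jar "$links_demo/broken-link.jar" # does not resolve
dangling_links "$links_demo"
```

Only the broken link is printed; an empty result means every symlink in the directory resolves.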