Java Stanford NLP: Find word frequency?

Ask Time:2009-11-30T05:14:39         Author:Nick Heiner

I'm using the Stanford NLP Parsing toolkit. Given a word in the lexicon, how can I find its frequency*? Or, given a frequency rank, how can I determine the corresponding word?

*in the entire language, not just the text sample.

This is a demo of the toolkit I'm using:

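import java.util.Arrays;
import java.util.Collection;

import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
import edu.stanford.nlp.trees.*;

class ParserDemo {
  public static void main(String[] args) {
    // Load the serialized English PCFG grammar and set parser options
    LexicalizedParser lp = new LexicalizedParser("englishPCFG.ser.gz");
    lp.setOptionFlags(new String[]{"-maxLength", "80", "-retainTmpSubcategories"});

    // Parse a pre-tokenized sentence and print it as a Penn Treebank tree
    String[] sent = { "Sincerity", "may", "frighten", "the", "boy", "." };
    Tree parse = (Tree) lp.apply(Arrays.asList(sent));
    parse.pennPrint();
    System.out.println();

    // Extract the collapsed typed dependencies from the parse tree
    TreebankLanguagePack tlp = new PennTreebankLanguagePack();
    GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
    GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
    Collection tdl = gs.typedDependenciesCollapsed();
    System.out.println(tdl);
    System.out.println();

    // Print the tree and the dependencies together
    TreePrint tp = new TreePrint("penn,typedDependenciesCollapsed");
    tp.printTree(parse);
  }
}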

Author: Nick Heiner. Reproduced under the CC 4.0 BY-SA copyright license with a link to the original source and this disclaimer.
Link to original article:https://stackoverflow.com/questions/1816800/java-stanford-nlp-find-word-frequency
Stompchicken:

If you are only counting word frequencies, you don't need to parse sentences at all. All you need to do is tokenise the input and then count word frequencies using a Java HashMap. If you want to use the Stanford tools, use any of the tokenisers in edu.stanford.nlp.process.

This gives you the frequency of any given word, but in general it may not be possible to find the single word corresponding to a given frequency rank: several words may be equally frequent in the document, so one rank can correspond to more than one word.
2009-12-01T11:42:09
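A minimal sketch of the tokenise-and-count approach described above, assuming plain Java with no Stanford dependency: the regex tokeniser is only a stand-in for one of the tokenisers in edu.stanford.nlp.process, and the class and method names (`WordFrequency`, `tokenize`, `countFrequencies`, `wordsAtRank`) are hypothetical. To handle ties, it assumes "dense" ranking, where words with equal counts share a rank, so a rank lookup returns a list of words rather than a single word. Note that it counts frequencies only within the text you give it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class WordFrequency {

    // Hypothetical stand-in tokeniser: lower-cases the text and splits on runs of
    // characters that are neither letters nor apostrophes. A tokeniser from
    // edu.stanford.nlp.process could be substituted here.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}']+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Count how often each token occurs, using a HashMap as the answer suggests.
    static Map<String, Integer> countFrequencies(List<String> tokens) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    // Return every word at the given rank (1 = most frequent), using dense ranking:
    // words with the same count share a rank, so the result may hold several words.
    // Returns an empty list if the rank is out of range.
    static List<String> wordsAtRank(Map<String, Integer> counts, int rank) {
        TreeMap<Integer, List<String>> byCount = new TreeMap<>(Comparator.<Integer>reverseOrder());
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            byCount.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        int current = 1;
        for (List<String> words : byCount.values()) {
            if (current == rank) {
                Collections.sort(words); // deterministic order for tied words
                return words;
            }
            current++;
        }
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        String text = "The boy saw the dog. The dog saw a cat.";
        Map<String, Integer> counts = countFrequencies(tokenize(text));
        System.out.println(counts.get("the"));                    // 3
        System.out.println(counts.getOrDefault("sincerity", 0));  // 0
        System.out.println(wordsAtRank(counts, 1));               // [the]
        System.out.println(wordsAtRank(counts, 2));               // [dog, saw]
    }
}
```

In the sample sentence "dog" and "saw" both occur twice, so rank 2 maps to two words, which is exactly the ambiguity the answer points out.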