| NEWS | R Documentation |
News for Package 'tm.plugin.koRpus'
Changes in tm.plugin.koRpus version 0.4-2 (2021-05-17)
fixed
updated test standards after changes to koRpus' internal calculations of number of lines in texts imported from TIF data frames
changed
kRp.corpus: replaced prototype() in class definition with initialize method
Changes in tm.plugin.koRpus version 0.4-1 (2020-12-17)
fixed
docTermMatrix(): results were wrong because numbers were assigned to wrong columns; now fixed in koRpus
unit tests failed on Windows due to a UTF-8 issue
changed
the nested object class kRp.hierarchy was replaced by kRp.corpus; instead of reproducing the file hierarchy in the object structure, kRp.corpus has a flat structure with all texts in one single data frame; this data frame was also renamed from "TT.res" into "tokens"
the class name kRp.corpus was used in tm.plugin.koRpus before and is just being recycled ;)
kRp.corpus inherits from class kRp.text as defined in the koRpus package
status messages are currently only shown when only one CPU is used
corpusTagged(): now called taggedText() as in koRpus
corpusDesc(): now called describe() as in koRpus
[, [<-, [[ and [[<- methods no longer apply to the summary data frame but tokens slot as in koRpus (where it applies to the TT.res slot)
show(): kRp.corpus objects now list all available features
read.corp.custom(): removed unused mc.cores argument
docTermMatrix(): by default behaves like most other methods and adds its result to the input object rather than returning just the matrix; also, the generic is now defined by the koRpus package and was removed, including all of the actual function code
adjusted unit tests and vignette
updated all examples to use a new sample corpus (see added), to the benefit that many "\dontrun{}" cases could be removed
added
readCorpus(): the hierarchy levels of a text corpus can now be assumed directly from the directory structure by setting "hierarchy=TRUE"
corpusHasFeatures(), corpusHasFeatures()<-, corpusFeatures(), corpusFeatures()<-, corpusHierarchy(), corpusHierarchy()<-, corpusCorpFreq(), corpusCorpFreq()<-, diffText(), diffText()<-, originalText(): new getter/setter methods for kRp.corpus objects
split_by_doc_id(): new method transforms a kRp.corpus object into a list of kRp.text objects
corpusDocTermMatrix(): new method to get/set the sparse document term matrix in kRp.corpus objects
[[/[[<-: gained new argument "doc_id" to limit the scope to particular documents
describe()/describe()<-: now support filtering by doc_id
new sample corpus for use in examples
removed
removed all classes and methods dealing with kRp.hierarchy
removed deprecated methods of the pre-kRp.hierarchy era
removed generic of tif_as_tokens_df() as it was moved to the koRpus package
Changes in tm.plugin.koRpus version 0.3-1 (2019-05-14)
fixed
readCorpus(): solved a cryptic warning when more than one text was tokenized
added
docTermMatrix(): new method to generate document-term matrices, either with absolute frequencies or tf-idf values
query(): new method, extending the generic of koRpus >= 0.12-1
filterByClass(): new method, extending the generic of koRpus >= 0.12-1
jumbleWords(): new method, extending the generic of koRpus >= 0.12-1
clozeDelete(): new method, extending the generic of koRpus >= 0.12-1
cTest(): new method, extending the generic of koRpus >= 0.12-1
textTransform(): new method, extending the generic of koRpus >= 0.12-1
show(): new method for objects of class kRp.hierarchy
changed
depends on koRpus >= 0.12-1 now
depends on the Matrix package now (for docTermMatrix())
adjusted test standards to include the additional POS tags from koRpus >= 0.12-1
Changes in tm.plugin.koRpus version 0.02-2 (2019-01-18)
fixed
readCorpus(), kRpSource(): added missing imports from packages tm, NLP and parallel
readCorpus(): fixed status message formatting
corpusTm(): removed useless "level" argument and corrected the output
readCorpus(): removed unused "level" argument
corpusFiles(): now also works with flat hierarchy objects
added
readCorpus(): can now also import data frames in TIF format, including support for hierarchical categories
tif_as_corpus_df(): new S4 method to transform a kRp.hierarchy object into a TIF compliant data frame
changed
readCorpus(): the tm corpora now include full hierarchy metadata
removed pre-hierarchy portions from internal function whatIsAvailable()
Changes in tm.plugin.koRpus version 0.02-1 (2018-07-29)
changed
vignette: also includes info on readCorpus()
tests: adjusted test standards to new object class
added
kRp.hierarchy: new S4 class to replace kRp.sourcesCorpus and kRp.topicCorpus to allow more generic nesting of hierarchical levels
readCorpus(): new function to generate kRp.hierarchy objects recursively
many corpus*() getter functions can now filter by hierarchy level or category ID
removed all code regarding simpleCorpus(), sourcesCorpus() and topicCorpus(), their object classes and methods; this is all handled much more flexibly by kRp.hierarchy and readCorpus() now
Changes in tm.plugin.koRpus version 0.01-4 (2018-03-07)
fixed
sourcesCorpus(): speak of "text" instead of "texts" if it's only one
changed
adjusted package to support koRpus >= 0.11 and sylly, especially with regards to summary(), hyphen(), and new class constructors
summary(): for more coherence with the koRpus package the "text" column in the summary slot was renamed into "doc_id"
reaktanz.de supports HTTPS now, updated references
vignette is now in RMarkdown/HTML format; the SWeave/PDF version was dropped
hyphen()/lex.div()/readability(): 'quiet' is now TRUE by default
lex.div(): 'char' is now an empty string by default; computing all characteristics was not a useful default for large text corpora
added
README.md
new [, [<-, [[ and [[<- methods added for corpus object classes
new methods tif_as_tokens_df() to export corpus objects as a single data.frame in fully TIF compliant format
summary(): now also includes the total number of stopwords (if available)
new class object constructors kRp_corpus(), kRp_sourcesCorpus(), and kRp_topicCorpus() can be used instead of new("kRp.corpus", ...) etc.
Changes in tm.plugin.koRpus version 0.01-3 (2016-07-12)
fixed
the arguments that simpleCorpus() was supposed to pipe to DirSource() weren't used
changed
the "paths" argument of topicCorpus() now expects a list, not a vector
using the parallel package to be able to use more CPU cores
added
new argument "format" for simpleCorpus(), sourceCorpus(), and topicCorpus(), to be able to work with text objects directly, instead of files
Changes in tm.plugin.koRpus version 0.01-2 (2015-07-08)
changed
using the S4 methods of koRpus 0.06-1 now, therefore renamed all methods removing the *.corpus suffix (e.g., lex.div.corpus() is now lex.div())
renamed classes into kRp.corpus, kRp.sourcesCorpus and kRp.topicCorpus, and their generator functions accordingly
added
new methods read.corp.custom(), freq.analysis() and summary()
new getter/setter methods: corpusSources(), corpusTopics(), corpusFreq(), corpusSummary()
first basic unit tests, using the testthat package
new option "summary" for lex.div() and readability(), to automatically update the summary data.frames
first notes in a vignette
Changes in tm.plugin.koRpus version 0.01-1 (2015-06-29)
added
initial release