Some of the essential text mining algorithms were implemented as Web services. Two Web services are intended for text filtering (StopWordsRemover and CharacterFilter), two deal with linguistic morphology (a lemmatizer named LemmaGen and a stemmer named PorterStemmer), and one, named GenerateBows, is a text format converter. An auxiliary Web service named getValues returns the list of possible values of a given Web service parameter; it is used to build user interfaces for Web services whose parameters accept one of several predefined values.
StopWordsRemover
operation: StopWordsRemover
WSDL: http://zulu.ijs.si:8086/SW_service?wsdl
Description: This operation takes as input plain text and a dictionary of stop words. It removes the stop words from the input text.
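As a rough local illustration of what StopWordsRemover does, the sketch below filters a text against a stop word set. It is not the service's implementation: the function name `remove_stop_words`, the regular-expression tokenization, the case-insensitive comparison, and joining the result with single spaces are all assumptions made for the example.

```python
import re


def remove_stop_words(text: str, stop_words: set[str]) -> str:
    """Drop every token whose lowercase form is in stop_words (hypothetical helper)."""
    # Assumption: tokens are runs of word characters; punctuation is discarded.
    tokens = re.findall(r"\w+", text)
    # Assumption: comparison is case-insensitive, so "The" matches "the".
    kept = [token for token in tokens if token.lower() not in stop_words]
    return " ".join(kept)


stop_words = {"the", "a", "of", "is"}
print(remove_stop_words("The quality of the text is a concern", stop_words))
# quality text concern
```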
LemmaGen
operation: LemmaGen
WSDL: http://zulu.ijs.si:8086/LM_service?wsdl
Description: This operation lemmatizes the input text according to the language parameter. Currently, 12 languages are supported: en, sl, ge, bg, cs, et, fr, hu, ro, sr, it, sp. It returns the lemmatized text as output: all words remain in the same order as in the original text, but each is transformed to its dictionary form (lemma).
PorterStemmer
operation: PorterStemmer
WSDL: http://zulu.ijs.si:8086/PS_service?wsdl
Description: This operation stems the input text. Stemming removes the inflected endings of words. It is often used as a preprocessing step in text mining, since stemmed words can be easily matched and counted. The input to this operation is the text to be stemmed; the output is the stemmed text.
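Porter's algorithm applies a fixed sequence of suffix-rewriting steps (1a through 5b). As a hedged illustration only, the sketch below implements just step 1a, which handles plural endings; the PorterStemmer service runs the complete algorithm, and the function name `porter_step_1a` is hypothetical.

```python
def porter_step_1a(word: str) -> str:
    """Step 1a of the Porter stemming algorithm (plural endings) only."""
    # Rules are tried longest suffix first; only the first matching rule fires.
    rules = (("sses", "ss"), ("ies", "i"), ("ss", "ss"), ("s", ""))
    for suffix, replacement in rules:
        if word.endswith(suffix):
            return word[: len(word) - len(suffix)] + replacement
    return word


for w in ["caresses", "ponies", "caress", "cats"]:
    print(w, "->", porter_step_1a(w))
# caresses -> caress
# ponies -> poni
# caress -> caress
# cats -> cat
```

The remaining steps (1b onward) depend on Porter's "measure" of a stem and are omitted here to keep the example short.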
GenerateBows
operation: GenerateBows
WSDL: http://bison.ijs.si/WebServices/TextNet.svc?wsdl
Description: Bag-of-words (BOW) construction is a corpus processing task that transforms a collection of documents into the bag-of-words format. In this format, each document is represented as an unordered collection of words, disregarding grammar and even word order. The service accepts several preprocessing options and parameters:
- Stemmer: Lemmatizer_Bulgarian, Lemmatizer_Czech, Lemmatizer_English, Lemmatizer_Estonian, Lemmatizer_French, Lemmatizer_German, Lemmatizer_Hungarian, Lemmatizer_Italian, Lemmatizer_Romanian, Lemmatizer_Serbian, Lemmatizer_Slovene, Lemmatizer_Spanish, PorterStemmer, None
- StopWordSets: English, EnglishGoogle, English523, English425, English319, English8, EnglishInet, French, German, Spanish, Slovene, Empty
- Tokenizer: UnicodeTokenizer, VocabularyTokenizer
- WordWeightType: TermFreq, TfIdf, LogDfTfId
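To make the TermFreq and TfIdf weighting options concrete, here is a minimal sketch of BOW construction over already tokenized documents. It assumes the textbook tf-idf definition (term count multiplied by log(N/df)); the exact formulas used by GenerateBows are not documented here, so the sketch does not attempt LogDfTfId. The function `bag_of_words` is hypothetical.

```python
import math
from collections import Counter


def bag_of_words(docs: list[list[str]], weighting: str = "TermFreq") -> list[dict[str, float]]:
    """Map each tokenized document to a sparse {word: weight} vector."""
    counts = [Counter(doc) for doc in docs]  # word order is discarded here
    if weighting == "TermFreq":
        return [{w: float(c) for w, c in cnt.items()} for cnt in counts]
    if weighting == "TfIdf":
        n_docs = len(docs)
        # Document frequency: in how many documents each word occurs.
        df = Counter(w for cnt in counts for w in cnt)
        # Assumption: plain tf * ln(N / df); the service's variant may differ.
        return [{w: c * math.log(n_docs / df[w]) for w, c in cnt.items()} for cnt in counts]
    raise ValueError(f"unsupported weighting: {weighting}")


docs = [["text", "mining", "text"], ["web", "mining"]]
print(bag_of_words(docs))
# [{'text': 2.0, 'mining': 1.0}, {'web': 1.0, 'mining': 1.0}]
print(bag_of_words(docs, "TfIdf"))
```

Under this plain tf-idf variant, a word occurring in every document (here "mining") receives weight 0.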
getValues
operation: getValues
WSDL: http://ropot.ijs.si/webservices/janez/getvalues.php?wsdl
Description: This operation parses the WSDL description of a Web service and returns a list of possible values for the given parameter name.
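getValues relies on the allowed values being declared in the XML Schema embedded in a WSDL. A minimal sketch, assuming each parameter is an `xs:element` with an inline `xs:enumeration` restriction (real WSDLs often declare a named `xs:simpleType` instead, which this sketch does not follow). The function `get_values` and the sample WSDL fragment are hypothetical and are not taken from any of the services above; only the two tokenizer names are reused from the GenerateBows parameter list.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"


def get_values(wsdl_xml: str, param_name: str) -> list[str]:
    """Return the xs:enumeration values declared for the element param_name."""
    root = ET.fromstring(wsdl_xml)
    for element in root.iter(f"{{{XSD_NS}}}element"):
        if element.get("name") != param_name:
            continue
        # Assumption: the enumeration is declared inline under this element.
        return [e.get("value") for e in element.iter(f"{{{XSD_NS}}}enumeration")]
    return []  # parameter not found, or it has no enumerated values


sample = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
             xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <types>
    <xs:schema>
      <xs:element name="Tokenizer">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:enumeration value="UnicodeTokenizer"/>
            <xs:enumeration value="VocabularyTokenizer"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
    </xs:schema>
  </types>
</definitions>"""

print(get_values(sample, "Tokenizer"))  # ['UnicodeTokenizer', 'VocabularyTokenizer']
```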

In addition to the services listed above, the e-LICO text mining Web services provide:
- Text cleaning
- PDF to text conversion
- Text classification
- Sentence splitting
- Biologically relevant entity recognition
- Biologically relevant relationship detection
The majority of e-LICO services are listed on BioCatalogue.
Below is a short summary of the available Web service operations. For more information, please follow the corresponding BioCatalogue entries.
Text cleaner (BioCatalogue:2173)
operation: cleanText
This operation removes all XML-invalid characters from the supplied text. Valid XML characters are specified at http://www.w3.org/TR/REC-xml/#charsets.
operation: cleanTextASCII
This operation removes all XML-invalid and non-ASCII characters from the supplied text. It can be used to clean text so that it is suitable as input for the NaCTeM service TerMine (http://www.biocatalogue.org/services/32-termine_35834), which only accepts ASCII text. XML-invalid characters are specified at http://www.w3.org/TR/REC-xml/#charsets. ASCII characters are those with a Unicode code point between U+0000 and U+007F.
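The character rules behind cleanText and cleanTextASCII can be sketched locally. This sketch assumes the XML 1.0 `Char` production from the linked W3C specification (tab, line feed, carriage return, and the ranges U+0020 to U+D7FF, U+E000 to U+FFFD, and U+10000 to U+10FFFF); the function names are hypothetical, and the services' own code may differ in details.

```python
def is_xml_char(cp: int) -> bool:
    """True if code point cp matches the XML 1.0 'Char' production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def clean_text(text: str) -> str:
    """Drop characters that may not appear in an XML 1.0 document."""
    return "".join(ch for ch in text if is_xml_char(ord(ch)))


def clean_text_ascii(text: str) -> str:
    """Additionally drop everything outside U+0000..U+007F."""
    return "".join(ch for ch in clean_text(text) if ord(ch) <= 0x7F)


sample = "Caf\u00e9\x00 menu\x0b\n"
print(repr(clean_text(sample)))        # 'Café menu\n'
print(repr(clean_text_ascii(sample)))  # 'Caf menu\n'
```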
PDF to text (BioCatalogue:2172)
operation: pdfToText
This operation accepts a byte array representation of a PDF file and returns a byte array representation of the extracted text.
operation: pdfToTextBase64
This operation accepts a Base64-encoded string representation of a PDF file and returns the extracted text as a Base64-encoded string.
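When calling pdfToTextBase64 from a script rather than from Taverna, the PDF bytes have to be Base64-encoded before sending and the reply decoded afterwards. A minimal sketch of just those two steps with Python's standard `base64` module; the SOAP call itself is omitted, the helper names are hypothetical, and UTF-8 is assumed as the character encoding of the returned text.

```python
import base64


def encode_pdf(pdf_bytes: bytes) -> str:
    """Turn raw PDF bytes into the Base64 string sent to the service."""
    return base64.b64encode(pdf_bytes).decode("ascii")


def decode_text(b64_text: str, encoding: str = "utf-8") -> str:
    """Turn the Base64 reply back into text (the encoding is an assumption)."""
    return base64.b64decode(b64_text).decode(encoding)


# Stand-ins for a real PDF file and a real service reply.
payload = encode_pdf(b"%PDF-1.4 placeholder bytes")
reply = base64.b64encode("Extracted text".encode("utf-8")).decode("ascii")
print(decode_text(reply))  # Extracted text
```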
Article section text classifier (BioCatalogue:2171)
operation: classifyText
This operation classifies a piece of text as most likely coming from one of the four common scientific article sections (Introduction, Methods, Results, Discussion). This is a document-type web service, and this operation accepts a single string as input (the text to be classified). To use this operation in Taverna, add an XML input splitter and an XML output splitter.
operation: classifyTextDetailed
This operation classifies a piece of text as most likely coming from one of the four common scientific article sections (Introduction, Methods, Results, Discussion). This is a document-type web service, and this operation accepts a single string as input (the text to be classified). To use this operation in Taverna, add an XML input splitter and a chain of two XML output splitters.
Sentence splitter service (BioCatalogue:2161)
operation: splitIntoSentences
This is the service's only operation: it accepts a single string and returns an array of strings. Both the input and the output are wrapped in an XML document. To access the input and output data in Taverna, add an "XML Input Splitter" and an "XML Output Splitter" after adding the operation to your workflow.
Finding things service (BioCatalogue:3334)
operation: findCellTypesInText
This operation accepts plain text and returns a list of cell types found in the text. Character offsets into the original text string are provided for each cell type found.
operation: findTissueTypesInText
This operation searches the provided text string for mentions of tissue types. The tissue types are obtained from the Mouse adult gross anatomy ontology (http://purl.org/obo/owl/MA).
operation: findSynonymsInText
This operation accepts two inputs: a list of IDs, each with literal strings to be found, and a text string to be searched for these literal strings.
operation: findSynonymsInTexts
This operation accepts two inputs: a list of IDs, each with a set of associated literal strings, and a list of text strings to be searched for all of these literal strings.
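findCellTypesInText and findSynonymsInText both report literal matches together with character offsets. The sketch below illustrates that idea for a small ID-to-synonyms dictionary. It is not the services' matching algorithm: case-insensitive whole-word matching is assumed, and the function name, IDs, and example text are hypothetical.

```python
import re


def find_synonyms_in_text(
    synonyms: dict[str, list[str]], text: str
) -> list[tuple[str, str, int, int]]:
    """Return (id, matched text, start, end) for every literal found in text."""
    hits = []
    for term_id, literals in synonyms.items():
        for literal in literals:
            # Assumption: case-insensitive, whole-word matching via \b anchors.
            pattern = r"\b" + re.escape(literal) + r"\b"
            for m in re.finditer(pattern, text, flags=re.IGNORECASE):
                hits.append((term_id, m.group(0), m.start(), m.end()))
    # Report matches in the order they occur in the text.
    return sorted(hits, key=lambda hit: hit[2])


synonyms = {"T1": ["hepatocyte", "liver cell"], "T2": ["neuron"]}
text = "Each liver cell, or hepatocyte, differs from a neuron."
for hit in find_synonyms_in_text(synonyms, text):
    print(hit)
# ('T1', 'liver cell', 5, 15)
# ('T1', 'hepatocyte', 20, 30)
# ('T2', 'neuron', 47, 53)
```

Running the same function over each string in a list gives the multi-text behaviour described for findSynonymsInTexts.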
Finding relationships service (BioCatalogue:3335)
operation: findMetaboliteLocalisationRelationships
This operation accepts a list of chemical, cell type and tissue type annotations. It then returns a list of relationships between these entities.
operation: findProteinInteractionRelationships
This operation accepts a list of protein entity annotations. It then returns a list of relationships between these entities.
operation: findProteinLocalisationRelationships
This operation accepts a list of protein, cell type and tissue type annotations. It then returns a list of relationships between these entities.
operation: findTermRelationships
This operation accepts a list of term annotations. It then returns a list of relationships between these entities.
All these Web services are available as Taverna workflows through the myExperiment portal, where example workflows are given: