public class BasicInformationExtractionSolver extends Object implements InformationExtractionSolver
Modifier and Type | Field and Description |
---|---|
static String |
RELATIONMENTION_UNSPECIFIED_RELATION_TYPE
The String value used in a RelationMention to indicate a
non-specific relation type.
|
Constructor and Description |
---|
BasicInformationExtractionSolver()
Default constructor for a BasicInformationExtractionSolver.
|
Modifier and Type | Method and Description |
---|---|
void |
applyOverrideSettings()
Apply the override settings.
|
protected InformationExtractionOutcomeMarking |
createMarking(InformationExtractionTokenList tl,
InformationExtractionEntityType entType,
int markingStatus)
Create a marking.
|
protected void |
createMarkingIfIdentified(InformationExtractionTokenList tl,
List<InformationExtractionOutcomeMarking> markingsList,
HashMap<Integer,InformationExtractionTokenList> tokenMap)
Create a marking for the specified TokenList if an EntityType can be
identified for it, and if so, add it to the specified list.
|
protected InformationExtractionOutcomeMarkingRelation |
createMarkingRelation(InformationExtractionOutcomeMarking source,
InformationExtractionOutcomeMarking target,
InformationExtractionEntityRelationType entRelType)
Creates an InformationExtractionOutcomeMarkingRelation.
|
protected InformationExtractionOutcomeMarkingRelation |
createMarkingRelation(InformationExtractionOutcomeMarking source,
InformationExtractionOutcomeMarking target,
String typeName)
Creates an InformationExtractionOutcomeMarkingRelation.
|
protected void |
createOutcome(InformationExtractionOutcomeMarking[] markings,
InformationExtractionOutcomeMarkingRelation[] markingRels)
Create the solved outcome.
|
protected InformationExtractionTokenList |
createTokenList(InformationExtraction ext,
List<InformationExtractionToken> listOfTokens)
Create a token list.
|
protected List<InformationExtractionTokenList> |
createTokenListsFromClassifierHits()
Create TokenLists for each identified classifier hit.
|
protected List<InformationExtractionTokenList> |
createTokenListsFromNerHits()
Create TokenLists for each identified NER (REGEX or classifier) hit.
|
protected List<InformationExtractionOutcomeMarkingRelation> |
determineMarkingRelations(List<InformationExtractionOutcomeMarking> markingsList,
String relStrat)
Determines the marking relations for a set of markings
and the specified relation strategy.
|
protected void |
determineMarkingRelationsForImplicitStrategy(List<InformationExtractionOutcomeMarking> markingsList,
List<InformationExtractionOutcomeMarkingRelation> markingRelsList)
Determines the marking relations for a set of markings, using
relation strategy "implicit".
|
protected void |
determineMarkingRelationsForRelationMentionStrategy(List<InformationExtractionOutcomeMarking> markingsList,
List<InformationExtractionOutcomeMarkingRelation> markingRelsList)
Determines the marking relations for a set of markings, using
relation strategy "relationmention".
|
protected List<InformationExtractionOutcomeMarking> |
determineMarkings(String markingStrat)
Determine the markings based on the specified marking solve strategy.
|
protected InformationExtractionEntityType |
determineMarkingType(InformationExtractionTokenList tl,
String text)
Determines the marking type for the specified TokenList.
|
protected InformationExtractionEntityType |
determineMarkingTypeFromPhraseSearch(LibrarySession sess,
String text)
Determines the marking type based on the specified text,
using a search for a match with the index on IEEntityPhrase.PHRASE.
|
protected InformationExtractionEntityType |
determineMarkingTypeViaClassifierNer(InformationExtractionTokenList tl)
Determines the marking type based on the specified TokenList,
using Classifier NER info available on the Tokens.
|
protected InformationExtractionEntityType |
determineMarkingTypeViaNer(InformationExtractionTokenList tl)
Determines the marking type based on the specified TokenList,
using standard NER info available on the Tokens.
|
protected InformationExtractionEntityType |
determineMarkingTypeViaNerOrRegex(InformationExtractionTokenList tl,
boolean ignoreStandardNer)
Determines the marking type based on the specified TokenList,
using NER info available on the Tokens.
|
protected InformationExtractionEntityType |
determineMarkingTypeViaRegex(InformationExtractionTokenList tl)
Determines the marking type based on the specified TokenList,
using REGEX info available on the Tokens.
|
protected InformationExtractionEntityType |
determineMarkingTypeViaStopWord(String text)
Determines the marking type by checking the text against the stop words.
|
Map<String,InformationExtractionEntityRelationType> |
getEntityRelationTypeMap()
Gets the EntityRelationType map, keyed by the uppercased
EntityRelationType name.
|
Map<String,InformationExtractionEntityType> |
getEntityTypeMap()
Gets the EntityType map, keyed by the uppercased
EntityType name.
|
protected String |
getFirstClassifierEntityType(InformationExtractionToken token)
Gets the first entity type name listed in the specified token.
|
InformationExtraction |
getInformationExtraction()
Gets the InformationExtraction being processed.
|
AttributeValueTable |
getMappedMarkingTypes()
Gets the mapped marking types table.
|
protected String |
getNerEntityType(InformationExtractionToken token)
Gets the NER entity type name listed in the specified token.
|
Map<String,String> |
getOmittedMarkingTypes()
Gets the omitted marking types map.
|
protected String |
getRegexEntityType(InformationExtractionToken token)
Gets the REGEX entity type name listed in the specified token.
|
Map<String,String> |
getRegexMappingsMap()
Gets the RegexMappings map, keyed by the uppercased
NER result tag, and value is the mapped EntityType name.
|
AttributeValueTable |
getSettings()
Gets the effective settings for the current solve.
|
Map<String,String> |
getStopWordMap()
Gets the stop word map, keyed by the uppercased stop word.
|
void |
initialize(InformationExtractionSolverSpecification spec)
Initialize this instance.
|
protected boolean |
isBuiltInNerEntity(String nerResult)
Determines whether the specified NER result is a built-in NER entity.
|
protected String |
nameOf(InformationExtractionEntityType entType)
Returns name of an InformationExtractionEntityType, or an indication that
the specified value is null.
|
protected InformationExtractionEntityType |
performPhraseSearch(LibrarySession sess,
String term)
Perform a search for a search term, using the index on IEEntityPhrase.PHRASE.
|
protected void |
setInformationExtraction(InformationExtraction ext)
Sets the InformationExtraction being processed.
|
void |
solve(InformationExtraction ext)
Solve a parsed InformationExtraction, producing a solved
InformationExtractionOutcome.
|
public static final String RELATIONMENTION_UNSPECIFIED_RELATION_TYPE
public BasicInformationExtractionSolver()
public void initialize(InformationExtractionSolverSpecification spec) throws IfsException
Called immediately after construction (via the default constructor) and used so that the implementation can initialize session-independent state. This instance may be subsequently used concurrently by multiple threads and sessions. A session can be retrieved from the specification object, but it must not be cached. The specification also has the implementation and instance specific parameters which should be cached in a session independent way.
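The lifecycle contract above can be illustrated with a minimal sketch. The types SketchSession, SketchSpecification and SketchSolver, together with their accessors, are hypothetical stand-ins invented for this example and are not the library's API. The sketch only shows the pattern: use the session locally without storing it, and cache the parameters as an immutable copy so that the instance can be shared by concurrent threads.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for illustration only; NOT the real library types.
interface SketchSession {
    String userName();
}

interface SketchSpecification {
    SketchSession getSession();           // assumed accessor; result must not be cached
    Map<String, String> getParameters();  // assumed accessor for instance parameters
}

class SketchSolver {
    // Session-independent state: an immutable snapshot, safe to read from many threads.
    private volatile Map<String, String> parameters = Collections.emptyMap();

    void initialize(SketchSpecification spec) {
        // The session is only used locally and is never assigned to a field.
        SketchSession session = spec.getSession();
        if (session == null) {
            throw new IllegalStateException("no session available");
        }
        // Defensive copy, so later changes to the source map do not leak in.
        parameters = Collections.unmodifiableMap(new HashMap<>(spec.getParameters()));
    }

    String parameter(String name) {
        return parameters.get(name);
    }
}
```

Keeping the snapshot in a volatile field means that threads calling parameter(...) after initialize(...) see the fully built map without further locking.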
initialize
in interface InformationExtractionSolver
spec
- The specification for a given implementation.
IfsException
- if the operation fails.
public AttributeValueTable getSettings()
public InformationExtraction getInformationExtraction()
public Map<String,InformationExtractionEntityType> getEntityTypeMap()
public Map<String,InformationExtractionEntityRelationType> getEntityRelationTypeMap()
public Map<String,String> getRegexMappingsMap()
public Map<String,String> getStopWordMap()
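The method summary describes the maps returned by getEntityTypeMap(), getEntityRelationTypeMap(), getRegexMappingsMap() and getStopWordMap() as keyed by uppercased strings, so a caller presumably has to uppercase a name before looking it up. The following is a minimal sketch of such a lookup. The class UppercaseKeyLookup and its methods are hypothetical helpers, and the use of Locale.ROOT for normalization is an assumption, since the source does not say which locale the solver uses.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper; not part of the documented API.
class UppercaseKeyLookup {
    // Normalizes a name the way the map keys are assumed to be normalized.
    static String key(String name) {
        return name.toUpperCase(Locale.ROOT);
    }

    // Case-insensitive lookup against a map keyed by uppercased names.
    static <V> V lookup(Map<String, V> byUppercasedName, String name) {
        return name == null ? null : byUppercasedName.get(key(name));
    }

    public static void main(String[] args) {
        Map<String, String> entityTypes = new HashMap<>();
        entityTypes.put(key("Person"), "entity type: Person");
        System.out.println(lookup(entityTypes, "person"));
    }
}
```

Using a fixed locale avoids locale-dependent case mappings (for example the Turkish dotless i) producing keys that never match.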
public Map<String,String> getOmittedMarkingTypes()
public AttributeValueTable getMappedMarkingTypes()
The keys are the entity types, and the values are the String names of the mapped entity types.
protected void setInformationExtraction(InformationExtraction ext) throws IfsException
ext
- the InformationExtraction being processed
IfsException
- if the operation fails
public void applyOverrideSettings() throws IfsException
IfsException
- if the operation fails
public void solve(InformationExtraction ext) throws IfsException
solve
in interface InformationExtractionSolver
ext
- the target InformationExtraction
IfsException
- if the operation fails
protected List<InformationExtractionOutcomeMarking> determineMarkings(String markingStrat) throws IfsException
Currently four strategies are supported:
markingStrat
- the marking solve strategy
IfsException
- if the operation fails
protected List<InformationExtractionOutcomeMarkingRelation> determineMarkingRelations(List<InformationExtractionOutcomeMarking> markingsList, String relStrat) throws IfsException
Currently one strategy is supported:
markingsList
- the markings
relStrat
- the relation marking strategy
IfsException
- if the operation fails
protected void determineMarkingRelationsForRelationMentionStrategy(List<InformationExtractionOutcomeMarking> markingsList, List<InformationExtractionOutcomeMarkingRelation> markingRelsList) throws IfsException
markingsList
- the markings
markingRelsList
- the list of marking relations (to fill)
IfsException
- if the operation fails
protected void determineMarkingRelationsForImplicitStrategy(List<InformationExtractionOutcomeMarking> markingsList, List<InformationExtractionOutcomeMarkingRelation> markingRelsList) throws IfsException
markingsList
- the markings
markingRelsList
- the list of marking relations (to fill)
IfsException
- if the operation fails
protected void createMarkingIfIdentified(InformationExtractionTokenList tl, List<InformationExtractionOutcomeMarking> markingsList, HashMap<Integer,InformationExtractionTokenList> tokenMap) throws IfsException
tl
- the TokenList of the potential marking
markingsList
- the markings list to add the newly created marking to, if an EntityType can be identified
tokenMap
- the map between token position and TokenList
IfsException
- if the operation fails
protected List<InformationExtractionTokenList> createTokenListsFromClassifierHits() throws IfsException
IfsException
- if the operation fails
protected List<InformationExtractionTokenList> createTokenListsFromNerHits() throws IfsException
IfsException
- if the operation fails
protected String getFirstClassifierEntityType(InformationExtractionToken token) throws IfsException
token
- the token
IfsException
- if the operation fails
protected String getNerEntityType(InformationExtractionToken token) throws IfsException
token
- the token
IfsException
- if the operation fails
protected String getRegexEntityType(InformationExtractionToken token) throws IfsException
token
- the token
IfsException
- if the operation fails
protected boolean isBuiltInNerEntity(String nerResult) throws IfsException
nerResult
- the NER result to check
IfsException
- if the operation fails
protected InformationExtractionTokenList createTokenList(InformationExtraction ext, List<InformationExtractionToken> listOfTokens) throws IfsException
ext
- the InformationExtraction
listOfTokens
- the list of tokens comprising the new TokenList
IfsException
- if the operation fails
protected void createOutcome(InformationExtractionOutcomeMarking[] markings, InformationExtractionOutcomeMarkingRelation[] markingRels) throws IfsException
markings
- the markings
markingRels
- the marking relations
IfsException
- if the operation fails
protected InformationExtractionOutcomeMarking createMarking(InformationExtractionTokenList tl, InformationExtractionEntityType entType, int markingStatus) throws IfsException
Returns null if the specified entity type is to be filtered out, based on the optional solver property ECM.SOLVER.MappedMarkingTypes.
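Because a null return signals a filtered-out entity type, code that collects markings needs to skip null results rather than add them. The following is a minimal sketch of that caller-side pattern. NullSkippingCollector and the string-based factory are hypothetical and stand in for the real marking types; the sketch makes no claim about how the solver decides which types to filter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper; not part of the documented API.
class NullSkippingCollector {
    // Applies the factory to each candidate and keeps only non-null results.
    static <C, M> List<M> collect(List<C> candidates, Function<C, M> factory) {
        List<M> kept = new ArrayList<>();
        for (C candidate : candidates) {
            M marking = factory.apply(candidate);
            if (marking != null) {   // null means "filtered out", so skip it
                kept.add(marking);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical factory that pretends the "DATE" type is filtered out.
        List<String> result = collect(
                Arrays.asList("PERSON", "DATE", "ORG"),
                type -> type.equals("DATE") ? null : "marking:" + type);
        System.out.println(result);
    }
}
```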
tl
- the TokenList for the marking
entType
- the marking entity type
markingStatus
- the status (source) of the marking
IfsException
- if the operation fails
protected InformationExtractionEntityType determineMarkingType(InformationExtractionTokenList tl, String text) throws IfsException
Returns null if no suitable match can be determined.
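One common way to implement a "return null if nothing matches" contract over several detection methods is a first-match chain: try each strategy in turn and stop at the first non-null result. The sketch below illustrates that general technique only. FirstMatchResolver is hypothetical, and the source does not state which of the determineMarkingTypeVia... methods this class consults or in what order, so the order used here is purely an assumption.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper; not part of the documented API.
class FirstMatchResolver {
    // Returns the first non-null strategy result, or null if none matches.
    static <R> R firstMatch(String text, List<Function<String, R>> strategies) {
        for (Function<String, R> strategy : strategies) {
            R result = strategy.apply(text);
            if (result != null) {
                return result;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Two made-up strategies; the order is an assumption for illustration.
        List<Function<String, String>> chain = Arrays.asList(
                t -> t.equals("the") ? "STOPWORD" : null,
                t -> t.equals("paris") ? "LOCATION" : null);
        System.out.println(firstMatch("paris", chain));
        System.out.println(firstMatch("unknown", chain));
    }
}
```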
tl
- the TokenList
text
- the lowercased text representing the TokenList
IfsException
- if the operation fails
protected InformationExtractionEntityType determineMarkingTypeViaStopWord(String text) throws IfsException
text
- the candidate marking's text
IfsException
- if the operation fails
protected InformationExtractionEntityType determineMarkingTypeViaNer(InformationExtractionTokenList tl) throws IfsException
tl
- the TokenList
IfsException
- if the operation fails
protected InformationExtractionEntityType determineMarkingTypeViaRegex(InformationExtractionTokenList tl) throws IfsException
tl
- the TokenList
IfsException
- if the operation fails
protected InformationExtractionEntityType determineMarkingTypeViaNerOrRegex(InformationExtractionTokenList tl, boolean ignoreStandardNer) throws IfsException
Ignores standard NER if indicated.
tl
- the TokenList
ignoreStandardNer
- whether to ignore standard NER
IfsException
- if the operation fails
protected InformationExtractionEntityType determineMarkingTypeViaClassifierNer(InformationExtractionTokenList tl) throws IfsException
tl
- the TokenList
IfsException
- if the operation fails
protected InformationExtractionEntityType determineMarkingTypeFromPhraseSearch(LibrarySession sess, String text) throws IfsException
sess
- the session context
text
- the text to search for
IfsException
- if the operation fails
protected InformationExtractionEntityType performPhraseSearch(LibrarySession sess, String term) throws IfsException
sess
- the session context
term
- the search term to use in the CONTAINS clause
IfsException
- if the operation fails
protected InformationExtractionOutcomeMarkingRelation createMarkingRelation(InformationExtractionOutcomeMarking source, InformationExtractionOutcomeMarking target, String typeName) throws IfsException
source
- the source marking
target
- the target marking
typeName
- the relation type name
IfsException
- if the operation fails
protected InformationExtractionOutcomeMarkingRelation createMarkingRelation(InformationExtractionOutcomeMarking source, InformationExtractionOutcomeMarking target, InformationExtractionEntityRelationType entRelType) throws IfsException
source
- the source marking
target
- the target marking
entRelType
- the InformationExtractionEntityRelationType to use
IfsException
- if the operation fails
protected String nameOf(InformationExtractionEntityType entType) throws IfsException
entType
- the InformationExtractionEntityType
IfsException
- if the operation fails
Copyright © 2023. All rights reserved.