corpus package
Submodules
corpus.event module
Created on 2021-07-26
@author: wf
- class corpus.event.Event[source]
Bases:
lodstorage.jsonable.JSONAble
base class for Event entities
Constructor
- asWikiMarkup(series: str, templateParamLookup: dict) → str [source]
- Parameters
series (str) – the name of the series
templateParamLookup (dict) – the mapping of Python attributes to the MediaWiki template parameters to be used
- Returns
my WikiMarkup
- Return type
str
- getLookupAcronym()[source]
get the lookup acronym of this event e.g. add year information
- Returns
the acronym to be used for lookup operations
- Return type
str
- class corpus.event.EventBaseManager(name, entityName, entityPluralName: str, listName: Optional[str] = None, clazz=None, sourceConfig: Optional[corpus.config.EventDataSourceConfig] = None, primaryKey: Optional[str] = None, config=None, handleInvalidListTypes=False, filterInvalidListTypes=False, debug=False, profile=True)[source]
Bases:
lodstorage.entity.EntityManager
common entity Manager for ConferenceCorpus
Constructor
- Parameters
name (string) – name of this eventManager
entityName (string) – entityType to be managed e.g. Country
entityPluralName (string) – plural of the entityType e.g. Countries
config (StorageConfig) – the configuration to be used; if None, a default configuration is used
handleInvalidListTypes (bool) – True if invalidListTypes should be converted or filtered
filterInvalidListTypes (bool) – True if invalidListTypes should be deleted
debug (boolean) – override debug setting when default of config is used via config=None
profile (boolean) – True if profiling/timing information should be shown for long-running operations
- asCsv(separator: str = ',', selectorCallback: Optional[Callable] = None)[source]
Converts the events to CSV format.
- Parameters
separator (str) – character separating the row values
selectorCallback – callback function returning the events to be converted to CSV. If None, all events are converted.
- Returns
csv string of events
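The export described above can be pictured with a small standalone sketch. It does not use the corpus package: `events_as_csv` is a hypothetical stand-in that works on plain dicts, and it assumes the selector callback takes no arguments and returns the events to export.

```python
import csv
import io
from typing import Callable, Optional


def events_as_csv(events: list, separator: str = ",",
                  selector_callback: Optional[Callable] = None) -> str:
    """Hypothetical stand-in for asCsv that works on a list of dicts."""
    # Assumption: the callback takes no arguments and returns the events to export
    selected = selector_callback() if selector_callback else events
    # Use the union of all keys as header, in first-seen order
    fieldnames = []
    for event in selected:
        for key in event:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames,
                            delimiter=separator, lineterminator="\n")
    writer.writeheader()
    writer.writerows(selected)
    return buffer.getvalue()


events = [
    {"eventId": "e1", "acronym": "ABC 2020", "year": 2020},
    {"eventId": "e2", "acronym": "ABC 2021", "year": 2021},
]
csv_text = events_as_csv(
    events,
    separator=";",
    selector_callback=lambda: [e for e in events if e["year"] > 2020],
)
print(csv_text)
```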
- fromCache(force: bool = False, getListOfDicts=None, append=False, sampleRecordCount=-1)[source]
overridden version of fromCache that calls postProcessEntityList
- fromCsv(csvString, separator: str = ',', overwriteEvents: bool = True, updateEntitiesCallback: Optional[Callable] = None)[source]
- Parameters
csvString – csvString having all the csv content
separator – the separator of the csv
append – to append to the self object.
updateEntitiesCallback –
Returns: Nothing. The self object is updated
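As a rough illustration of the parsing step only, the following sketch turns a CSV string into a list of dicts with the standard library. `lod_from_csv` is a hypothetical helper, not part of the corpus package, and how the real method applies the records to the manager is not shown here.

```python
import csv
import io


def lod_from_csv(csv_string: str, separator: str = ",") -> list:
    """Hypothetical helper: parse CSV text into a list of dicts (LoD)."""
    reader = csv.DictReader(io.StringIO(csv_string), delimiter=separator)
    # Every value is read as a string; type conversion is left to the caller
    return [dict(row) for row in reader]


records = lod_from_csv("eventId;acronym\ne1;ABC 2020\ne2;ABC 2021\n", separator=";")
print(records)
```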
- getLoDfromEndpoint() → list [source]
get my content from my endpoint
- Returns
the list of dicts derived from the given SPARQL query
- Return type
list
- postProcessLodRecords(listOfDicts: list, **kwArgs)[source]
post process the given list of Dicts with raw Events
- Parameters
listOfDicts (list) – the list of raw Events to fix
- rateAll(ratingManager: corpus.quality.rating.RatingManager)[source]
rate all events and series based on the given rating Manager
- setAllAttr(listOfDicts, attr, value)[source]
set all attribute values of the given attr in the given list of Dict to the given value
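A minimal sketch of this behavior on plain dicts, assuming the records are modified in place (`set_all_attr` is a hypothetical standalone function, not the method itself):

```python
def set_all_attr(list_of_dicts: list, attr: str, value) -> None:
    """Set attr to value in every record of the list, in place."""
    for record in list_of_dicts:
        record[attr] = value


lod = [{"eventId": "e1"}, {"eventId": "e2", "source": "old"}]
set_all_attr(lod, "source", "dblp")
print(lod)
```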
- updateFromLod(lod: list, overwriteEvents: bool = True, updateEntitiesCallback: Optional[Callable] = None, restrictToSamples: bool = True)[source]
Updates the entities from the given LoD. If an entity does not already exist, a new one is added.
- Parameters
lod (list) – data to update the entities
overwriteEvents (bool) – if False only missing values are added
updateEntitiesCallback (Callable) – callback function that is called on an updated entity
restrictToSamples (bool) – if True only properties that are named in the samples are set
Returns:
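The update rules can be sketched on plain dicts as follows. This is an illustration under assumptions, not the package's implementation: entities are kept in a dict keyed by a hypothetical primary key `eventId`, the `allowed` set plays the role of the sample property names, and "missing" is taken to mean absent or None.

```python
from typing import Callable, Optional


def update_from_lod(entities: dict, lod: list, primary_key: str = "eventId",
                    overwrite: bool = True,
                    update_callback: Optional[Callable] = None,
                    allowed: Optional[set] = None) -> None:
    """Hypothetical sketch of the update rules on a dict of entity dicts."""
    for record in lod:
        key = record[primary_key]
        # Analogue of restrictToSamples: keep only known property names
        fields = {name: value for name, value in record.items()
                  if name == primary_key or allowed is None or name in allowed}
        entity = entities.get(key)
        if entity is None:
            # Unknown entity: add a new one
            entities[key] = dict(fields)
            continue
        for name, value in fields.items():
            # With overwrite=False only missing (absent or None) values are filled
            if overwrite or entity.get(name) is None:
                entity[name] = value
        if update_callback is not None:
            update_callback(entity)


entities = {"e1": {"eventId": "e1", "title": "Old title", "city": None}}
lod = [
    {"eventId": "e1", "title": "New title", "city": "Berlin", "unknown": "x"},
    {"eventId": "e2", "title": "Second"},
]
updated = []
update_from_lod(entities, lod, overwrite=False,
                update_callback=updated.append,
                allowed={"title", "city"})
print(entities)
```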
- class corpus.event.EventManager(name: str, sourceConfig: Optional[corpus.config.EventDataSourceConfig] = None, clazz=None, primaryKey: Optional[str] = None, config: Optional[lodstorage.storageconfig.StorageConfig] = None, debug=False)[source]
Bases:
corpus.event.EventBaseManager
Event entity list
constructor
- linkSeriesAndEvent(eventSeriesManager: corpus.event.EventSeriesManager, seriesKey: str = 'series')[source]
link Series and Event using the given foreignKey
- Parameters
seriesKey (str) – the key to be used for lookup
eventSeriesManager (EventSeriesManager) –
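Linking by a foreign key can be illustrated with a small standalone sketch. What the real method stores on either side is not stated above, so the `events` list attached to each series and the returned list of unlinked events are assumptions made for illustration only.

```python
def link_series_and_events(events: list, series_by_key: dict,
                           series_key: str = "series") -> list:
    """Hypothetical sketch: attach each event to the series it refers to."""
    unlinked = []
    for event in events:
        # Look up the series via the foreign key stored on the event
        series = series_by_key.get(event.get(series_key))
        if series is None:
            unlinked.append(event)
        else:
            series.setdefault("events", []).append(event)
    return unlinked


series_by_key = {"ABC": {"acronym": "ABC"}}
events = [
    {"eventId": "e1", "series": "ABC"},
    {"eventId": "e2", "series": "XYZ"},
    {"eventId": "e3"},
]
unlinked = link_series_and_events(events, series_by_key)
print(series_by_key["ABC"]["events"], unlinked)
```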
- class corpus.event.EventSeries[source]
Bases:
lodstorage.jsonable.JSONAble
base class for Event Series entities
Constructor
- asWikiMarkup() → str [source]
convert me to wikimarkup
see https://github.com/WolfgangFahl/ConferenceCorpus/issues/10
- class corpus.event.EventSeriesManager(name: str, sourceConfig: Optional[corpus.config.EventDataSourceConfig] = None, clazz=None, primaryKey: Optional[str] = None, config: Optional[lodstorage.storageconfig.StorageConfig] = None, debug=False)[source]
Bases:
corpus.event.EventBaseManager
Event series list
constructor
- class corpus.event.EventStorage[source]
Bases:
object
common storage aspects of the EventManager and EventSeriesManager
- classmethod asPlantUml(baseEntity='Event', exclude=None)[source]
return me as a plantUml Diagram markup
- classmethod createLookup(column: str, tables: list)[source]
create a lookup for a column for the given list of tables
- Parameters
column (str) – the column to create the lookup for
tables (list) – the names of the tables to take into account
- classmethod createViews(exclude=None, show=False)[source]
create the general Event views
- Parameters
exclude (list) – the list of table names to be excluded
show (bool) – if True show the DDL
- classmethod getCommonViewDDLs(viewNames=['event', 'eventseries'], exclude=None)[source]
get the SQL DDL for a common view
- Returns
the SQL DDL CREATE VIEW command
- Return type
str
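One way to picture a common view is a UNION over the per-source tables that are not excluded. The sketch below builds such a DDL and runs it against an in-memory SQLite database. The column list, the table contents and the UNION layout are assumptions for illustration; the DDL the package actually generates may differ.

```python
import sqlite3


def common_view_ddl(view_name: str, table_names: list, columns: list,
                    exclude=None) -> str:
    """Hypothetical sketch: CREATE VIEW as a UNION of the non-excluded tables."""
    excluded = set(exclude or [])
    selects = [
        f"SELECT {', '.join(columns)} FROM {table}"
        for table in table_names if table not in excluded
    ]
    if not selects:
        raise ValueError("no tables left to build the view from")
    return f"CREATE VIEW {view_name} AS\n" + "\nUNION\n".join(selects)


conn = sqlite3.connect(":memory:")
tables = ["event_dblp", "event_wikidata", "event_acm"]
for table, event_id in zip(tables, ["d1", "w1", "a1"]):
    conn.execute(f"CREATE TABLE {table} (eventId TEXT, title TEXT)")
    conn.execute(f"INSERT INTO {table} VALUES (?, ?)", (event_id, "title of " + event_id))

ddl = common_view_ddl("event", tables, ["eventId", "title"], exclude=["event_acm"])
conn.execute(ddl)
rows = conn.execute("SELECT eventId FROM event ORDER BY eventId").fetchall()
print(ddl)
print(rows)
```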
- classmethod getDBFile(cacheFileName='EventCorpus')[source]
get the database file for the given cacheFileName
- Parameters
cacheFileName (str) – the name of the cacheFile without suffix
- classmethod getQueryManager(lang='sql', name='queries', debug=False)[source]
get the query manager for the given language and fileName
- Parameters
lang (str) – the language of the queries to extract
name (str) – the name of the manager containing the query specifications
debug (bool) – if True set debugging on
- classmethod getSignatureCache(profile: bool = True, force: bool = False)[source]
cache the signature Data in a separate SQLite DB
- Parameters
profile (bool) – if True show profiling information
force (bool) – if True force the cache creation
- static getStorageConfig(debug: bool = False, mode='sql') → lodstorage.storageconfig.StorageConfig [source]
get the storageConfiguration
- Parameters
debug (bool) – if True show debug information
mode (str) – sql or json
- Returns
the storage configuration to be used
- Return type
StorageConfig
- classmethod getTableList(withInstanceCount: bool = True) → list [source]
get the list of SQL Tables involved
- Parameters
withInstanceCount (bool) – if True add the count of instances to the table map
- Returns
the map of SQL tables used for caching
- Return type
list
- profile = True
- viewTableExcludes = {'event': ['event_acm', 'event_ceurws', 'event_orclonebackup', 'event_or', 'event_orbackup'], 'eventseries': ['eventseries_acm', 'eventseries_or', 'eventseries_orbackup', 'eventseries_orclonebackup', 'eventseries_gnd']}
- withShowProgress = False
corpus.eventcorpus module
Created on 2021-04-16
@author: wf
- class corpus.eventcorpus.EventCorpus(debug=False, verbose=False)[source]
Bases:
object
Towards a gold standard event corpus and observatory …
Constructor
- Parameters
debug (bool) – set debugging if True
verbose (bool) – set verbose output if True
- addDataSource(eventDataSource: corpus.eventcorpus.EventDataSource)[source]
adds the given eventDataSource
- Parameters
eventDataSource (EventDataSource) – the event data source to add
- class corpus.eventcorpus.EventDataSource(eventManager: corpus.event.EventManager, eventSeriesManager: corpus.event.EventSeriesManager, sourceConfig=<class 'corpus.config.EventDataSourceConfig'>)[source]
Bases:
object
a data source for events
constructor
- Parameters
sourceConfig (EventDataSourceConfig) – the configuration for the EventDataSource
eventManager (EventManager) – manager for the events
eventSeriesManager (EventSeriesManager) – manager for the eventSeries
corpus.lookup module
Created on 2021-07-30
@author: wf
- class corpus.lookup.CorpusLookup(lookupIds: Optional[list] = None, configure: Optional[callable] = None, debug=False)[source]
Bases:
object
search and lookup for different EventCorpora
Constructor
- Parameters
lookupIds (list) – the list of lookupIds to add data sources for
configure (callable) – Callback to configure the corpus lookup
- getDataSource(lookupId: str) → corpus.eventcorpus.EventDataSource [source]
get the data source by the given lookupId
- Parameters
lookupId (str) – the lookupId of the data source to get
- Returns
the data source
- Return type
EventDataSource
- getDataSource4TableName(tableName: str) → corpus.eventcorpus.EventDataSource [source]
get the data source by the given tableName
- Parameters
tableName (str) – a tableName of the data source to get
- Returns
the data source
- Return type
EventDataSource
- getDictOfLod4MultiQuery(multiquery: str, idQuery: Optional[str] = None, omitFailed: bool = True) → dict [source]
- Parameters
multiquery (str) – the multi query containing a variable
idQuery (str) – optional query to get lists of ids for selection
omitFailed (bool) – if True omit failed queries; if False raise an Exception on failure
- Returns
the dict of list of dicts for the queries derived from the multi query
- Return type
dict
- Raises
Exception – if omitFailed is False and an error occurred for a query
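The idea of a multi query can be sketched with the standard sqlite3 module. The sketch assumes that the variable is a table-name placeholder that is filled in once per table and that results are keyed by table name; both are assumptions, and the optional idQuery is not covered. `dict_of_lod_for_multi_query` is a hypothetical function, not the method itself.

```python
import sqlite3


def dict_of_lod_for_multi_query(conn, multiquery: str, variable: str,
                                table_names: list, omit_failed: bool = True) -> dict:
    """Hypothetical sketch: run one query per table and collect the results."""
    results = {}
    for table_name in table_names:
        # Fill the {variable} placeholder with the current table name
        query = multiquery.replace("{" + variable + "}", table_name)
        try:
            cursor = conn.execute(query)
            columns = [column[0] for column in cursor.description]
            results[table_name] = [dict(zip(columns, row)) for row in cursor.fetchall()]
        except sqlite3.Error:
            # Failed queries are skipped unless the caller asked for errors
            if not omit_failed:
                raise
    return results


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event_dblp (eventId TEXT, year INTEGER)")
conn.execute("INSERT INTO event_dblp VALUES ('d1', 2020)")
conn.execute("CREATE TABLE event_wikidata (eventId TEXT, year INTEGER)")
conn.execute("INSERT INTO event_wikidata VALUES ('w1', 2021)")

result = dict_of_lod_for_multi_query(
    conn, "SELECT eventId, year FROM {event}", "event",
    ["event_dblp", "event_wikidata", "event_missing"],
)
print(result)
```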
- getLod4Query(query: str, params=None)[source]
- Parameters
query – the query to run
params (tuple) – the query params, if any
- Returns
the list of dicts for the query
- Return type
list
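A parameterized query that yields a list of dicts can be sketched with sqlite3 as shown below. The table and column names are made up for the example, and `lod_for_query` is a hypothetical helper rather than the method documented here.

```python
import sqlite3


def lod_for_query(conn, query: str, params=None) -> list:
    """Hypothetical helper: run a query and return its rows as a list of dicts."""
    cursor = conn.execute(query, params or ())
    columns = [column[0] for column in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event (eventId TEXT, year INTEGER)")
conn.executemany("INSERT INTO event VALUES (?, ?)", [("e1", 2020), ("e2", 2021)])

lod = lod_for_query(conn, "SELECT eventId, year FROM event WHERE year >= ?", (2021,))
print(lod)
```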
- getMultiQueryVariable(multiquery: str, lenient: bool = False)[source]
get the variable being used in a multiquery
- Parameters
multiquery (str) – the multiquery containing a {variable}
lenient (bool) – if True, return None when no variable is found; otherwise raise an Exception
- Returns
variable
- Return type
str
- Raises
Exception – if lenient is False and no variable was found
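The lenient and strict behavior described above can be sketched with a regular expression. The pattern for what counts as a variable name is an assumption, and `multi_query_variable` is a hypothetical standalone function.

```python
import re
from typing import Optional


def multi_query_variable(multiquery: str, lenient: bool = False) -> Optional[str]:
    """Hypothetical sketch: return the first {variable} name in the query."""
    # Assumption: a variable is an identifier wrapped in curly braces
    match = re.search(r"\{([A-Za-z_][A-Za-z0-9_]*)\}", multiquery)
    if match:
        return match.group(1)
    if lenient:
        return None
    raise Exception(f"no variable found in multiquery: {multiquery}")


print(multi_query_variable("SELECT * FROM {event} WHERE year = 2021"))
print(multi_query_variable("SELECT 1", lenient=True))
```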
- load(forceUpdate: bool = False, showProgress: bool = False, withCreateViews=True)[source]
load the event corpora
- Parameters
forceUpdate (bool) – if True the data should be fetched from the source instead of the cache
showProgress (bool) – if True the progress of the loading should be shown
withCreateViews (bool) – if True recreate the common views
- lookupIds = ['confref', 'crossref', 'dblp', 'gnd', 'tibkat', 'wikidata', 'wikicfp', 'or', 'or-backup', 'orclone', 'orclone-backup']