corpus package

Submodules

corpus.event module

Created on 2021-07-26

@author: wf

class corpus.event.Event[source]

Bases: lodstorage.jsonable.JSONAble

base class for Event entities

Constructor

asWikiMarkup(series: str, templateParamLookup: dict) str[source]
Parameters
  • series (str) – the name of the series

  • templateParamLookup (dict) – the mapping of Python attributes to MediaWiki template parameters to be used

Returns

my WikiMarkup

Return type

str

getLookupAcronym()[source]

get the lookup acronym of this event e.g. add year information

Returns

the acronym to be used for lookup operations

Return type

str

getRecord()[source]

get my dict elements that are defined in getSamples

Returns

fields of my __dict__ which are defined in getSamples

Return type

dict
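
The filtering described for getRecord can be sketched as follows. This is a minimal illustration and not the library's implementation: the class SampleEvent and its sample data are hypothetical, and the sketch assumes that getSamples returns a list of example dicts whose keys name the fields to keep.

```python
class SampleEvent:
    """Hypothetical stand-in for an Event subclass (illustration only)."""

    @classmethod
    def getSamples(cls):
        # assumed shape: a list of example records whose keys define the fields
        return [{"acronym": "DEMO 2020", "year": 2020}]

    def getRecord(self):
        # collect the field names that occur in any sample record
        sampleKeys = {key for sample in self.getSamples() for key in sample}
        # keep only those attributes of this instance
        return {k: v for k, v in self.__dict__.items() if k in sampleKeys}


event = SampleEvent()
event.acronym = "DEMO 2021"
event.year = 2021
event.internalNote = "not part of the samples"
record = event.getRecord()
```

In this sketch internalNote is dropped because no sample record contains that key.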

mapFromDict(d: dict, maptuples)[source]

set my attributes from the given dict using the given (key->attr) mapping tuples

Parameters
  • d (dict) – the dictionary to map

  • maptuples (list) – the list of tuples for mapping
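
A minimal sketch of such a tuple-based mapping, under assumptions: the class MappedEvent is hypothetical, each tuple is assumed to be (key in d, attribute name), and silently skipping keys that are missing from d is a choice made for this illustration that may differ from the actual behavior.

```python
class MappedEvent:
    """Hypothetical stand-in for an Event (illustration only)."""

    def mapFromDict(self, d, maptuples):
        # each tuple is assumed to be (key in d, attribute name on self)
        for key, attr in maptuples:
            if key in d:
                setattr(self, attr, d[key])


event = MappedEvent()
rawRecord = {"Acronym": "DEMO 2020", "Year": 2020}
event.mapFromDict(
    rawRecord,
    [("Acronym", "acronym"), ("Year", "year"), ("City", "city")],
)
```

After the call the instance has the attributes acronym and year. No city attribute is set because the raw record has no "City" key.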

class corpus.event.EventBaseManager(name, entityName, entityPluralName: str, listName: Optional[str] = None, clazz=None, sourceConfig: Optional[corpus.config.EventDataSourceConfig] = None, primaryKey: Optional[str] = None, config=None, handleInvalidListTypes=False, filterInvalidListTypes=False, debug=False, profile=True)[source]

Bases: lodstorage.entity.EntityManager

common entity Manager for ConferenceCorpus

Constructor

Parameters
  • name (string) – name of this eventManager

  • entityName (string) – entityType to be managed e.g. Country

  • entityPluralName (string) – plural of the entityType e.g. Countries

  • config (StorageConfig) – the configuration to be used; if None, a default configuration will be used

  • handleInvalidListTypes (bool) – True if invalidListTypes should be converted or filtered

  • filterInvalidListTypes (bool) – True if invalidListTypes should be deleted

  • debug (boolean) – override debug setting when default of config is used via config=None

  • profile (boolean) – True if profiling/timing information should be shown for long-running operations

asCsv(separator: str = ',', selectorCallback: Optional[Callable] = None)[source]

Converts the events to csv format

Parameters
  • separator (str) – character separating the row values

  • selectorCallback (Callable) – callback function returning the events to be converted to csv. If None all events are converted.

Returns

csv string of events
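
The conversion can be sketched with the standard library csv module. This is an illustration under assumptions, not the library's code: the function eventsAsCsv is hypothetical, events are represented as plain dicts, and the selector callback is assumed to take no arguments and return the events to export.

```python
import csv
import io


def eventsAsCsv(events, separator=",", selectorCallback=None):
    """Sketch: serialize a list of event dicts to a CSV string."""
    # if a selector callback is given, it decides which events are exported
    selected = selectorCallback() if selectorCallback is not None else events
    if not selected:
        return ""
    # use the union of all keys (in first-seen order) as the header
    fieldnames = list(dict.fromkeys(key for event in selected for key in event))
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, delimiter=separator, lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(selected)
    return buffer.getvalue()


events = [
    {"acronym": "DEMO 2020", "year": 2020},
    {"acronym": "DEMO 2021", "year": 2021},
]
csvAll = eventsAsCsv(events, separator=";")
csvRecent = eventsAsCsv(
    events, selectorCallback=lambda: [e for e in events if e["year"] > 2020]
)
```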

configure()[source]

configure me - abstract method that needs to be overridden

fromCache(force: bool = False, getListOfDicts=None, append=False, sampleRecordCount=- 1)[source]

overridden version of fromCache that calls postProcessEntityList

fromCsv(csvString, separator: str = ',', overwriteEvents: bool = True, updateEntitiesCallback: Optional[Callable] = None)[source]
Parameters
  • csvString – csvString having all the csv content

  • separator – the separator of the csv

  • append – to append to the self object.

  • updateEntitiesCallback

Returns: Nothing. The self object is updated.

getEventByKey(keyToSearch, keytype='pageTitle')[source]
getLoDfromEndpoint() list[source]

get my content from my endpoint

Returns

the list of dicts derived from the given SPARQL query

Return type

list

postProcessEntityList(debug: bool = False)[source]

postProcess my entities

postProcessLodRecords(listOfDicts: list, **kwArgs)[source]

post process the given list of Dicts with raw Events

Parameters

listOfDicts (list) – the list of raw Events to fix

rateAll(ratingManager: corpus.quality.rating.RatingManager)[source]

rate all events and series based on the given rating Manager

setAllAttr(listOfDicts, attr, value)[source]

set all attribute values of the given attr in the given list of Dict to the given value
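
As a minimal sketch (a standalone function for illustration, not the method itself), setting one attribute on every record amounts to a simple loop. Overwriting values that already exist is an assumption of this sketch.

```python
def setAllAttr(listOfDicts, attr, value):
    # assign the same value to the given key in every record,
    # overwriting any value that is already present (assumption)
    for record in listOfDicts:
        record[attr] = value


records = [
    {"acronym": "DEMO 2020"},
    {"acronym": "DEMO 2021", "source": "unknown"},
]
setAllAttr(records, "source", "wikidata")
```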

updateFromLod(lod: list, overwriteEvents: bool = True, updateEntitiesCallback: Optional[Callable] = None, restrictToSamples: bool = True)[source]

Updates the entities from the given LoD. If an entity does not already exist, a new one will be added.

Parameters
  • lod (list) – the data to update the entities with

  • overwriteEvents (bool) – if False only missing values are added

  • updateEntitiesCallback (Callable) – callback function that is called on an updated entity

  • restrictToSamples (bool) – if True only properties that are named in the samples are set
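
The merge semantics can be sketched as follows. This is an illustration under assumptions, not the library's implementation: entities are held in a plain dict keyed by a hypothetical primary key "eventId", a value counts as missing when it is absent or None, the callback is invoked for added as well as updated entities, and restrictToSamples is left out for brevity.

```python
def updateFromLod(entities, lod, primaryKey="eventId",
                  overwriteEvents=True, updateEntitiesCallback=None):
    """Sketch: merge a list of dicts into a dict of entities keyed by primaryKey."""
    for record in lod:
        key = record[primaryKey]
        if key not in entities:
            # unknown entity: add a copy of the record
            entities[key] = dict(record)
        else:
            entity = entities[key]
            for name, value in record.items():
                # with overwriteEvents=False only missing values are filled
                if overwriteEvents or entity.get(name) is None:
                    entity[name] = value
        if updateEntitiesCallback is not None:
            updateEntitiesCallback(entities[key])
    return entities


entities = {"DEMO2020": {"eventId": "DEMO2020", "city": None, "year": 2020}}
lod = [
    {"eventId": "DEMO2020", "city": "Berlin", "year": 2019},
    {"eventId": "DEMO2021", "city": "Vienna", "year": 2021},
]
touched = []
updateFromLod(
    entities,
    lod,
    overwriteEvents=False,
    updateEntitiesCallback=lambda entity: touched.append(entity["eventId"]),
)
```

With overwriteEvents=False the existing year 2020 is kept, the missing city is filled in, and the unknown event DEMO2021 is added.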

class corpus.event.EventManager(name: str, sourceConfig: Optional[corpus.config.EventDataSourceConfig] = None, clazz=None, primaryKey: Optional[str] = None, config: Optional[lodstorage.storageconfig.StorageConfig] = None, debug=False)[source]

Bases: corpus.event.EventBaseManager

Event entity list

constructor

static asWikiSon(eventDicts)[source]
static eventDictToWikiSon(eventDict)[source]
getEventsInSeries(seriesAcronym)[source]

Return all the events in a given series.

linkSeriesAndEvent(eventSeriesManager: corpus.event.EventSeriesManager, seriesKey: str = 'series')[source]

link Series and Event using the given foreignKey

Parameters
  • seriesKey (str) – the key to be used for lookup

  • eventSeriesManager (EventSeriesManager) –
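
Linking by a foreign key can be sketched as below. The sketch is hypothetical throughout: events and series are plain dicts, a series is assumed to be identified by its "acronym" field, and events whose series is unknown are simply left out.

```python
from collections import defaultdict


def linkSeriesAndEvent(events, seriesList, seriesKey="series"):
    """Sketch: group events by the series their foreign key points to."""
    # index the known series by acronym (assumed identifier)
    seriesByAcronym = {series["acronym"]: series for series in seriesList}
    eventsBySeries = defaultdict(list)
    for event in events:
        seriesAcronym = event.get(seriesKey)
        # only link events whose foreign key matches a known series
        if seriesAcronym in seriesByAcronym:
            eventsBySeries[seriesAcronym].append(event)
    return dict(eventsBySeries)


events = [
    {"acronym": "DEMO 2020", "series": "DEMO"},
    {"acronym": "DEMO 2021", "series": "DEMO"},
    {"acronym": "OTHER 2021", "series": "OTHER"},
]
seriesList = [{"acronym": "DEMO"}]
linked = linkSeriesAndEvent(events, seriesList)
```

A lookup such as linked["DEMO"] then corresponds to what getEventsInSeries is described to return for a series acronym.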

class corpus.event.EventSeries[source]

Bases: lodstorage.jsonable.JSONAble

base class for Event Series entities

Constructor

asWikiMarkup() str[source]

convert me to wikimarkup

see https://github.com/WolfgangFahl/ConferenceCorpus/issues/10

class corpus.event.EventSeriesManager(name: str, sourceConfig: Optional[corpus.config.EventDataSourceConfig] = None, clazz=None, primaryKey: Optional[str] = None, config: Optional[lodstorage.storageconfig.StorageConfig] = None, debug=False)[source]

Bases: corpus.event.EventBaseManager

Event series list

constructor

class corpus.event.EventStorage[source]

Bases: object

common storage aspects of the EventManager and EventSeriesManager

classmethod asPlantUml(baseEntity='Event', exclude=None)[source]

return me as a plantUml Diagram markup

classmethod createLookup(column: str, tables: list)[source]

create a lookup for a column for the given list of tables

Parameters
  • column (str) – the column to create the lookup for

  • tables (list) – the names of the tables to take into account

classmethod createViews(exclude=None, show=False)[source]

create the general Event views

Parameters
  • exclude (list) – the list of table names to be excluded

  • show (bool) – if True show the DDL

classmethod getCommonViewDDLs(viewNames=['event', 'eventseries'], exclude=None)[source]

get the SQL DDL for a common view

Returns

the SQL DDL CREATE VIEW command

Return type

str

classmethod getDBFile(cacheFileName='EventCorpus')[source]

get the database file for the given cacheFileName

Parameters

cacheFileName (str) – the name of the cacheFile without suffix

classmethod getQueryManager(lang='sql', name='queries', debug=False)[source]

get the query manager for the given language and fileName

Parameters
  • lang (str) – the language of the queries to extract

  • name (str) – the name of the manager containing the query specifications

  • debug (bool) – if True set debugging on

classmethod getSignatureCache(profile: bool = True, force: bool = False)[source]

cache the signature Data in a separate SQLite DB

Parameters
  • profile (bool) – if True show profiling information

  • force (bool) – if True force the cache creation

classmethod getSqlDB()[source]

get the SQL Database

static getStorageConfig(debug: bool = False, mode='sql') lodstorage.storageconfig.StorageConfig[source]

get the storageConfiguration

Parameters
  • debug (bool) – if True show debug information

  • mode (str) – sql or json

Returns

the storage configuration to be used

Return type

StorageConfig

classmethod getTableList(withInstanceCount: bool = True) list[source]

get the list of SQL Tables involved

Parameters

withInstanceCount (bool) – if True add the count of instances to the table map

Returns

the map of SQL tables used for caching

Return type

list

classmethod getViewTableList(viewName, exclude=None)[source]
profile = True
viewTableExcludes = {'event': ['event_acm', 'event_ceurws', 'event_orclonebackup', 'event_or', 'event_orbackup'], 'eventseries': ['eventseries_acm', 'eventseries_or', 'eventseries_orbackup', 'eventseries_orclonebackup', 'eventseries_gnd']}
withShowProgress = False

corpus.eventcorpus module

Created on 2021-04-16

@author: wf

class corpus.eventcorpus.EventCorpus(debug=False, verbose=False)[source]

Bases: object

Towards a gold standard event corpus and observatory …

Constructor

Parameters
  • debug (bool) – set debugging if True

  • verbose (bool) – set verbose output if True

addDataSource(eventDataSource: corpus.eventcorpus.EventDataSource)[source]

adds the given eventDataSource

Parameters

eventDataSource (EventDataSource) – the data source to add

static download()[source]

download the EventCorpus.db if needed

loadAll(forceUpdate: bool = False, showProgress=False)[source]

load all eventDataSources

Parameters

forceUpdate (bool) – True if the data should be fetched from the source instead of the cache

class corpus.eventcorpus.EventDataSource(eventManager: corpus.event.EventManager, eventSeriesManager: corpus.event.EventSeriesManager, sourceConfig=<class 'corpus.config.EventDataSourceConfig'>)[source]

Bases: object

a data source for events

constructor

Parameters
  • sourceConfig (EventDataSourceConfig) – the configuration for the EventDataSource

  • eventManager (EventManager) – manager for the events

  • eventSeriesManager (EventSeriesManager) – manager for the eventSeries

load(forceUpdate=False, showProgress=False, debug=False)[source]

load this data source

Parameters
  • forceUpdate (bool) – if true force updating this datasource

  • showProgress (bool) – if true show the progress

  • debug (bool) – if true show debug information

rateAll(ratingManager: corpus.quality.rating.RatingManager)[source]

rate all events and series based on the given rating Manager

corpus.lookup module

Created on 2021-07-30

@author: wf

class corpus.lookup.CorpusLookup(lookupIds: Optional[list] = None, configure: Optional[callable] = None, debug=False)[source]

Bases: object

search and lookup for different EventCorpora

Constructor

Parameters
  • lookupIds (list) – the list of lookupIds to add data sources for

  • configure (callable) – Callback to configure the corpus lookup

getDataSource(lookupId: str) corpus.eventcorpus.EventDataSource[source]

get the data source by the given lookupId

Parameters

lookupId (str) – the lookupId of the data source to get

Returns

the data source

Return type

EventDataSource

getDataSource4TableName(tableName: str) corpus.eventcorpus.EventDataSource[source]

get the data source by the given tableName

Parameters

tableName (str) – a tableName of the data source to get

Returns

the data source

Return type

EventDataSource

getDataSourceInfos(withInstanceCount: bool = True)[source]

get the dataSource Infos

getDictOfLod4MultiQuery(multiquery: str, idQuery: Optional[str] = None, omitFailed: bool = True) dict[source]
Parameters
  • multiquery (str) – the multi query containing a variable

  • idQuery (str) – optional query to get lists of ids for selection

  • omitFailed (bool) – if True omit failed queries; if False raise an Exception on failure

Returns

the dict of list of dicts for the queries derived from the multi query

Return type

dict

Raises

Exception – if omitFailed is False and an error occurred for a query
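
The idea of a multi query can be sketched with an in-memory SQLite database. Everything here is an assumption made for illustration: the function signature differs from the method (connection, variable and ids are passed in explicitly instead of being derived from idQuery), the variable is substituted by plain string replacement, and the table names are hypothetical.

```python
import sqlite3


def getDictOfLod4MultiQuery(connection, multiquery, variable, ids, omitFailed=True):
    """Sketch: run one query per id and collect the results by id."""
    connection.row_factory = sqlite3.Row
    dictOfLod = {}
    for idValue in ids:
        # derive the concrete query by filling in the variable
        query = multiquery.replace("{" + variable + "}", idValue)
        try:
            dictOfLod[idValue] = [dict(row) for row in connection.execute(query)]
        except sqlite3.Error:
            # failed queries are skipped unless the caller wants the error
            if not omitFailed:
                raise
    return dictOfLod


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE event_a (acronym TEXT)")
connection.execute("CREATE TABLE event_b (acronym TEXT)")
connection.execute("INSERT INTO event_a VALUES ('DEMO 2020')")
connection.executemany(
    "INSERT INTO event_b VALUES (?)", [("DEMO 2020",), ("DEMO 2021",)]
)
result = getDictOfLod4MultiQuery(
    connection,
    "SELECT COUNT(*) AS count FROM {event}",
    variable="event",
    ids=["event_a", "event_b", "event_missing"],
)
```

The query for the missing table fails and is omitted from the result because omitFailed is True.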

getLod4Query(query: str, params=None)[source]
Parameters
  • query – the query to run

  • params (tuple) – the query params, if any

Returns

the list of dicts for the query

Return type

list
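
A minimal sketch of running a parameterized query and returning a list of dicts, assuming an SQLite backend and positional "?" placeholders for the params tuple. The standalone function and the table used are hypothetical. The actual method works on the corpus database rather than on a connection passed in.

```python
import sqlite3


def getLod4Query(connection, query, params=None):
    """Sketch: run the query and return the rows as a list of dicts."""
    # sqlite3.Row gives access to column names, so rows convert to dicts
    connection.row_factory = sqlite3.Row
    cursor = connection.execute(query, params or ())
    return [dict(row) for row in cursor.fetchall()]


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE event (acronym TEXT, year INTEGER)")
connection.executemany(
    "INSERT INTO event VALUES (?, ?)",
    [("DEMO 2020", 2020), ("DEMO 2021", 2021)],
)
lod = getLod4Query(
    connection, "SELECT acronym, year FROM event WHERE year > ?", (2020,)
)
```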

getMultiQueryVariable(multiquery: str, lenient: bool = False)[source]

get the variable being used in a multiquery

Parameters
  • multiquery (str) – the multiquery containing a {variable}

  • lenient (bool) – if True return None when no variable was found; otherwise raise an Exception

Returns

variable

Return type

str

Raises

Exception – if lenient is False and no variable was found
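
Extracting the {variable} placeholder can be sketched with a regular expression. This is an illustration under assumptions: the pattern accepts only identifier-like names, the first match wins, and the error message is made up for this sketch.

```python
import re


def getMultiQueryVariable(multiquery, lenient=False):
    """Sketch: return the name of the first {variable} in the multiquery."""
    match = re.search(r"\{([A-Za-z_][A-Za-z0-9_]*)\}", multiquery)
    if match:
        return match.group(1)
    if lenient:
        # lenient mode: signal "no variable" with None
        return None
    raise Exception(f"no variable found in multiquery: {multiquery}")


variable = getMultiQueryVariable("SELECT * FROM {event} WHERE year = 2020")
noVariable = getMultiQueryVariable("SELECT 1", lenient=True)
```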

load(forceUpdate: bool = False, showProgress: bool = False, withCreateViews=True)[source]

load the event corpora

Parameters
  • forceUpdate (bool) – if True the data should be fetched from the source instead of the cache

  • showProgress (bool) – if True the progress of the loading should be shown

  • withCreateViews (bool) – if True recreate the common views

lookupIds = ['confref', 'crossref', 'dblp', 'gnd', 'tibkat', 'wikidata', 'wikicfp', 'or', 'or-backup', 'orclone', 'orclone-backup']
class corpus.lookup.CorpusLookupConfigure[source]

Bases: object

Configuration of the Corpus Lookup

static configureCorpusLookup(lookup, debug=False)[source]

callback to configure the corpus lookup

static getWikiTextPath(wikiId: str)[source]

get the WikiText (Backup) path for the given wikiId

Parameters

wikiId (str) – the wikiId (e.g. “or”, “orclone”)

Returns

the path to the backup files as created by the wikibackup script

corpus.lookup.main(argv=None)[source]

main program.

Module contents