Welcome to ConferenceCorpus’s documentation!
corpus package
Submodules
corpus.event module
Created on 2021-07-26
@author: wf
- class corpus.event.Event[source]
Bases:
lodstorage.jsonable.JSONAble
base class for Event entities
Constructor
- asWikiMarkup(series: str, templateParamLookup: dict) → str [source]
- Parameters
series (str) – the name of the series
templateParamLookup (dict) – the mapping of python attributes to Mediawiki template parameters to be used
- Returns
my WikiMarkup
- Return type
str
- getLookupAcronym()[source]
get the lookup acronym of this event, e.g. with year information added
- Returns
the acronym to be used for lookup operations
- Return type
str
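A minimal sketch of what adding year information to a lookup acronym could look like. The function name lookup_acronym and the rule used here (append the year unless the acronym already ends with it) are assumptions for illustration and are not taken from the library source.

```python
def lookup_acronym(acronym, year=None):
    """Hypothetical helper: derive an acronym suitable for lookup operations."""
    if not acronym:
        # nothing to normalize
        return acronym
    acronym = acronym.strip()
    # assumption: the year is appended only if it is not already the suffix
    if year is not None and not acronym.endswith(str(year)):
        return f"{acronym} {year}"
    return acronym
```

With this rule both "ISWC" with year 2021 and "ISWC 2021" lead to the same lookup key.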
- class corpus.event.EventBaseManager(name, entityName, entityPluralName: str, listName: Optional[str] = None, clazz=None, sourceConfig: Optional[corpus.config.EventDataSourceConfig] = None, primaryKey: Optional[str] = None, config=None, handleInvalidListTypes=False, filterInvalidListTypes=False, debug=False, profile=True)[source]
Bases:
lodstorage.entity.EntityManager
common entity Manager for ConferenceCorpus
Constructor
- Parameters
name (string) – name of this eventManager
entityName (string) – entityType to be managed e.g. Country
entityPluralName (string) – plural of the entityType e.g. Countries
config (StorageConfig) – the configuration to be used; if None, a default configuration will be used
handleInvalidListTypes (bool) – True if invalidListTypes should be converted or filtered
filterInvalidListTypes (bool) – True if invalidListTypes should be deleted
debug (boolean) – override debug setting when default of config is used via config=None
profile (boolean) – True if profiling/timing information should be shown for long-running operations
- asCsv(separator: str = ',', selectorCallback: Optional[Callable] = None)[source]
Converts the events to CSV format
- Parameters
separator (str) – character separating the row values
selectorCallback – callback function returning the events to be converted to CSV. If None, all events are converted.
- Returns
csv string of events
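The conversion can be sketched with the standard csv module. This is an illustration only: events are assumed to be plain dicts, and the name events_as_csv and the zero-argument selector callback are assumptions, not the library's code.

```python
import csv
import io


def events_as_csv(events, separator=",", selector_callback=None):
    """Hypothetical sketch: convert a list of event dicts to a CSV string."""
    # if no selector is given all events are converted
    selected = list(events if selector_callback is None else selector_callback())
    # header: union of all keys in first-seen order
    fields = []
    for record in selected:
        for key in record:
            if key not in fields:
                fields.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fields, delimiter=separator, lineterminator="\n"
    )
    writer.writeheader()
    # keys missing in a record are written as empty cells
    writer.writerows(selected)
    return buffer.getvalue()
```

A selector such as lambda: [e for e in events if e["year"] == 2021] restricts the export to a subset of the events.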
- fromCache(force: bool = False, getListOfDicts=None, append=False, sampleRecordCount=-1)[source]
overridden version of fromCache that calls postProcessEntityList
- fromCsv(csvString, separator: str = ',', overwriteEvents: bool = True, updateEntitiesCallback: Optional[Callable] = None)[source]
- Parameters
csvString – csvString having all the csv content
separator – the separator of the csv
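overwriteEvents (bool) – if False, only missing values are added (as documented for updateFromLod)
updateEntitiesCallback – callback function that is called on an updated entity (as documented for updateFromLod)
- Returns
nothing; the self object is updated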
- getLoDfromEndpoint() → list [source]
get my content from my endpoint
- Returns
the list of dicts derived from the given SPARQL query
- Return type
list
- postProcessLodRecords(listOfDicts: list, **kwArgs)[source]
post process the given list of Dicts with raw Events
- Parameters
listOfDicts (list) – the list of raw Events to fix
- rateAll(ratingManager: corpus.quality.rating.RatingManager)[source]
rate all events and series based on the given rating Manager
- setAllAttr(listOfDicts, attr, value)[source]
set the given attr to the given value in every dict of the given list of dicts
- updateFromLod(lod: list, overwriteEvents: bool = True, updateEntitiesCallback: Optional[Callable] = None, restrictToSamples: bool = True)[source]
Updates the entities from the given LoD. If an entity does not already exist, a new one is added.
- Parameters
lod (list) – data to update the entities
overwriteEvents (bool) – if False, only missing values are added
updateEntitiesCallback – callback function that is called on an updated entity
restrictToSamples (bool) – if True, only properties that are named in the samples are set
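The update semantics described for updateFromLod can be sketched on plain dicts. The function name update_from_lod, the primary key "eventId" and the allowed_fields argument (standing in for the sample-derived property names) are assumptions for illustration, not the library's implementation.

```python
def update_from_lod(entities, lod, primary_key="eventId", overwrite=True,
                    allowed_fields=None, on_update=None):
    """Hypothetical sketch: merge a list of dicts into existing entity dicts."""
    by_key = {entity[primary_key]: entity for entity in entities}
    for record in lod:
        key = record[primary_key]
        entity = by_key.get(key)
        if entity is None:
            # unknown entity: add a new one
            entity = {primary_key: key}
            entities.append(entity)
            by_key[key] = entity
        for field, value in record.items():
            # stand-in for restrictToSamples: skip fields that are not allowed
            if allowed_fields is not None and field not in allowed_fields:
                continue
            # with overwrite=False only missing values are filled in
            if overwrite or entity.get(field) is None:
                entity[field] = value
        if on_update is not None:
            on_update(entity)
    return entities
```

With overwrite=False an existing title is kept while a missing city is filled in from the update record.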
- class corpus.event.EventManager(name: str, sourceConfig: Optional[corpus.config.EventDataSourceConfig] = None, clazz=None, primaryKey: Optional[str] = None, config: Optional[lodstorage.storageconfig.StorageConfig] = None, debug=False)[source]
Bases:
corpus.event.EventBaseManager
Event entity list
constructor
- linkSeriesAndEvent(eventSeriesManager: corpus.event.EventSeriesManager, seriesKey: str = 'series')[source]
link Series and Event using the given foreignKey
- Parameters
seriesKey (str) – the key to be used for lookup
eventSeriesManager (EventSeriesManager) – the manager of the event series to link the events to
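Linking by foreign key can be sketched as grouping the events by the value of their series key. Events and series are assumed to be plain dicts here; the function name link_series_and_events and the series identifier field "acronym" are assumptions for illustration.

```python
from collections import defaultdict


def link_series_and_events(events, series_list, series_key="series",
                           series_id_key="acronym"):
    """Hypothetical sketch: group events by the series their foreign key refers to."""
    known_series = {series[series_id_key] for series in series_list}
    events_by_series = defaultdict(list)
    unmatched = []
    for event in events:
        series_id = event.get(series_key)
        if series_id in known_series:
            events_by_series[series_id].append(event)
        else:
            # no foreign key or no series with that identifier
            unmatched.append(event)
    return dict(events_by_series), unmatched
```

Returning the unmatched events separately makes dangling foreign keys visible instead of dropping them silently.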
- class corpus.event.EventSeries[source]
Bases:
lodstorage.jsonable.JSONAble
base class for Event Series entities
Constructor
- asWikiMarkup() → str [source]
convert me to wikimarkup
see https://github.com/WolfgangFahl/ConferenceCorpus/issues/10
- class corpus.event.EventSeriesManager(name: str, sourceConfig: Optional[corpus.config.EventDataSourceConfig] = None, clazz=None, primaryKey: Optional[str] = None, config: Optional[lodstorage.storageconfig.StorageConfig] = None, debug=False)[source]
Bases:
corpus.event.EventBaseManager
Event series list
constructor
- class corpus.event.EventStorage[source]
Bases:
object
common storage aspects of the EventManager and EventSeriesManager
- classmethod asPlantUml(baseEntity='Event', exclude=None)[source]
return me as a plantUml Diagram markup
- classmethod createLookup(column: str, tables: list)[source]
create a lookup for a column for the given list of tables
- Parameters
column (str) – the column to create the lookup for
tables (list) – the names of the tables to take into account
- classmethod createViews(exclude=None, show=False)[source]
create the general Event views
- Parameters
exclude (list) – the list of table names to be excluded
show (bool) – if True show the DDL
- classmethod getCommonViewDDLs(viewNames=['event', 'eventseries'], exclude=None)[source]
get the SQL DDL for a common view
- Returns
the SQL DDL CREATE VIEW command
- Return type
str
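A common view over several per-source tables can be sketched with sqlite3 as a UNION ALL over the columns that all tables share. This is an assumption about how such a view might be built, not the library's code; the function name common_view_ddl and the table naming scheme (view name plus underscore plus source) are hypothetical.

```python
import sqlite3


def common_view_ddl(conn, view_name, exclude=None):
    """Hypothetical sketch: CREATE VIEW DDL unioning all '<view_name>_*' tables."""
    excluded = set(exclude or [])
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
    tables = [name for name in names
              if name.startswith(view_name + "_") and name not in excluded]
    if not tables:
        raise ValueError(f"no tables found for view {view_name}")
    # column names per table; row[1] of PRAGMA table_info is the column name
    column_lists = [
        [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
        for table in tables
    ]
    # keep only the columns that every table provides
    common = [col for col in column_lists[0]
              if all(col in cols for cols in column_lists[1:])]
    if not common:
        raise ValueError("the tables have no common columns")
    column_sql = ", ".join(common)
    selects = [f"SELECT {column_sql} FROM {table}" for table in tables]
    return f"CREATE VIEW IF NOT EXISTS {view_name} AS\n" + "\nUNION ALL\n".join(selects)
```

Excluding a table, in the spirit of viewTableExcludes, simply removes its SELECT from the union.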
- classmethod getDBFile(cacheFileName='EventCorpus')[source]
get the database file for the given cacheFileName
- Parameters
cacheFileName (str) – the name of the cacheFile without suffix
- classmethod getQueryManager(lang='sql', name='queries', debug=False)[source]
get the query manager for the given language and fileName
- Parameters
lang (str) – the language of the queries to extract
name (str) – the name of the manager containing the query specifications
debug (bool) – if True set debugging on
- classmethod getSignatureCache(profile: bool = True, force: bool = False)[source]
cache the signature Data in a separate SQLite DB
- Parameters
profile (bool) – if True show profiling information
force (bool) – if True force the cache creation
- static getStorageConfig(debug: bool = False, mode='sql') → lodstorage.storageconfig.StorageConfig [source]
get the storageConfiguration
- Parameters
debug (bool) – if True show debug information
mode (str) – sql or json
- Returns
the storage configuration to be used
- Return type
StorageConfig
- classmethod getTableList(withInstanceCount: bool = True) → list [source]
get the list of SQL Tables involved
- Parameters
withInstanceCount (bool) – if True add the count of instances to the table map
- Returns
the map of SQL tables used for caching
- Return type
list
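Listing the tables of a SQLite cache together with their instance counts can be sketched as follows. The function name table_list and the result keys "name" and "instances" are assumptions for illustration; the structure returned by the library may differ.

```python
import sqlite3


def table_list(conn, with_instance_count=True):
    """Hypothetical sketch: list the tables of a SQLite database."""
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
    tables = []
    for name in names:
        table = {"name": name}
        if with_instance_count:
            # one COUNT(*) query per table
            count = conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            table["instances"] = count
        tables.append(table)
    return tables
```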
- profile = True
- viewTableExcludes = {'event': ['event_acm', 'event_ceurws', 'event_orclonebackup', 'event_or', 'event_orbackup'], 'eventseries': ['eventseries_acm', 'eventseries_or', 'eventseries_orbackup', 'eventseries_orclonebackup', 'eventseries_gnd']}
- withShowProgress = False
corpus.eventcorpus module
Created on 2021-04-16
@author: wf
- class corpus.eventcorpus.EventCorpus(debug=False, verbose=False)[source]
Bases:
object
Towards a gold standard event corpus and observatory …
Constructor
- Parameters
debug (bool) – set debugging if True
verbose (bool) – set verbose output if True
- addDataSource(eventDataSource: corpus.eventcorpus.EventDataSource)[source]
adds the given eventDataSource
- Parameters
eventDataSource (EventDataSource) – the event data source to add
- class corpus.eventcorpus.EventDataSource(eventManager: corpus.event.EventManager, eventSeriesManager: corpus.event.EventSeriesManager, sourceConfig=<class 'corpus.config.EventDataSourceConfig'>)[source]
Bases:
object
a data source for events
constructor
- Parameters
sourceConfig (EventDataSourceConfig) – the configuration for the EventDataSource
eventManager (EventManager) – manager for the events
eventSeriesManager (EventSeriesManager) – manager for the eventSeries
corpus.lookup module
Created on 2021-07-30
@author: wf
- class corpus.lookup.CorpusLookup(lookupIds: Optional[list] = None, configure: Optional[callable] = None, debug=False)[source]
Bases:
object
search and lookup for different EventCorpora
Constructor
- Parameters
lookupIds (list) – the list of lookupIds to add data sources for
configure (callable) – Callback to configure the corpus lookup
- getDataSource(lookupId: str) → corpus.eventcorpus.EventDataSource [source]
get the data source by the given lookupId
- Parameters
lookupId (str) – the lookupId of the data source to get
- Returns
the data source
- Return type
EventDataSource
- getDataSource4TableName(tableName: str) → corpus.eventcorpus.EventDataSource [source]
get the data source by the given tableName
- Parameters
tableName (str) – a tableName of the data source to get
- Returns
the data source
- Return type
EventDataSource
- getDictOfLod4MultiQuery(multiquery: str, idQuery: Optional[str] = None, omitFailed: bool = True) → dict [source]
- Parameters
multiquery (str) – the multi query containing a variable
idQuery (str) – optional query to get lists of ids for selection
omitFailed (bool) – if True, omit failed queries; if False, raise an Exception on failure
- Returns
the dict of list of dicts for the queries derived from the multi query
- Return type
dict
- Raises
Exception – if omitFailed is False and an error occurred for a query
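The idea of a multi query can be sketched with sqlite3: the same SQL text is run once per value substituted for the variable, and the results are collected in a dict of lists of dicts. The function name and the explicit variable and values arguments are assumptions for illustration; in the library the values are derived from the data sources or from idQuery.

```python
import sqlite3


def dict_of_lod_for_multi_query(conn, multiquery, variable, values,
                                omit_failed=True):
    """Hypothetical sketch: run one query per substituted value."""
    result = {}
    for value in values:
        # replace the {variable} placeholder, e.g. with a table name
        query = multiquery.replace("{" + variable + "}", value)
        try:
            cursor = conn.execute(query)
            columns = [col[0] for col in cursor.description]
            result[value] = [dict(zip(columns, row)) for row in cursor.fetchall()]
        except sqlite3.Error:
            # failed queries are skipped unless omit_failed is False
            if not omit_failed:
                raise
    return result
```

With omit_failed=True a value that leads to a failing query, for example a missing table, is simply absent from the result.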
- getLod4Query(query: str, params=None)[source]
- Parameters
query – the query to run
params (tuple) – the query params, if any
- Returns
the list of dicts for the query
- Return type
list
- getMultiQueryVariable(multiquery: str, lenient: bool = False)[source]
get the variable being used in a multiquery
- Parameters
multiquery (str) – the multiquery containing a {variable}
lenient (bool) – if True, return None when no variable is found; otherwise raise an Exception
- Returns
variable
- Return type
str
- Raises
Exception – if lenient is False and no variable was found
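Extracting the {variable} from a multiquery can be sketched with a regular expression. The function name and the pattern (a single identifier in curly braces) are assumptions for illustration, not the library's implementation.

```python
import re


def get_multi_query_variable(multiquery, lenient=False):
    """Hypothetical sketch: return the name of the first {variable} in the query."""
    match = re.search(r"\{([A-Za-z_][A-Za-z0-9_]*)\}", multiquery)
    if match:
        return match.group(1)
    if lenient:
        # lenient mode: signal the missing variable with None
        return None
    raise Exception(f"no variable found in multiquery: {multiquery}")
```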
- load(forceUpdate: bool = False, showProgress: bool = False, withCreateViews=True)[source]
load the event corpora
- Parameters
forceUpdate (bool) – if True the data should be fetched from the source instead of the cache
showProgress (bool) – if True the progress of the loading should be shown
withCreateViews (bool) – if True recreate the common views
- lookupIds = ['confref', 'crossref', 'dblp', 'gnd', 'tibkat', 'wikidata', 'wikicfp', 'or', 'or-backup', 'orclone', 'orclone-backup']
Module contents
datasources package
Submodules
datasources.confref module
datasources.crossref module
datasources.dblp module
datasources.dblpxml module
datasources.openresearch module
datasources.webscrape module
datasources.wikicfp module
datasources.wikicfpscrape module
datasources.wikidata module
Module contents
setup module
smw package
Submodules
smw.topic module
Module contents
tests package
Submodules
tests.datasourcetoolbox module
Created on 2021-07-29
@author: wf
- class tests.datasourcetoolbox.DataSourceTest(methodName='runTest')[source]
Bases:
unittest.case.TestCase
test for EventDataSources
Create an instance of the class that will use the named test method when executed. Raises a ValueError if the instance does not have a method with the specified name.
- checkDataSource(eventDataSource: corpus.eventcorpus.EventDataSource, expectedSeries: int, expectedEvents: int, eventSample: Optional[str] = None)[source]
check the given DataSource
- Parameters
eventDataSource (EventDataSource) – the event data source to check
- static getEventSeries(seriesAcronym: str)[source]
Returns the event series as dict of lod (records are categorized into the different data sources)
- Parameters
seriesAcronym – acronym of the series
- Returns
dict of lod
tests.testConfref module
Created on 2021-08-02
@author: wf
- class tests.testConfref.TestConfRef(methodName='runTest')[source]
Bases:
tests.datasourcetoolbox.DataSourceTest
test getting events from Confref http://portal.confref.org as a data source
tests.testCorpusLookup module
Created on 2021-07-26
@author: wf
- class tests.testCorpusLookup.TestCorpusLookup(methodName='runTest')[source]
Bases:
tests.datasourcetoolbox.DataSourceTest
test the event corpus
- configureCorpusLookup(lookup: corpus.lookup.CorpusLookup)[source]
callback to configure the corpus lookup
tests.testCrossRef module
Created on 2021-08-02
@author: wf
- class tests.testCrossRef.TestCrossRef(methodName='runTest')[source]
Bases:
tests.datasourcetoolbox.DataSourceTest
test getting events from CrossRef https://www.crossref.org/ as a data source
- testCrossref_DOI_Lookup()[source]
test crossref API access see https://github.com/WolfgangFahl/ProceedingsTitleParser/issues/28
- testFixUmlauts()[source]
workaround Umlaut issue see https://stackoverflow.com/questions/63486767/how-can-i-get-the-fuseki-api-via-sparqlwrapper-to-properly-report-a-detailed-err
tests.testDblpEvents module
Created on 28.07.2021
@author: wf
- class tests.testDblpEvents.TestDblpEvents(methodName='runTest')[source]
Bases:
tests.datasourcetoolbox.DataSourceTest
test the dblp data source
- classmethod configureCorpusLookup(lookup: corpus.lookup.CorpusLookup)[source]
callback to configure the corpus lookup
tests.testDblpXml module
Created on 2021-01-25
@author: wf
- class tests.testDblpXml.TestDblp(methodName='runTest')[source]
Bases:
tests.datasourcetoolbox.DataSourceTest
test the dblp xml parser and pylodstorage extraction for it
- mock = True
- setUp(debug: bool = False, profile: bool = True, **kwargs)[source]
set up the test environment, especially the mocking parameter: if mock is False, a multi-gigabyte download might be triggered
- testIssue5()[source]
https://github.com/WolfgangFahl/ConferenceCorpus/issues/5
dblp xml parser skips some proceedings titles
tests.testOpenResearch module
Created on 27.07.2021
@author: wf
- class tests.testOpenResearch.TestOREventManager(methodName='runTest')[source]
Bases:
tests.datasourcetoolbox.DataSourceTest
Tests OREventManager
- test_configure()[source]
tests configuring OREventManager with different loading methods to retrieve the records from the source
- class tests.testOpenResearch.TestOREventSeriesManager(methodName='runTest')[source]
Bases:
tests.datasourcetoolbox.DataSourceTest
Tests OREventSeriesManager
- test_configure()[source]
tests configuring OREventSeriesManager with different loading methods to retrieve the records from the source
- class tests.testOpenResearch.TestOpenResearch(methodName='runTest')[source]
Bases:
tests.datasourcetoolbox.DataSourceTest
test the access to OpenResearch
- class tests.testOpenResearch.TestOrSMW(methodName='runTest')[source]
Bases:
tests.basetest.BaseTest
tests OrSMW
tests.testSMW module
tests.testStatistics module
Created on 2021-07-31
@author: wf
- class tests.testStatistics.TestStatistics(methodName='runTest')[source]
Bases:
tests.datasourcetoolbox.DataSourceTest
test statistics
tests.testWebScrape module
Created on 2021-07-31
@author: wf
- class tests.testWebScrape.TestWebScrape(methodName='runTest')[source]
Bases:
tests.datasourcetoolbox.DataSourceTest
test getting RDFa-based triples from web pages
tests.testWikiCfp module
Created on 2021-07-31
@author: wf
- class tests.testWikiCfp.TestWikiCFP(methodName='runTest')[source]
Bases:
tests.datasourcetoolbox.DataSourceTest
test WikiCFP data source
tests.testWikiCfpScrape module
Created on 2020-08-20
@author: wf
- class tests.testWikiCfpScrape.TestWikiCFP(methodName='runTest')[source]
Bases:
tests.datasourcetoolbox.DataSourceTest
test events from WikiCFP
- handleError(ex)[source]
handle the given exception
- Parameters
ex (Exception) – the exception to handle
- printDelimiterCount(names)[source]
print the count of the most commonly used delimiters in the given name list
- testEventScraping()[source]
test scraping the given event
test “This item has been deleted” WikiCFP items
e.g. http://www.wikicfp.com/cfp/servlet/event.showcfp?eventid=3
tests.testWikidata module
Created on 27.07.2021
@author: wf
- class tests.testWikidata.TestWikiData(methodName='runTest')[source]
Bases:
tests.datasourcetoolbox.DataSourceTest
test Wikidata access
- testQueryManager()[source]
test named query usage see https://github.com/WolfgangFahl/ConferenceCorpus/issues/45