This post outlines the basic features of JsonPydexer at an early stage in the development cycle (alpha 0.3.2 as of this writing).

Usage

Installation

Download the source files from GitHub, or install with pip:

pip install jsonpydexer

Indexing

JsonPydexer can index a directory of JSON files, each containing a single entity, or a directory of subdirectories that each meet the same criteria. To index on one or more key names, pass each key name either as a string (for keys on the root level of the JSON files) or as a list of strings (for nested keys, listed from the outermost key inward).
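The key-name convention above can be sketched as a small lookup over a parsed JSON document. This is only an illustration: `resolve_key` is a hypothetical helper, not part of JsonPydexer, and it assumes a missing key should simply raise `KeyError`.

```python
import json
from typing import Any, List, Union

# A key name is either a root-level string or a list of strings for a nested path.
KeyName = Union[str, List[str]]


def resolve_key(document: dict, key: KeyName) -> Any:
    """Hypothetical helper: follow a root-level key or a nested key path."""
    # Treat a plain string as a one-element path on the root level.
    path = [key] if isinstance(key, str) else key
    value: Any = document
    for part in path:
        # Step one level deeper; a missing key raises KeyError.
        value = value[part]
    return value


doc = json.loads('{"citation": {"doi": "1000.1000"}, "altmetric_id": "10/10000"}')
print(resolve_key(doc, ["citation", "doi"]))  # 1000.1000
print(resolve_key(doc, "altmetric_id"))       # 10/10000
```

The string form is just shorthand for a one-element list, which is why both forms can share the same traversal loop.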

For example, files in the Altmetric.com dataset follow this basic form:

{
    "citation": {
        "doi": "1000.1000",
        "pubdate": "2000-01-31T00:00:00+00:00"
    },
    "altmetric_id": "10/10000"
}

The following Python code indexes these files on the DOI, the publication date, and the Altmetric ID (note: indexing may take a long time for large datasets):

from JsonPydexer import JsonPydexer

jp = JsonPydexer("/home/christian/data/altmetric_clean_sample")
jp.index([
    ["citation", "doi"],
    ["citation", "pubdate"],
    "altmetric_id"
])
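The post doesn't describe JsonPydexer's internal index format, so the following is only a sketch of what indexing a directory on several key names involves: walk the directory and its subdirectories, load each JSON file, and record which files contain each value. `build_index` and its `{key path: {value: [file paths]}}` layout are hypothetical, not the library's API, and the sample files are toy data.

```python
import json
import os
import tempfile
from collections import defaultdict
from typing import Dict, List, Tuple, Union

# Hypothetical types for this sketch; JsonPydexer's own index format may differ.
KeyName = Union[str, List[str]]
Index = Dict[Tuple[str, ...], Dict[str, List[str]]]


def build_index(root: str, keys: List[KeyName]) -> Index:
    """Map each key path to {value: [paths of files containing that value]}."""
    # Normalise every key name to a tuple so it can serve as a dict key.
    key_paths = [(k,) if isinstance(k, str) else tuple(k) for k in keys]
    index = {p: defaultdict(list) for p in key_paths}
    # os.walk visits the root directory and every subdirectory beneath it.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(".json"):
                continue
            full_path = os.path.join(dirpath, name)
            with open(full_path, encoding="utf-8") as fh:
                document = json.load(fh)
            for p in key_paths:
                value = document
                try:
                    for part in p:
                        value = value[part]
                except (KeyError, TypeError):
                    continue  # file lacks this key; skip it for this key only
                index[p][str(value)].append(full_path)
    # Return plain dicts so looking up an unknown value doesn't insert it.
    return {p: dict(values) for p, values in index.items()}


# Usage: two toy files (one in a subdirectory) that share a publication date.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "sub"))
    samples = {
        "a.json": {"citation": {"doi": "1", "pubdate": "2011-01-27"}, "altmetric_id": "10/1"},
        os.path.join("sub", "b.json"): {"citation": {"doi": "2", "pubdate": "2011-01-27"}, "altmetric_id": "10/2"},
    }
    for rel_path, doc in samples.items():
        with open(os.path.join(root, rel_path), "w", encoding="utf-8") as fh:
            json.dump(doc, fh)
    index = build_index(root, [["citation", "pubdate"], "altmetric_id"])
    same_day = index[("citation", "pubdate")]["2011-01-27"]
    print(sorted(os.path.relpath(p, root) for p in same_day))
```

Reading every file on every run is what makes a first indexing pass slow on large datasets; persisting the result (as JsonPydexer does with its `.jp.pkl` file) avoids repeating that work for searches.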

Searching the index

To search the index based on your specified keys, similar syntax can be used:

from JsonPydexer import JsonPydexer

jp = JsonPydexer("/home/christian/data/altmetric_clean_sample")
files = jp.get_files(["citation", "pubdate"], "2011-01-27T00:00:00+00:00")
for f in files:
    print(f)

Both get_file and get_files take a key name and a search string. Use get_files when the key is non-unique, since several files may share the same value; use get_file when the key is unique.
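The difference between the two lookups can be illustrated with a small in-memory index. This is a hypothetical sketch: `lookup_many` and `lookup_one` are not JsonPydexer functions, and how the real get_file handles a missing or duplicated value is an assumption here (this sketch returns `None` or raises `ValueError`).

```python
from typing import Dict, List, Optional

# Hypothetical in-memory indexes for one key each: value -> files containing it.
pubdate_index: Dict[str, List[str]] = {
    "2011-01-27T00:00:00+00:00": ["a.json", "b.json"],  # non-unique key
}
altmetric_index: Dict[str, List[str]] = {
    "10/10000": ["a.json"],  # unique key
}


def lookup_many(index: Dict[str, List[str]], value: str) -> List[str]:
    """Return every file whose key matches value (empty list if none)."""
    return list(index.get(value, []))


def lookup_one(index: Dict[str, List[str]], value: str) -> Optional[str]:
    """Return the single matching file for a unique key, or None if absent.

    Assumption for this sketch: more than one match is treated as an error.
    """
    matches = index.get(value, [])
    if len(matches) > 1:
        raise ValueError(f"key is not unique: {len(matches)} files match {value!r}")
    return matches[0] if matches else None


print(lookup_many(pubdate_index, "2011-01-27T00:00:00+00:00"))  # ['a.json', 'b.json']
print(lookup_one(altmetric_index, "10/10000"))                  # a.json
```

Returning a list from the many-result lookup and a single value from the unique one mirrors the naming of get_files and get_file.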

Some issues to note:

  • Adding new key names is not currently supported. To add one, delete the .jp.pkl file in the directory you’ve indexed and run the indexer again with your full list of key names.
  • If existing files are modified, re-running the indexer will not pick up the changes. New and deleted files, however, will be reflected.
  • This software is in alpha, so there may be other unexpected errors. Feel free to report them as an issue on GitHub, or submit a pull request if you fix one yourself.
  • Feature requests are also welcome on GitHub.
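Because adding key names means deleting the saved index, that step can be scripted. A minimal sketch, assuming (as described in the first note) that the index lives in a file named `.jp.pkl` at the top of the indexed directory; `reset_index` is a hypothetical helper, not part of JsonPydexer.

```python
import os
import tempfile


def reset_index(directory: str, pickle_name: str = ".jp.pkl") -> bool:
    """Hypothetical helper: delete the saved index so the directory can be re-indexed.

    Returns True if an index file was removed, False if none existed.
    """
    path = os.path.join(directory, pickle_name)
    if os.path.exists(path):
        os.remove(path)
        return True
    return False


# Usage with a throwaway directory holding a stand-in (empty) index file.
with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, ".jp.pkl"), "wb").close()
    removed_first = reset_index(d)
    removed_second = reset_index(d)
    print(removed_first, removed_second)  # True False
```

After the file is gone, construct JsonPydexer on the same directory and call jp.index(...) with the complete list of key names, as in the Indexing section. Returning a boolean makes it safe to call the helper even when no index exists yet.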