Enumerations

Enumerations are constants used almost everywhere in the query language. There are five kinds: Features, Relations, Sorts, Limiters and Encoders. Each plays an important and very different role in the query system.

Features

A feature is an atomic piece of information that can easily be obtained for a given token. Think of it as the SELECT 'my_feature' clause of an SQL query.

from ConllQuery.enumerations import FEATURES as F

# List all existing/supported features
for f in F:
    print( f )
> ID                 
> FORM               
> LEMMA              
> UPOS               
> XPOS               
> MORPH              
> HEAD               
> DEPREL             
> GRAPH                              
> MORPH_CASE      
> MORPH_DEFINITE  
> MORPH_DEGREE    
> MORPH_FOREIGN   
> MORPH_GENDER    
> MORPH_MOOD      
> MORPH_NUMBER    
> MORPH_NUMTYPE   
> MORPH_PERSON    
> MORPH_POSS      
> MORPH_PRONTYPE  
> MORPH_REFLEX    
> MORPH_TENSE     
> MORPH_VERBFORM  
> MORPH_VOICE     
> MORPH_POLARITY              
> PATH_TO_ROOT    
> PATH_TO_TARGET                    
> DIRECTION_TO_ROOT   
> DIRECTION_TO_TARGET 
> MORPH_DATE     
> MORPH_NE       
> MORPH_NOUNTYPE 
> MORPH_PC       
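
To make the idea of a feature concrete, here is a minimal sketch that reads the basic features off a single token line. It assumes the input follows the standard CoNLL-U column order and that the MORPH_* features correspond to keys of the morphology column; `parse_token` and `COLUMNS` are hypothetical helpers written for illustration, not part of ConllQuery.

```python
# Hypothetical helper, NOT part of ConllQuery.
# Assumption: the first eight CoNLL-U columns map onto these feature names.
COLUMNS = ["ID", "FORM", "LEMMA", "UPOS", "XPOS", "MORPH", "HEAD", "DEPREL"]

def parse_token(line):
    """Split one tab-separated token line into a feature dictionary."""
    fields = line.rstrip("\n").split("\t")
    token = dict(zip(COLUMNS, fields))  # extra columns are ignored
    # Assumption: "Case=Nom|Number=Plur" expands to MORPH_CASE, MORPH_NUMBER, ...
    if token["MORPH"] != "_":
        for pair in token["MORPH"].split("|"):
            key, _, value = pair.partition("=")
            token["MORPH_" + key.upper()] = value
    return token

line = "2\tcats\tcat\tNOUN\tNNS\tNumber=Plur\t3\tnsubj\t_\t_"
token = parse_token(line)
print(token["FORM"], token["UPOS"], token["MORPH_NUMBER"])
```

Each key of the resulting dictionary plays the role of one feature: a single value that can be looked up for a token without consulting the rest of the sentence.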

Relations

A relation is a function that maps a token id to a list of token ids. Any relation is possible; however, interesting relations should have some syntactic motivation. It is somewhat like the WHERE clause of an SQL query.

from ConllQuery.enumerations import RELATION as R

# List all existing/supported relations
for r in R:
    print( r )
> TOKEN       # The token itself
> CHILD       # The child/children of the token
> PARENT      # The parent(s) of the token
> BROTHER     # The brother(s) of the token
> GRANDSON    # The grandson(s) of the token
> GRANDPARENT # The grandparent(s) of the token
> ANCESTORS   # Everyone in the path from the token to the root 
> SUCCESSORS  # Everyone that passes through the token going to root
> UNCLES      # The uncle(s) of the token
> NEPHEWS     # The nephew(s) of the token
> DIRECT_LINE # Ancestors + Successors           
> ROOT        # The token's root
> TO_ROOT     # Everyone in the path from the token to the root  
> TARGET      # The token's target
> TO_TARGET   # Everyone in the path from the token to target   
> SOMEWHERE   # Everyone in the sentence
> NEIGHBOR    # All the words except itself
> NEIGHBOR_0  # Closest neighbors, word before, word after
> NEIGHBOR_1  # neighbors at distance 1 
> NEIGHBOR_2  # neighbors at distance 2 
> NEIGHBOR_3  # neighbors at distance 3 
> NEIGHBOR_4  # neighbors at distance 4 
> NEIGHBOR_5  # neighbors at distance 5 
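
The description of a relation as a function from a token id to a list of token ids can be sketched on a toy dependency tree. The `HEADS` mapping and the functions below are illustrative assumptions that follow the comments in the listing above; they do not show how ConllQuery implements its relations.

```python
# Toy dependency tree (assumption): token id -> head id, 0 is the artificial root.
HEADS = {1: 2, 2: 0, 3: 2, 4: 3, 5: 3}

def parent(token_id):
    """PARENT: the head of the token, or nothing for the root token."""
    head = HEADS[token_id]
    return [head] if head != 0 else []

def child(token_id):
    """CHILD: every token whose head is the given token."""
    return [t for t, h in HEADS.items() if h == token_id]

def brother(token_id):
    """BROTHER: tokens sharing the same head, excluding the token itself."""
    return [t for t, h in HEADS.items()
            if h == HEADS[token_id] and t != token_id]

def grandson(token_id):
    """GRANDSON: children of the token's children."""
    return [g for c in child(token_id) for g in child(c)]

def ancestors(token_id):
    """ANCESTORS: every token on the path from the token up to the root."""
    path = []
    head = HEADS[token_id]
    while head != 0:
        path.append(head)
        head = HEADS[head]
    return path

print(child(2), parent(4), brother(4), grandson(2), ancestors(4))
```

Every function has the same shape, one id in and a list of ids out, which is what allows relations to be swapped freely inside a query.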

Sorts

The order in which the selected items are returned to the user.

from ConllQuery.enumerations import SORT as S

# List all existing/supported sorts
for s in S:
    print( s )
> NONE                   # No sort
> AS_PATH_TO_ROOT        # Following the direction from id_ to root
> AS_PATH_TO_TARGET      # Following the direction from id_ to target
> BY_INDEX               # sort by token index, aka. from left to right
> BY_REVERSE_INDEX       # sort by inverse index, aka. from right to left
> ALPHABETICALLY         # sort alphabetically w.r.t the selected feature
> REVERSE_ALPHABETICALLY # sort reverse alphabetically 
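
As a rough illustration of what the index-based and alphabetical sorts mean, the sketch below orders a small selection of token ids with plain Python. The `FORMS` mapping and variable names are assumptions made for the example; whether ConllQuery sorts ids or feature values internally is not stated here.

```python
# Toy sentence (assumption): token id -> FORM.
FORMS = {1: "the", 2: "cat", 3: "sat"}
selected = [3, 1, 2]  # ids returned by some relation, in arbitrary order

# BY_INDEX: left to right in the sentence.
by_index = sorted(selected)

# BY_REVERSE_INDEX: right to left in the sentence.
by_reverse_index = sorted(selected, reverse=True)

# ALPHABETICALLY: ordered by the value of the selected feature (here FORM).
alphabetically = sorted(selected, key=lambda i: FORMS[i])

# REVERSE_ALPHABETICALLY: the same key, in descending order.
reverse_alphabetically = sorted(selected, key=lambda i: FORMS[i], reverse=True)

print([FORMS[i] for i in by_index])        # ['the', 'cat', 'sat']
print([FORMS[i] for i in alphabetically])  # ['cat', 'sat', 'the']
```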

Limiters

A limiter caps the number of items returned to the user. If it is an integer, the result is limited to that many items. Otherwise, it is one of the following:

from ConllQuery.enumerations import LIMITER as L

# List all existing/supported limiters
for l in L:
    print( l )
> NONE       = 0  # No limit
> AT_LEFT    = 1  # Limit to words at the left of id_
> AT_RIGHT   = 2  # Limit to words at the right of id_
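
The two kinds of limiter, an integer or a named constant, can be sketched as follows. This is a hypothetical stand-in, not ConllQuery code: the function `limit`, the use of strings for the named limiters, and the choice to keep the first n items for an integer limit are all assumptions.

```python
def limit(ids, token_id, limiter):
    """Apply a limiter to a list of token ids relative to token_id (id_)."""
    if isinstance(limiter, int):
        # Assumption: an integer keeps the first `limiter` items.
        return ids[:limiter]
    if limiter == "AT_LEFT":
        # Keep only words to the left of token_id.
        return [i for i in ids if i < token_id]
    if limiter == "AT_RIGHT":
        # Keep only words to the right of token_id.
        return [i for i in ids if i > token_id]
    # "NONE": no limit.
    return list(ids)

ids = [1, 2, 4, 5]
print(limit(ids, 3, 2))           # [1, 2]
print(limit(ids, 3, "AT_LEFT"))   # [1, 2]
print(limit(ids, 3, "AT_RIGHT"))  # [4, 5]
```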

Encoders

An encoder defines how the response is encoded. By default, the RAW encoding returns the list of found elements; however, you may want to apply SQL-like operations to the response, such as COUNT, or other parsing operators.

from ConllQuery.enumerations import ENCODER as E

# List all existing/supported encoders
for e in E:
    print( e )
> RAW         = 0  # No encoding
> CONCATENATE = 1  # Concatenate answers
> NUMERIC     = 2  # Cast to float
> COUNT       = 3  # Count how many
> LOWERCASE   = 4  # Put everything in lower-case
> UPPERCASE   = 5  # Put everything in upper-case
> LENGTH      = 6  # get the length of each item in response
> AS_ASCII    = 7  # cast to ASCII using unidecode
> CAPSENC     = 8  # perform caps encoding. YoLo   -> AaAa
> FORMENC     = 9  # perform form encoding. YoLo42 -> AaAa00
> HAS_DIGIT   = 10 # transform elements into ['ALL','SOME','NONE' ] 
> HAS_PUNCT   = 11 # transform elements into ['ALL','SOME','NONE' ] 
> HAS_UPPER   = 12 # transform elements into ['ALL','SOME','NONE' ] 
> HAS_LOWER   = 13 # transform elements into ['ALL','SOME','NONE' ] 
> HAS_UTF8    = 14 # transform elements into ['ALL','SOME','NONE' ] 
> PREFIX_1    = 20 # Get the first character of each response
> PREFIX_2    = 21 # Get the first 2 characters of each response
> PREFIX_3    = 22 # Get the first 3 characters of each response
> PREFIX_4    = 23 # Get the first 4 characters of each response
> PREFIX_5    = 24 # Get the first 5 characters of each response
> SUFFIX_1    = 30 # Get the last character of each response
> SUFFIX_2    = 31 # Get the last 2 characters of each response
> SUFFIX_3    = 32 # Get the last 3 characters of each response
> SUFFIX_4    = 33 # Get the last 4 characters of each response
> SUFFIX_5    = 34 # Get the last 5 characters of each response
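
A few of the string encoders can be approximated in plain Python to show what they produce. The functions below are illustrative sketches based only on the comments in the listing; leaving characters that are neither letters nor digits unchanged, and computing the HAS_DIGIT label per element, are assumptions rather than documented behaviour.

```python
def capsenc(text):
    """CAPSENC: upper-case letters become 'A', lower-case letters become 'a'."""
    return "".join(
        "A" if c.isupper() else "a" if c.islower() else c for c in text
    )

def formenc(text):
    """FORMENC: like CAPSENC, and digits become '0'."""
    out = []
    for c in text:
        if c.isupper():
            out.append("A")
        elif c.islower():
            out.append("a")
        elif c.isdigit():
            out.append("0")
        else:
            out.append(c)  # assumption: other characters are kept as-is
    return "".join(out)

def has_digit(text):
    """HAS_DIGIT: 'ALL', 'SOME' or 'NONE' depending on how many chars are digits."""
    flags = [c.isdigit() for c in text]
    if flags and all(flags):
        return "ALL"
    return "SOME" if any(flags) else "NONE"

def prefix(text, n):
    """PREFIX_n: the first n characters."""
    return text[:n]

def suffix(text, n):
    """SUFFIX_n: the last n characters."""
    return text[-n:]

print(capsenc("YoLo"), formenc("YoLo42"), has_digit("YoLo42"))
print(prefix("walking", 2), suffix("walking", 3))
```

Encodings of this kind are a common way to turn raw word forms into compact shape features, which is presumably why they sit alongside COUNT and LENGTH in the list.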