ConllRegExp¶

Inherits from ConllRelator and ConllSelector. It provides an expressive RegExp module for querying complex patterns in the Conll Tree/Graph.

Use this module when you want to match patterns in a sentence. Examples:

Get all the tokens that begin with a consonant
Get all the Parts-of-Speech of tokens that begin with a consonant
Get all the tokens with a prepositional child and a parent that is a verb
Get all the tokens with a child that is an adverb and begins with ‘fa’

RegExp can be as complex as you wish…

Get all the tokens that do not contain numbers and have a child that is an adjective and a parent that is a verb that begins with ‘f’
Get all the tokens that do not contain numbers and have a child that is an adjective, where this child itself has another child that is a conjunction that starts with ‘a’ and has 3 letters

To learn all about RegExp in Conll trees read ConllRegExp
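
As a rough illustration of the first two queries in the list above, the sketch below applies an ordinary regular expression to the FORM column and then reads the UPOS column of the matching tokens. It uses only Python's `re` module on hand-written lists, not ConllRegExp itself; the names `forms`, `upos`, and `consonant` are hypothetical.

```python
import re

# FORM and UPOS values of the first five tokens of the sample sentence
forms = ["While", "working", "at", "a", "supermarket"]
upos = ["SCONJ", "VERB", "ADP", "DET", "NOUN"]

# a token "begins with a consonant" if its first letter is not a vowel
consonant = re.compile(r"^[b-df-hj-np-tv-z]", re.IGNORECASE)

# zero-based indexes of the matching tokens
matches = [i for i, form in enumerate(forms) if consonant.search(form)]
print(matches)  # [0, 1, 4]

# Parts-of-Speech of those same tokens
matched_upos = [upos[i] for i in matches]
print(matched_upos)  # ['SCONJ', 'VERB', 'NOUN']
```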

Parameters¶

# user can set the column indexes if format is not Conll
ConllRegExp( IDX_ID=0, IDX_FORM=1, IDX_LEMMA=2, 
             IDX_UPOS=3, IDX_XPOS=4, IDX_MORPH=5, 
             IDX_HEAD=6, IDX_DEPREL=7, IDX_GRAPH=8 )  
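
The index parameters say which column holds which feature. A minimal sketch of that idea, assuming a made-up four-column layout and a hypothetical `COLUMNS` mapping and `feature` helper (none of these are part of ConllQuery):

```python
# hypothetical reduced format with only FORM, UPOS, HEAD and DEPREL columns
COLUMNS = {"IDX_FORM": 0, "IDX_UPOS": 1, "IDX_HEAD": 2, "IDX_DEPREL": 3}

# one token of that format, split on tabs
row = "released\tVERB\t0\troot".split("\t")

def feature(row, name, columns=COLUMNS):
    """Return the value stored in the column that `name` points to."""
    return row[columns[name]]

upos_value = feature(row, "IDX_UPOS")
deprel_value = feature(row, "IDX_DEPREL")
print(upos_value, deprel_value)  # VERB root
```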

Methods¶

ConllRegExp.set_sentence( sentence  )  
> sentence (object or list(list(str))): the Conll sentence to process

ConllRegExp.set_target( target_index )  
> target_index (int): index of the target starting from zero

resp = ConllRegExp.regexp( dict_pattern )  
> dict_pattern (dict): nested dictionary encoding the regexp to match
>                   keys must be either relations or features
>                   values are standard regular expressions
>  Example: 
> # find all the prepositions starting with a
> # that have a parent that is a verb and is the root
> dict_pattern = { R.TOKEN:  { F.UPOS: 'ADP',  F.LEMMA:  '^a.*' },
>                  R.PARENT: { F.XPOS: '^V.*', F.DEPREL: 'root' } }
>
> resp (list(int)): list of the indexes of tokens matching the regexp

resp = ConllRegExp.find( id_ , relation )  
> id_ (int): index of the token we want to select
> relation (RELATION): relation or tuple of relations to find
> resp (list(int)): list of the indexes of tokens found

resp = ConllRegExp.select( id_ , feature )  
> id_ (int): index of the token we want to select
> feature (FEATURE): name of the feature or tuple of features to get
> resp (str): string containing the values of the selected features

path = ConllRegExp.get_path_to_target( id_ )
> id_ (int): index of the token where we start the path to target  
> path (list(int)): list of ids of tokens in the path to target

path = ConllRegExp.get_path_to_root( id_ )
> id_ (int): index of the token where we start the path to root  
> path (list(int)): list of ids of tokens in the path to root
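
To make the two path methods more concrete, here is a sketch of how such paths can be derived from the HEAD column alone. It is not the library's implementation: the functions `path_to_root` and `path_to_target` are hypothetical stand-ins, and it is an assumption that a path includes its starting token and uses zero-based indexes. The `heads` list holds the HEAD values of the sample sentence in the next section.

```python
# HEAD column of the sample sentence (1-based ids, 0 marks the root)
heads = [2, 11, 5, 5, 2, 8, 8, 2, 11, 11, 0, 11, 16, 16, 16, 11]

def path_to_root(heads, idx):
    """Zero-based indexes from token `idx` up to the root, both included."""
    path = [idx]
    while heads[path[-1]] != 0:
        path.append(heads[path[-1]] - 1)  # HEAD ids are 1-based
    return path

def path_to_target(heads, idx, target):
    """Zero-based indexes from `idx` to `target` through their lowest common ancestor."""
    up = path_to_root(heads, idx)
    down = path_to_root(heads, target)
    # drop the ancestors that both paths share above the lowest common ancestor
    while len(up) > 1 and len(down) > 1 and up[-2] == down[-2]:
        up.pop()
        down.pop()
    # climb from idx to the common ancestor, then descend to the target
    return up + down[-2::-1]

print(path_to_root(heads, 0))        # [0, 1, 10]   While -> working -> released
print(path_to_target(heads, 11, 0))  # [11, 10, 1, 0]  music passes through the root
print(path_to_target(heads, 4, 0))   # [4, 1, 0]    supermarket does not
```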

A ConllSentence¶

ID	FORM	LEMMA	UPOS	XPOS	MORPH	HEAD	DEPREL	GRAPH
1	While	while	SCONJ	IN	_	2	mark	_
2	working	work	VERB	VBG	VerbForm=Ger	11	advcl	_
3	at	at	ADP	IN	_	5	case	_
4	a	a	DET	DT	Definite=Ind|PronType=Art	5	det	_
5	supermarket	supermarket	NOUN	NN	Number=Sing	2	obl	_
6	as	as	ADP	IN	_	8	case	_
7	a	a	DET	DT	Definite=Ind|PronType=Art	8	det	_
8	bagger	bagger	NOUN	NN	Number=Sing	2	obl	_
9	,	,	PUNCT	,	_	11	punct	_
10	he	he	PRON	PRP	Case=Nom|Gender=Masc|Number=Sing|Person=3|PronType=Prs	11	nsubj	2:nsubj
11	released	release	VERB	VBD	Mood=Ind|Tense=Past|VerbForm=Fin	0	root	_
12	music	music	NOUN	NN	Number=Sing	11	obj	_
13	as	as	ADP	IN	_	16	case	_
14	an	a	DET	DT	Definite=Ind|PronType=Art	16	det	_
15	independent	independent	ADJ	JJ	Degree=Pos	16	amod	_
16	artist	artist	NOUN	NN	Number=Sing	11	obl	_
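
Since set_sentence is documented to accept a list(list(str)), a tab-separated sentence like the one above can be turned into that shape with a few lines of plain Python. The sketch below assumes one token per line, tab-separated columns, and `#` comment lines that should be skipped; `parse_conll` is a hypothetical helper, not a ConllQuery function.

```python
# the first two tokens of the sample sentence, preceded by a comment line
conll_text = (
    "# sent_id = 1\n"
    "1\tWhile\twhile\tSCONJ\tIN\t_\t2\tmark\t_\n"
    "2\tworking\twork\tVERB\tVBG\tVerbForm=Ger\t11\tadvcl\t_\n"
)

def parse_conll(text):
    """Split tab-separated token lines into a list of rows of strings."""
    rows = []
    for line in text.splitlines():
        # skip blank lines and comment lines
        if not line.strip() or line.startswith("#"):
            continue
        rows.append(line.split("\t"))
    return rows

sentence = parse_conll(conll_text)
print(len(sentence))   # 2
print(sentence[1][1])  # working
print(sentence[1][6])  # 11
```

Note that the ID column is 1-based while the indexes used by the methods start at zero, so the token with ID 2 sits at `sentence[1]`.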

Usage¶

from ConllQuery import ConllRegExp
from ConllQuery.enumerations import FEATURES as F
from ConllQuery.enumerations import RELATION as R

# instantiation
parser = ConllRegExp()

# set the sentence object that will be processed
parser.set_sentence( sentence ) 

# find all the tokens that are prepositions or nouns
parser.regexp( { R.TOKEN: { F.UPOS: 'ADP|NOUN' }} ) 
> [2,4,5,7,11,12,15]  # indexes start at zero

# find all the tokens that are verbs and roots
parser.regexp( { R.TOKEN: { F.UPOS: 'VERB', 
                            F.DEPREL: 'root' }} ) 
> [10]  # there is only one (released)

# Set the target to while
parser.set_target( 0 )

# the tokens that pass through the root while going to the target 
parser.regexp( { R.TO_TARGET: { F.DEPREL: 'root' } } )
> [9,10,11,12,13,14,15] # ids of tokens that pass through root

# you can put constraints on several relations
# example: find the nouns that have a parent root that is a verb
parser.regexp( {R.TOKEN:  { F.UPOS: 'NOUN'},
                R.PARENT: { F.UPOS: 'VERB', 
                            F.DEPREL: 'root'}} ) 
> [11, 15]  # retrieves both 'music' and 'artist' 

# you can also negate some features
# example: find the nouns that do not have an adjective child
parser.regexp(  {R.TOKEN: { F.UPOS: 'NOUN'}, 
                 R.CHILD: { F.UPOS: '^(?!.*ADJ).*$'}} ) 
> [4,7,11]  # doesn't find 'artist' (independent artist)
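
The negation in the last example relies on a negative lookahead: `^(?!.*ADJ).*$` matches only strings that contain no `ADJ` anywhere. The sketch below reproduces the result with Python's `re` module alone. It assumes that the pattern is tested against the UPOS values of all children taken together (joined by spaces here), which is what the result above suggests but is not confirmed by this page; `children_upos` is a hypothetical helper.

```python
import re

# UPOS and HEAD columns of the sample sentence (HEAD ids are 1-based, 0 is the root)
upos = ["SCONJ", "VERB", "ADP", "DET", "NOUN", "ADP", "DET", "NOUN",
        "PUNCT", "PRON", "VERB", "NOUN", "ADP", "DET", "ADJ", "NOUN"]
heads = [2, 11, 5, 5, 2, 8, 8, 2, 11, 11, 0, 11, 16, 16, 16, 11]

def children_upos(idx):
    """UPOS tags of the direct children of token `idx`, joined by spaces."""
    return " ".join(upos[i] for i, h in enumerate(heads) if h == idx + 1)

# negative lookahead: succeeds only if 'ADJ' appears nowhere in the string
no_adj = re.compile(r"^(?!.*ADJ).*$")

nouns_without_adj = [i for i, tag in enumerate(upos)
                     if tag == "NOUN" and no_adj.search(children_upos(i))]

print(children_upos(15))   # ADP DET ADJ  (children of 'artist')
print(nouns_without_adj)   # [4, 7, 11]
```

A token without children, such as 'music', yields an empty string, which the pattern also accepts.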
