
Parse parts of sentence while preserving context in Python with regex



By : chrisky
Date : November 17 2020, 11:52 AM
You could try pyparsing, which is a great library for working with grammars.
If the entries follow consistent (boolean) logic and you know how to interpret the comma between the and's and or's, then you could parse the entries with a script based on the simpleBool.py example from pyparsing:
code :
import pprint
import string

# infixNotation replaces the deprecated operatorPrecedence alias
from pyparsing import Word, nums, Literal, opAssoc, infixNotation


# a course code such as AAAA111 or AAA111/221, or the literal phrase
course_name = Word(string.ascii_uppercase + nums + "/") | Literal("instructor permission")
# a bare comma is interpreted as a logical AND
comma_separator = Literal(',')
comma_separator.setParseAction(lambda t: "&&")

and_separator = Literal(', and') | Literal(', AND') | Literal('and') | Literal('AND')
and_separator.setParseAction(lambda t: "&&")

or_separator = Literal('or') | Literal("OR")
or_separator.setParseAction(lambda t: "||")

# operators listed first bind most tightly
course_line = infixNotation(course_name,
                            [
                                (and_separator, 2, opAssoc.LEFT),
                                (or_separator, 2, opAssoc.LEFT),
                                (comma_separator, 2, opAssoc.LEFT),
                            ])

data = """AAAA111, BBB111, CCC101, and DDD104
AAAA111, BBB111, CCC101 or DDD104
AAAA111, AAAA112 or AAAA113, BBB333
AAA111 or BBB111, AND CCC111
AAA111 or BBB111 or CCC111 or DDD111
AAA111 or 112 or 222 or 333
AAA111 or instructor permission
AAA111/221
"""

for line in data.splitlines():
    results = course_line.parseString(line)
    print(line)
    pprint.pprint(results.asList()[0])
    print()
output :
AAAA111, BBB111, CCC101, and DDD104
['AAAA111', '&&', 'BBB111', '&&', ['CCC101', '&&', 'DDD104']]

AAAA111, BBB111, CCC101 or DDD104
['AAAA111', '&&', 'BBB111', '&&', ['CCC101', '||', 'DDD104']]

AAAA111, AAAA112 or AAAA113, BBB333
['AAAA111', '&&', ['AAAA112', '||', 'AAAA113'], '&&', 'BBB333']

AAA111 or BBB111, AND CCC111
['AAA111', '||', ['BBB111', '&&', 'CCC111']]

AAA111 or BBB111 or CCC111 or DDD111
['AAA111', '||', 'BBB111', '||', 'CCC111', '||', 'DDD111']

AAA111 or 112 or 222 or 333
['AAA111', '||', '112', '||', '222', '||', '333']

AAA111 or instructor permission
['AAA111', '||', 'instructor permission']

AAA111/221
'AAA111/221'
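
Once an entry has been parsed into nested lists like the ones above, a small recursive function can check it against a set of completed courses. This is only a sketch: the function name `is_satisfied` and the `completed` set are hypothetical, and it assumes the `&&`/`||` tokens and the nesting shown in the output.

```python
def is_satisfied(expr, completed):
    """Evaluate a parsed prerequisite expression against completed courses."""
    # A bare string is a single course name (or "instructor permission").
    if isinstance(expr, str):
        return expr in completed
    # Otherwise the list alternates operand, operator, operand, ...
    result = is_satisfied(expr[0], completed)
    for op, operand in zip(expr[1::2], expr[2::2]):
        value = is_satisfied(operand, completed)
        if op == "&&":
            result = result and value
        elif op == "||":
            result = result or value
        else:
            raise ValueError(f"unexpected operator: {op!r}")
    return result


# Parsed form of "AAAA111, AAAA112 or AAAA113, BBB333" from the output above.
parsed = ['AAAA111', '&&', ['AAAA112', '||', 'AAAA113'], '&&', 'BBB333']
print(is_satisfied(parsed, {'AAAA111', 'AAAA113', 'BBB333'}))  # True
print(is_satisfied(parsed, {'AAAA111', 'BBB333'}))             # False
```

Operators within one list are applied left to right, which is enough here because the parser already groups tighter-binding operators into sublists.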


Regex.Split() sentence to words while preserving whitespace



By : JayB901662
Date : March 29 2020, 07:55 AM
I'm using Regex.Split() to take the user input and turn it into a list of individual words, but at the moment it removes any spaces the user adds. I would like it to keep the whitespace.
code :
string text = "This            is some text";
var splits = Regex.Split(text, @"(?=(?<=[^\s])\s+)");

foreach (string item  in splits)
    Console.Write(item);
Console.WriteLine(splits.Count());
(?=\s+)
(?=(?<=[^\s])\s+)
(?=(?<=^|[^\s])\s+)
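
For comparison, the same idea can be sketched in Python with the standard `re` module. This is not the C# answer's code, only an assumed equivalent: a capturing group keeps the whitespace runs as separate items, while a zero-width split (which needs Python 3.7 or later) leaves each run attached to the word that follows it.

```python
import re

text = "This            is some text"

# A capturing group makes re.split() return the separators as list items.
parts = re.split(r"(\s+)", text)
print(parts)

# Zero-width split just before each whitespace run (Python 3.7+),
# so every word after the first keeps its leading whitespace.
chunks = re.split(r"(?<=\S)(?=\s)", text)
print(chunks)

# Either way, joining the pieces reproduces the input exactly.
print("".join(parts) == text, "".join(chunks) == text)
```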
Taking parts of a sentence in String using Regex



By : Bill
Date : March 29 2020, 07:55 AM
I have these sentences that I want to manipulate and extract information from. To match parts similar to 10,240.0MB, and a time followed by a date, you can use:
code :
\b\d{1,3}(?:,\d{3})*[.]\d[KMGT]B\b
\b\d{2}:\d{2}:\d{2} \d{2}-\d{2}-\d{4}\b
String regex = "\\b\\d{1,3}(?:,\\d{3})*[.]\\d[KMGT]B\\b"
        + "|\\b\\d{2}:\\d{2}:\\d{2} \\d{2}-\\d{2}-\\d{4}\\b";
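
The combined pattern can be tried out quickly in Python. This is a minimal sketch: the `sample` sentence is made up for illustration, since the original input is not shown, and the names `SIZE` and `TIMESTAMP` are hypothetical.

```python
import re

# The two patterns from above, joined with alternation.
SIZE = r"\b\d{1,3}(?:,\d{3})*[.]\d[KMGT]B\b"
TIMESTAMP = r"\b\d{2}:\d{2}:\d{2} \d{2}-\d{2}-\d{4}\b"
pattern = re.compile(SIZE + "|" + TIMESTAMP)

# Hypothetical sample sentence, not taken from the original question.
sample = "Copied 10,240.0MB at 12:30:45 01-02-2020"
print(pattern.findall(sample))  # ['10,240.0MB', '12:30:45 01-02-2020']
```

Because the pattern has no capturing groups, `findall` returns the full text of each match.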
Regex to match sentences with jumbled words but preserving sentence order



By : Louis Kemp
Date : March 29 2020, 07:55 AM
I couldn't achieve this with a single regex; instead I did the following:
finding xml parts in a sentence with regex



By : Peng Hu
Date : March 29 2020, 07:55 AM
Assuming that the XML tags are always self-closing, this might do what you want. It remains for you to put the XML tags back in.
code :
>>> line = '''<bpt i="1" type="1" x="1" />und ZF-Getriebe <ept i="1" />TipMatic <ph x="2" type="2" />Lite (&lt;/cf&gt;6AS850, 6AS800, 6AS1000)'''
>>> import re
>>> pieces = []
>>> pos = 0
>>> for m in re.finditer(r'(<[^\/]+\/>)', line):
...     line[m.span()[0]:m.span()[1]]
...     pieces.append(line[pos:m.span()[0]])
...     pos = m.span()[1]
...     
'<bpt i="1" type="1" x="1" />'
'<ept i="1" />'
'<ph x="2" type="2" />'
>>> pieces.append(line[m.span()[1]:])
>>> pieces
['', 'und ZF-Getriebe ', 'TipMatic ', 'Lite (&lt;/cf&gt;6AS850, 6AS800, 6AS1000)']
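
The same pieces can also be obtained without an explicit loop. The sketch below assumes the same self-closing-tag pattern as the session above and uses `re.split` for the text between tags and `re.findall` for the tags themselves; the name `TAG` is just a label chosen here.

```python
import re

line = '''<bpt i="1" type="1" x="1" />und ZF-Getriebe <ept i="1" />TipMatic <ph x="2" type="2" />Lite (&lt;/cf&gt;6AS850, 6AS800, 6AS1000)'''

# Same pattern as in the loop above: '<', anything but '/', then '/>'.
TAG = r'<[^/]+/>'

pieces = re.split(TAG, line)   # text between the tags
tags = re.findall(TAG, line)   # the tags, in order of appearance
print(pieces)
print(tags)
```

Since `pieces` always has one more item than `tags`, the two lists can be interleaved again to rebuild the original line.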
regular expressions (regex) save parts of sentence



By : Andi
Date : March 29 2020, 07:55 AM
You can use named groups within patterns to capture substrings, which makes referring to them easier and the code slightly more readable:
code :
import re

data = ['Laura Compton, a Stock Broker from Los Angeles, California',
        'Miles Miller, a Soccer Player from Seattle, Washington']

pattern = (r'^(?P<name>[^,]+)\, an? (?P<position>.+) from '
           r'(?P<city>[^,]+)\, +(?P<state>.+)')

FIELDS = 'name', 'position', 'city', 'state'

for sentence in data:
    matches = re.search(pattern, sentence)
    name, position, city, state = matches.group(*FIELDS)
    print(', '.join([name, position, city, state]))
output :
Laura Compton, Stock Broker, Los Angeles, California
Miles Miller, Soccer Player, Seattle, Washington
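
If a dictionary is more convenient than separate variables, the match object's `groupdict()` method returns all named groups at once. The sketch below reuses the pattern from above; the helper name `parse_person` is hypothetical, and it returns `None` when a sentence does not fit the pattern.

```python
import re

# Same pattern as above, compiled once for reuse.
pattern = re.compile(r'^(?P<name>[^,]+), an? (?P<position>.+) from '
                     r'(?P<city>[^,]+), +(?P<state>.+)')


def parse_person(sentence):
    """Return the named groups as a dict, or None if there is no match."""
    match = pattern.search(sentence)
    return match.groupdict() if match else None


record = parse_person('Laura Compton, a Stock Broker from Los Angeles, California')
print(record)
print(parse_person('no match here'))  # None
```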