A cleaner way of extracting a block of text from a file?
I’m writing some code to parse an nginx config file in Python. The goal is to extract all the upstream ‘pools’ and put them into a nice data structure for later use.
I’ve come up with the solution below but am unsure about my approach. This seems like something that should already have been done for some other application; I just couldn’t construct the right search terms to find it. I’m also wondering if there is some Python trick I am unaware of that could achieve what I want with less (perceived?) bloat.
Need:
- Extract every ‘upstream’ block from the nginx config and map each pool name to its list of servers.
Example data:
worker_processes 2;
pid /var/run/nginx.pid;
error_log /var/log/nginx/error_log debug;
debug_points abort;
events {
    worker_connections 1024;
    use epoll;
    debug_connection 10.10.231.159;
}
http {
    upstream pool1 {
        server 10.10.240.48:8888;
        server 10.10.231.159:8888;
    }
    upstream pool2 {
        server 10.10.240.48:8889;
        server 10.10.231.159:8889;
    }
    server {
        listen 0.0.0.0:80;
        access_log /var/log/nginx/access_log_80;
        location /nginx_status {
            stub_status on;
            access_log off;
            allow all;
        }
        location / {
            proxy_pass http://pool1;
        }
        location /blah {
            proxy_pass http://pool2;
        }
    }
}
Example code (comments explain the logic):
import pprint
import re
import sys


class NgxConfig(object):
    def __init__(self, logging, config_file):
        # Set us up the bomb
        self.logging = logging
        self.upstreams = {}
        try:
            # "with" closes the file for us, even if reading fails
            with open(config_file) as f:
                self.config_file = [line.strip() for line in f]
        except IOError:
            self.logging.error("Cannot process nginx config file!")
            sys.exit("Cannot process nginx config file!")
        self.parse_upstreams()

    def parse_upstreams(self):
        """ Parse upstreams from config
        """
        # Setup markers. None (not 0) means "not set", because 0 is a
        # valid line position.
        us_start_matched = None
        us_end_matched = None
        # Enumerate over config to keep track of position
        for pos, line in enumerate(self.config_file):
            # See if our line matches "upstream" at all.
            usm = re.search(r'^upstream([^"]+){', line)
            if usm:
                # Matched upstream, set our position and move on
                us_start_matched = pos
                continue
            # We have a position set for upstream, look for the end of its block
            if us_start_matched is not None and line == '}':
                # Got the end of the block
                us_end_matched = pos
                # Extract the name of the upstream
                usm = re.search(r'^upstream\s+([^"]+)\s+{', self.config_file[us_start_matched])
                # Get the lines between the start and end of the block
                block = self.config_file[us_start_matched + 1:us_end_matched]
                # Only need the server info: take the token after "server" and
                # drop the trailing ";". (strip('server; ') removes a set of
                # characters, not a prefix, so it would mangle hostnames.)
                srvs = [s.split()[1].rstrip(';') for s in block if s.startswith('server ')]
                # Set them in the list
                self.upstreams[usm.group(1)] = srvs
                # Reset position markers and move on
                us_start_matched = None
                us_end_matched = None
                continue
        pprint.pprint(self.upstreams)
Result:
{'pool1': ['10.10.240.48:8888', '10.10.231.159:8888'],
 'pool2': ['10.10.240.48:8889', '10.10.231.159:8889']}
Thoughts, concerns, criticisms?
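For comparison, here is a rough sketch of a more compact way to get the same dictionary, using two regular expressions over the whole file contents instead of tracking line positions. The names extract_upstreams, UPSTREAM_RE and SERVER_RE are made up for illustration, and the sketch assumes that upstream blocks contain no nested braces and that each directive sits on its own line, as in the example data above. It is not a real nginx parser.

```python
import re

# Hypothetical helper names; not part of nginx or any library.
# Matches "upstream <name> { ... }" at the start of a line and captures the
# name and the block body. Assumes no nested braces inside the block.
UPSTREAM_RE = re.compile(r"^\s*upstream\s+(\S+)\s*\{([^}]*)\}", re.MULTILINE)

# Matches "server <address>" at the start of a line and captures the address
# up to the first whitespace or semicolon, so options like weight=2 are ignored.
SERVER_RE = re.compile(r"^\s*server\s+([^\s;]+)", re.MULTILINE)


def extract_upstreams(config_text):
    """Return {upstream_name: [server_address, ...]} from nginx config text."""
    return {
        name: SERVER_RE.findall(body)
        for name, body in UPSTREAM_RE.findall(config_text)
    }
```

Calling extract_upstreams(open(config_file).read()) on the example data should give the same two pools as the result above. The trade-off is that a regex like this breaks on anything the pattern does not anticipate, such as a closing brace inside a comment.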
Thanks!
EDIT: I may have found a bug in my blog software: even though I had marked this as a ‘draft’, it was still publicly viewable via the tag feed!
