Building stuff…working with docx in Python…parsing data

I work professionally as a field case manager in the workers’ compensation realm. It isn’t a bad job, but it’s a lot of travel and paperwork. I find myself constantly entering data into Word documents and spreadsheets to track my billing, etc. The company I work for is small and we don’t have a software package to make things any easier on us. I’ve been looking for something that will help me break into software development as my main source of income and, as you’ve read, I am still learning a lot of the basics. I’ve got ideas on how to grow my knowledge, and one of those is to build actual working projects. Projects need to solve a problem for someone, so why not start by solving my own problems?

My job involves receiving a ton of emails, and the main source of information is initially sent to me in a Word file. I’ve been playing around with writing a script that will take a docx file and parse it. These initial files always have the same layout, as they are apparently based on a template that someone in our main office enters data into. The idea is to parse the data out into a database and start building a web application that I can use to track my activities and generate reports and billing documents, rather than having to spend hours sifting through my emails and copying data. I can have my application send the emails and log the data in a report/billing document.

Seems easy enough, so…that’s the plan. I’ve been using an app I found for Linux called Pencil to work up some wireframes of the screens so I can have a picture in my mind of what this will look like. More details to come…

I am starting with this library to get the data out of the initial documents: http://python-docx.readthedocs.io

I will incorporate this into my actual Flask application somehow.
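
Here is a rough sketch of the parsing step, just to show the shape of it. It assumes the template puts each value on its own "Label: value" line, which is a guess on my part; the field names and the file name in the comment are made up. The function only needs a list of strings, so it runs without the library installed.

```python
def parse_fields(paragraphs):
    """Turn 'Label: value' lines into a dict; lines without a colon are skipped."""
    fields = {}
    for text in paragraphs:
        label, sep, value = text.partition(":")
        if sep and label.strip():
            key = label.strip().lower().replace(" ", "_")
            fields[key] = value.strip()
    return fields


# With python-docx the list would come from the document itself:
#   from docx import Document
#   paragraphs = [p.text for p in Document("referral.docx").paragraphs]
# The sample lines below are invented stand-ins for the real template.
sample = ["Claimant Name: Jane Doe", "Date of Injury: 2017-08-01", "", "Notes"]
print(parse_fields(sample))
```

If the template turns out to keep its data in Word tables rather than plain paragraphs, the text would have to be collected from the document’s tables instead, but the dict-building step would stay the same.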

Configuration settings

I realized today that I need to add a few features to my Flask Casts site and quickly ran into a couple of different issues. For one, I never really considered making changes to the site once deployed. Being an extreme newbie shines through once again. During development I used the Git features that are built into PyCharm to push the project to GitHub. I usually work off a couple of different machines, so this made things pretty nice. I could just click a couple of buttons to pull the changes and push, and done. There were a lot of things I hadn’t considered, one being the secret key needed for Flask sessions. At the moment I have a different key on my production code than what is in the repo. I manually changed this via SSH and reloaded my application. This works, but now when I want to push or pull to the production server I am getting things a little mixed up.

My fix is to set up an environment variable to hold the key itself… Of course it’s something I’ve never really done before but you learn something new every day.  I am having some difficulty getting this to work and simply don’t have the time to sit down and work on it tonight. An issue for tomorrow I suppose.
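
A minimal sketch of what I have in mind, assuming the key lives in a variable I am calling FLASK_SECRET_KEY (the name is my own choice, nothing requires it). The helper fails loudly when the variable is missing, so a misconfigured server does not quietly start with no key.

```python
import os


def load_secret_key(environ=os.environ, name="FLASK_SECRET_KEY"):
    """Return the secret key from the environment, failing loudly if it is unset."""
    key = environ.get(name)
    if not key:
        raise RuntimeError("{} is not set; export it before starting the app".format(name))
    return key


# In the Flask app this would become: app.secret_key = load_secret_key()
# A plain dict stands in for os.environ here so the snippet runs anywhere.
print(load_secret_key({"FLASK_SECRET_KEY": "dev-only-key"}))
```

The catch is that the variable has to be visible to the process that actually runs the app. Exporting it in my SSH session is probably not enough if the server starts the app as a service, so the service definition would likely need it too.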

Deployed…a few days earlier than planned.

Turns out deploying the site was much easier than I had expected. For hosting I chose Digital Ocean based on the multitude of recommendations online plus I was able to get a couple of months free by using a referral link.

So far it is great! They have a ton of useful and informative articles to walk you through most anything you are going to do. First, I signed up on the $5 plan and got a droplet set up with Ubuntu 16.04.3. I used a few different articles to get the site off the ground. Starting with this one to get the server set up: https://www.digitalocean.com/community/tutorials/initial-server-setup-with-ubuntu-16-04

Basically this one walks you through creating a non-root user with sudo privileges, setting up SSH keys etc… then I went on to actually deploy my app…using this article:

https://www.digitalocean.com/community/tutorials/how-to-serve-flask-applications-with-gunicorn-and-nginx-on-ubuntu-16-04

This one walks through the process of installing everything and creating a basic Flask app. At that step I cloned my repository instead…and then modified the rest of the tutorial based on my specific application.

Also had to install MongoDB to get the app to work correctly: https://www.digitalocean.com/community/tutorials/how-to-install-mongodb-on-ubuntu-16-04

I then added my domain and set the nameservers with my registrar…then added the @ and www A records to point toward my droplet’s IP address…

Then I set up Let’s Encrypt for SSL: https://www.digitalocean.com/community/tutorials/how-to-secure-nginx-with-let-s-encrypt-on-ubuntu-16-04

All in all it only took about an hour…and it’s online. I recorded a basic video, and some of my setup still needs improving, but my goal was to get this done by 9/6 and it looks like I made it with time to spare. https://www.flaskcasts.com

I am totally new to all of this so if anyone has any suggestions on how I can improve…I am all ears.

Learn by doing

It has become clear to me that the best way to learn something new is by getting your hands dirty and working on something. You can follow tutorials and watch someone else code a project, and you will definitely learn concepts, but the best way to learn this stuff is by tackling something on your own. “What kind of project should I build?” you might ask.

I guarantee there is something that you have always thought would be a cool app or idea…even if it is something really small and trivial. Perhaps you just want a basic blog…which is what I am working on with FlaskCasts.

Today I worked out some login functionality using some of the concepts and methodology I picked up from a Udemy course I just took. This kept me from having to use some sort of extension that may provide way more functionality than I need. This also forced me to get my hands dirty and work out little bugs in the code as they came up.

Little bugs can be very frustrating and can sometimes be disheartening. Keep this in mind: in my experience, most of the bugs you spend a ton of time on are so trivial that when you find them you feel stupid. One thing is for certain: you will likely remember that mistake going forward and will write your code in such a way that you don’t make the same mistake again. This might mean new habits, or a change in the way you think about things…and the latter is the most powerful.

For instance, today I kept getting a 400 (Bad Request) error on my login form and couldn’t find the reason for it. Then I realized that Flask’s request.form['variable'] looks for the name="variable" attribute in the form, not id="variable". Stupid mistake, but I learned something new and will remember this going forward. A lot of times it’s just small stuff like this that you learn from, and in the end it will make you a better developer.
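
To convince myself why the id does not matter, here is a small sketch using only the standard library. The form body below is what a browser would send for two inputs; the field names and ids are made up for the example. As far as I understand it, Flask builds request.form from these same name=value pairs.

```python
from urllib.parse import parse_qs

# What the browser would send for a form like:
#   <input id="user_field" name="username">
#   <input id="pass_field" name="password">
# Only the name attributes appear in the body; the ids never leave the page.
body = "username=sam&password=hunter2"
form = {key: values[0] for key, values in parse_qs(body).items()}

print(form["username"])        # found, because the input has name="username"
print("user_field" in form)    # the id is not a key in the submitted data
```

So an input with an id but no name simply is not submitted at all, and the lookup on the server side has nothing to find.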

Slugify

Today while working on the Flask Casts site, I decided it would be a good idea to generate a URL slug. Readable URLs are helpful for search engine optimization. I thought about writing my own class to do this, then thought it might get ugly…so I searched for a package. I found Slugify: https://github.com/un33k/python-slugify

Pretty simple to use. When creating the Post object I just modified my __init__ to the following:

    def __init__(self, title, content, author, created=None):
        self.title = title
        self.slug = slugify(title)
        self.content = content
        self.author = author
        self.created = datetime.datetime.utcnow().strftime('%A %x @ %H:%M:%S') \
            if created is None else created

Now I can pass the slug to a URL like: flaskcasts.com/post/this-is-the-post-title

@home.route('/post/<string:slug>')
def post(slug):
    post = Post.get_post(slug)
    author = User.get_user(post['author'])
    return render_template('home/post.html', post=post, author=author['fullname'])
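
For a sense of what happens to a title, here is a stripped-down stand-in written with the standard library. This is not how python-slugify is implemented, and the real package handles far more cases; simple_slugify is a name I made up for the sketch.

```python
import re
import unicodedata


def simple_slugify(text):
    """Lowercase the text, drop accents, and join the remaining words with hyphens."""
    # Decompose accented characters, then discard anything that is not ASCII.
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    words = re.findall(r"[a-z0-9]+", ascii_text.lower())
    return "-".join(words)


print(simple_slugify("This is the Post Title"))
```

One thing to keep in mind: two posts with the same title would end up with the same slug, so a lookup by slug would need some way to tell them apart.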

Thanks for reading!

Pagination

So the concept escaped me for a couple of days. I actually had considered abandoning my MongoDB trials in favor of a SQL based system. What I discovered was that I am more persistent in finding a solution to my problem than I originally thought. I stumbled upon this Gist: https://gist.github.com/wonderb0lt/10645080

This is a very simple and basic Pagination class written for MongoEngine. There is already a Flask extension for MongoEngine with pagination built in. After looking at the code I realized all this is doing is passing an iterable to the Pagination class and then building links in the template. So I decided I would type it out line by line so I could get a better grasp on what was happening in the code.
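
Boiled down, the idea looks something like the sketch below. It is my own simplified version, not the code from the Gist, and SimplePagination is a made-up name. It takes any iterable, keeps the slice for the current page, and exposes the numbers a template needs to build links.

```python
import math


class SimplePagination(object):
    """Hold one page of items plus the numbers a template needs for links."""

    def __init__(self, iterable, page, per_page):
        if page < 1:
            raise ValueError("page numbers start at 1")
        all_items = list(iterable)
        self.page = page
        self.per_page = per_page
        self.total = len(all_items)
        start = (page - 1) * per_page
        self.items = all_items[start:start + per_page]

    @property
    def pages(self):
        """Total number of pages, rounding up for a partly filled last page."""
        return int(math.ceil(self.total / float(self.per_page)))

    @property
    def has_prev(self):
        return self.page > 1

    @property
    def has_next(self):
        return self.page < self.pages


posts = ["post-{}".format(n) for n in range(1, 6)]  # five made-up posts
second = SimplePagination(posts, page=2, per_page=2)
print(second.items, second.pages, second.has_next)
```

Calling list() on the iterable loads everything into memory, which is fine for a handful of posts but is the part I would expect a real implementation to push down to the database.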

So now I have Flask-PyMongo communicating with my database and was able to build the models in such a way that makes more sense to me at this stage of my learning.

For instance, my Post model looks like this:

class Post(object):


    def __init__(self, title, content, author, created=None):
        self.title = title
        self.content = content
        self.author = author
        self.created = datetime.datetime.utcnow().strftime('%A %x @ %H:%M:%S') \
            if created is None else created

    def __repr__(self):
        return "<Post {}>".format(self.title)

    def json(self):
        return {
            "title": self.title,
            "content": self.content,
            "author": self.author,
            "created": self.created
        }

    def save(self):
        # replace_one with upsert=True inserts the post if no match exists
        mongo.db.posts.replace_one({'title': self.title}, self.json(), upsert=True)

MongoEngine allows reference fields, so you can actually store a reference to the author in the post document…which is kind of cool because you could do something like post.author.full_name from the template. I was a little stuck on trying to find a way to do this using PyMongo until I read this page: https://docs.mongodb.com/manual/tutorial/model-referenced-one-to-many-relationships-between-documents/

It explains a more efficient way of doing this. In my case I am referencing the user by the author variable, so I have set my User class up so that _id is a unique username and will reference the user by the username.

See the User class:

class User(object):

    def __init__(self, _id, email, password, fullname):
        self._id = _id # username, unique
        self.email = email
        self.password = password
        self.fullname = fullname

    def __repr__(self):
        return "<User {}>".format(self.fullname)

    def json(self):
        return {
            "_id": self._id,
            "email": self.email,
            "password": self.password,
            "fullname": self.fullname
        }

    @staticmethod
    def get_user(user_id):
        return mongo.db.users.find_one({"_id": user_id})

    def save(self):
        # replace_one with upsert=True inserts the user if no match exists
        mongo.db.users.replace_one({"_id": self._id}, self.json(), upsert=True)

So using the get_user() method I can get the fullname…
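
The two-step lookup can be sketched without a database at all. In the snippet below plain dicts stand in for the users and posts collections, the sample data is invented, and author_name is a helper I made up for the illustration.

```python
# Plain dicts stand in for the users and posts collections; the data is made up.
users = {
    "sam": {"_id": "sam", "email": "sam@example.com", "fullname": "Sam Example"},
}
posts = [
    {"title": "Hello", "content": "First post", "author": "sam"},
]


def get_user(user_id):
    """Mimic find_one({'_id': user_id}): return the user document or None."""
    return users.get(user_id)


def author_name(post):
    """Resolve the author reference, falling back to the raw username."""
    user = get_user(post["author"])
    return user["fullname"] if user else post["author"]


print(author_name(posts[0]))
```

The post only stores the username, so renaming a user’s full name touches one document instead of every post they wrote. The cost is the second lookup each time a post is displayed.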

LEARNING A LOT! and having a lot of fun!

Also found this library…for generating fake data…VERY useful http://mimesis.readthedocs.io/en/latest/

Change of plans…sort of…kind of…

I am currently reconsidering my options as far as the database for FlaskCasts. As a newbie to Flask and Python, I am starting to see that MongoDB may be more challenging to use than I had initially thought. However, I’m not one to give up so soon. A couple of things I am confused about stand out:

  1. Implementing Pagination for the list of videos on the front page.
  2. Sorting those in descending order.

Coming from an SQL background this isn’t so complicated and in fact a lot of the SQLAlchemy based libraries have a pagination method built in.

Right now I am looking at the Flask-PyMongo extension to see if this will make things any easier. So far I have used plain PyMongo (on my blog attempt I mentioned earlier) and wrote a lot of the DB methods myself. I’ve already come across YouTube videos showing how to sort with this library, so I think I may be able to use this and roll my own pagination.
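
If I do roll my own, I think the core of it is just sorting newest first and working out how many documents to skip. Here is a sketch with made-up data and a helper name (page_window) that I invented; the PyMongo line in the comment is how I expect the same thing to look against a real collection.

```python
def page_window(page, per_page):
    """Return (skip, limit) for a 1-based page number."""
    page = max(page, 1)
    return (page - 1) * per_page, per_page


# Made-up documents; 'created' is an ISO date string so it sorts chronologically.
videos = [
    {"title": "Intro", "created": "2017-08-01"},
    {"title": "Routes", "created": "2017-08-03"},
    {"title": "Templates", "created": "2017-08-02"},
]

skip, limit = page_window(page=1, per_page=2)
newest_first = sorted(videos, key=lambda v: v["created"], reverse=True)
print([v["title"] for v in newest_first[skip:skip + limit]])

# The PyMongo equivalent would look roughly like:
#   collection.find().sort("created", -1).skip(skip).limit(limit)
```

This only works if the date is stored in a sortable form. A display string that starts with the weekday name would sort alphabetically by weekday, not by time.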

After a little more research, I found Flask-MongoEngine. With just a little tinkering I was able to implement a very basic pagination. So I am pretty happy with that. This library seems to be well supported so I may just stick with this.

from flask import Flask, request, Blueprint, render_template
from flask_mongoengine import MongoEngine, Pagination
from .models import Post

app = Flask(__name__)
app.config.from_object('config')
db = MongoEngine(app)

@app.route('/')
def hello_world():
    # Falls back to page 1 if the parameter is missing or not a number
    page = request.args.get('page', 1, type=int)
    posts = Post.objects.order_by('-created')
    paginated_posts = Pagination(posts, page=page, per_page=2)
    return render_template('index.html', paginated_posts=paginated_posts)

This is my view. It checks whether a page variable is being passed in the query string, gets the posts in descending order of creation date, and passes them to the built-in Pagination class along with the current page, showing only 2 entries per page. Then it renders the template with paginated_posts.

{% for post in paginated_posts.items %}
    <h2>{{ post.title }}</h2>
    <p>{{ post.content }}</p>
    <p>Posted by: {{ post.author.fullname }}</p>
{% endfor %}

{# Macro for creating navigation links #}
{% macro render_navigation(pagination, endpoint) %}
  <div class=pagination>
  {% for page in pagination.iter_pages() %}
    {% if page %}
      {% if page != pagination.page %}
        <a href="{{ url_for(endpoint, page=page) }}">{{ page }}</a>
      {% else %}
        <strong>{{ page }}</strong>
      {% endif %}
    {% else %}
      <span class=ellipsis>…</span>
    {% endif %}
  {% endfor %}
  </div>
{% endmacro %}

{{ render_navigation(paginated_posts, 'hello_world') }}

I’d never really messed with a macro in Jinja before this, so I need to do a little more studying to get a grasp on what is happening here. But…it works, and is much easier to roll out than anything else I think I could have done.
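
To get a feel for what the loop in the macro is walking over, I wrote out my own version of an iter_pages function. This is a sketch of the general idea, not the extension’s actual code, and the parameter names and default values are my assumptions. It yields the page numbers worth linking to and None wherever a range is skipped, which is what the template turns into an ellipsis.

```python
def iter_pages(page, pages, left_edge=2, left_current=2, right_current=5, right_edge=2):
    """Yield page numbers to link to, with None marking a skipped range."""
    last = 0
    for num in range(1, pages + 1):
        near_start = num <= left_edge
        near_current = page - left_current - 1 < num < page + right_current
        near_end = num > pages - right_edge
        if near_start or near_current or near_end:
            if last + 1 != num:
                yield None  # the template renders this as an ellipsis
            last = num
            yield num


print(list(iter_pages(page=10, pages=20)))
```

With only a few pages every number is yielded and no None shows up, which is why the ellipsis branch never fires on a small site.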

The models are a bit confusing, as these classes inherit from the Document class…and I have not yet worked with anything like this; until now I’ve only written my own classes. I am going to play around with the concepts a little more to get comfortable. Here is what they look like right now:

import datetime
from mongoengine import *


class User(Document):
    email = EmailField()
    password = StringField()
    fullname = StringField()


class Post(Document):
    title = StringField()
    content = StringField()
    author = ReferenceField(User)
    created = DateTimeField(default=datetime.datetime.now)
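
To make the inheritance part less mysterious, here is a toy version of the pattern in plain Python. It is only my mental model and certainly not how MongoEngine is implemented; Field and Document below are tiny stand-ins I wrote for the sketch. Fields are declared once on the class, and the base class copies them onto every new instance. A default can be a callable such as datetime.datetime.now, in which case it is called for each new object.

```python
import datetime


class Field(object):
    """Toy field: remembers a default, which may be a value or a callable."""

    def __init__(self, default=None):
        self.default = default

    def make_default(self):
        return self.default() if callable(self.default) else self.default


class Document(object):
    """Toy base class: copies declared fields onto each new instance."""

    def __init__(self, **values):
        # Walk the class hierarchy so fields from parent classes are included.
        for klass in reversed(type(self).__mro__):
            for name, attr in vars(klass).items():
                if isinstance(attr, Field):
                    setattr(self, name, values.get(name, attr.make_default()))


class Post(Document):
    title = Field()
    created = Field(default=datetime.datetime.now)  # called once per new Post


post = Post(title="Hello")
print(post.title, type(post.created).__name__)
```

Passing the function itself, not the result of calling it, is what lets each post get its own timestamp.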

Going to work on this some more tomorrow. Very happy to have solved this pagination issue however.

 

The one thing…

Had a long drive for work today. I usually listen to a lot of different podcasts on various topics. Lately my favorites are “Talk Python to Me” and the like… I listened to “The SaaS Podcast” because one of my goals in learning to code in Python is to develop a product, release it, and start earning supplemental income. Podcasts are great because not only do you learn things and absorb a lot of information, you also find inspiration. The host kept talking about the book “The One Thing” by Gary Keller. Since I had 3 credits on Audible, I decided, why not! So far it is a GREAT book.

The main idea is to pick 1 thing and focus on it with everything you have. Struggle through the difficulty of doing that 1 thing until it becomes a habit, which takes 66 days on average, and it becomes easy. So I asked myself: what is the one thing that I can do today that will propel me toward my goal of becoming a successful freelancer and entrepreneur? The answer that came to mind: blog about coding, and code. Talk about the things I am learning in a blog post and share with the world. Set aside some time to document, almost in a diary-like format, the things that I am working on and learning.

This will propel me in my coding career because the habit of blogging will force me to find things to write about. The process of finding things to write about will help me to learn new concepts and force me to produce code to share. Even on days when my normal job has sucked out all of my energy and I just don’t have it in me to write a line of code, I can talk about ways to find inspiration. I can talk about the feeling that comes from building something that works and solves a problem for someone. I can think about how satisfying it is to finally solve a bug in my code.

So, this is my blog. I am going to update this site regularly with information that will build confidence and keep me on track to success. I plan to work on some tutorials tomorrow and will write about my thoughts.

Writing a screencast blog with Flask

I read somewhere that the best way to learn a programming language, and to get better at coding is to work on projects. Open-source projects can be daunting for beginners so, it’s probably a good idea to roll your own apps for things you need personally. I really wanted to write my own blog but like I mentioned earlier, I set a deadline for the new moon/solar eclipse to launch, and I just couldn’t quite get the “pagination” working in time.

I got the idea to make a blog for Flask newbies. The idea was based on an old Ruby on Rails site I used to visit when I was trying to learn Ruby on Rails. The domain that I have purchased is flaskcasts.com, and basically all I’m going to do is write a simple blog website in Flask that will display YouTube videos embedded in the posts with a title and date posted. I may eventually add a discussion section for each video.

Tools
  • Python 3.6.2
  • Flask
  • Database (probably use MongoDB, as I am enjoying learning new stuff but may go with sqlite3 or postgres)
  • The Flask-MongoAlchemy extension. https://pythonhosted.org/Flask-MongoAlchemy/
  • Blueprints (to make the app modular, may need to scale it up at some point)
  • Digital Ocean for hosting
  • PyCharm
  • probably more…

The big issue I faced trying to write my own blog engine was that I had to write my own database module that utilized PyMongo. It was a learning experience, and in fact I got most of the code from a Udemy course that I had taken. I’m just not advanced enough right now to write my own pagination code that uses MongoDB. After looking at the docs for the MongoAlchemy extension, it looks like 90% of the code I had to write is already taken care of. This is copied from the docs:

from flask import Flask
from flask_mongoalchemy import MongoAlchemy
app = Flask(__name__)
app.config['MONGOALCHEMY_DATABASE'] = 'library'
db = MongoAlchemy(app)

class Author(db.Document):
    name = db.StringField()

class Book(db.Document):
    title = db.StringField()
    author = db.DocumentField(Author)
    year = db.IntField()

from application import Author, Book
mark_pilgrim = Author(name='Mark Pilgrim')
dive = Book(title='Dive Into Python', author=mark_pilgrim, year=2004)

mark_pilgrim.name = 'Mark Stalone'
mark_pilgrim.save()
Flask-MongoAlchemy example

So as you can see, basically you create document objects and work with those. Looks like querying the database is really simple as well. My module required some pretty complicated RegEx that I am not comfortable writing without assistance.

Pagination appears to be pretty simple as well…

paginate(page, per_page=20, error_out=True)

Next post I will share some design ideas.

See you tomorrow!