Retro Posts Processed

It took a while, but I managed to process my previous
journals into a WordPress database.
My program could only do so much: I still had to
go back through all the old files and pick apart and
rewrite the dates in places.
Basically, I had to debug my journal.
Retro Post Processing yielded about 1321 posts,
and that's up to December 28, 2007.
Combined with 2008 and 2009 so far, which come to
approximately 283,
I'd say I've written nearly
1604 posts since the summer of 1999.
And now, for those who are curious, I wrote and used this
program to pull the posts from my simple text format.
(It consists of writing out the date before each entry.)
Then I can import the result into WordPress via the RSS importer.
"""
Journal Rescue

Rough program I used to reform my old journals into a WordPress database.
About as rough as one could ever imagine.

Chris B Stones

As it happens, in the predawn of the blogging age people kept journals in
proto-electronic forms, and sometimes get the crazy notion that they will
want to resurrect those old writings in the forms of publishment of the day.

April 30, 2009

1  Splitting code
   I borrowed some old code I wrote for the Aimee Parsing Script project.
2  Recombining code
   After some research I found WordPress can import RSS feed items into
   entries... All I have to do is reformat the old entries into the right
   XML type and va la. It's easier to va than la, apparently.

This code assumes all time is just the day; the time of day is dropped.
Temporarily change your WordPress timezone to UTC ONLY for the import,
then change it back, for this to work, I think.

This project took more than a day. I had to keep coming back to the
thing... argh. But I finally figured out how to zip the stuff back
together and made it all work.
"""

# damn the month of May, it pops up in my poems and things so it breaks my parser

import re
from datetime import datetime

name = "jCollection.txt"
s = open(name, 'r').read()

# Clean out weird characters
s = s.replace("\t", " ")
s = s.replace("\r\n", "\n")  # Windows line endings
s = s.replace("\r", "\n")    # old Mac line endings
s = s.replace("1999A", "1999")


# returns number for the month
def monthtonumber(monthstr):
    conv = {'January': 1, 'February': 2, 'March': 3, 'April': 4, 'May': 5,
            'June': 6, 'July': 7, 'August': 8, 'September': 9,
            'October': 10, 'November': 11, 'December': 12,
            'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'Jun': 6, 'Jul': 7,
            'Aug': 8, 'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12}
    return conv[monthstr]


# debugging function: print each character, then each character code
def show(word):
    for c in word:
        print(c + " ", end=" ")
    print()
    for c in word:
        print(str(ord(c)) + " ", end=" ")
    print()


pstr = r'^January|^February|^March|^April|^May |^June |^July |^August|^September|^October|^November|^December'
r = re.compile(pstr, re.M)

# collect the date prefixes (the month names)
prefixs = r.findall(s)
# collect bodies
bodys = r.split(s)
bodys.pop(0)  # split() puts the text before the first match in front

posts = []
postdict = {}
for a, b in zip(prefixs, bodys):
    posts.append(a + b)
    dateline = (a + b).split("\n")[0]  # dateline
    dateparts = dateline.split(" ")
    if len(dateparts) < 3:
        continue  # not enough parts to be a date line
    # Month
    month = dateparts[0].replace(',', '')
    # Day
    day = dateparts[1].replace(',', '')
    if not day.isdigit():
        continue  # prob didn't mean for a date string so SKIP THIS
    # Year
    year = dateparts[2]
    # remainder often a title of some sort or a time in place
    title = ' '.join(dateparts)  # join the rest, often a time and a title
    # print(year, month, day)
    # print(a + b)
    print("======================= parse =====")
    print("month:", month)
    print("day:", day)
    print("year:", year)
    print("----")
    show(str(month))
    show(str(day))
    show(str(year))
    print("=======================")
    d = datetime(int(year), monthtonumber(month), int(day))
    properdate = d.strftime("%a, %d %b %Y 00:00:00 GMT")
    print(properdate)
    postdict[properdate] = a + b  # dictionary of posts, keyed by date

# TO CHANNEL
# hard code channel data for now
rss_head = """<?xml version="1.0" encoding="UTF-8"?>
<!-- http://cyber.law.harvard.edu/rss/rss.html -->
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
<title>Imported Posts</title>
<link>http://welcometochrisworld.com</link>
<description>The Past journal</description>
<lastBuildDate>Thu, 30 Apr 2009 08:00:20 +0000</lastBuildDate>
<docs>http://backend.userland.com/rss092</docs>
<language>en</language>
"""


# return an item string
def item(title, entry, properdate):
    xml = "<item>\n"
    xml += "<title>" + title + "</title>\n"
    # xml += "<description><![CDATA[" + entry + "]]></description>\n"
    xml += "<content:encoded><![CDATA[" + entry + "]]></content:encoded>\n"
    xml += " <pubDate>" + properdate + "</pubDate>\n"
    # <guid>uinquestringforitem2</guid>
    xml += "</item>\n\n"
    return xml


rss_tail = """</channel></rss>"""

# the first line of each entry (the date line) doubles as its title
for datestamp, entry in postdict.items():
    rss_head += item(entry.split("\n")[0], entry, datestamp)
rss_head += rss_tail

with open("collection.rss", 'w') as output:
    output.write(rss_head)
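
For anyone adapting this, here is a minimal sketch of a stricter splitting step in Python 3. It is not the program above. It assumes every date line looks like "April 30, 2009 optional title", and the names split_entries and rss_item are made up for illustration. Requiring a day and a year after the month name should keep poem lines that start with "May" from being read as dates, and collecting entries in a list rather than a dictionary keyed by date keeps two entries written on the same day.

```python
import re
from datetime import datetime, timezone
from email.utils import format_datetime
from xml.sax.saxutils import escape

# Full month names that may start a date line (assumed format, see above).
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# Month name, day, optional comma, four-digit year, then an optional title.
DATE_LINE = re.compile(
    r"^(%s) (\d{1,2}),? (\d{4})\b[ \t]*(.*)$" % "|".join(MONTHS), re.M)


def split_entries(text):
    """Return a list of (date, title, body) tuples in file order."""
    matches = list(DATE_LINE.finditer(text))
    entries = []
    for i, m in enumerate(matches):
        month, day, year, title = m.groups()
        date = datetime(int(year), MONTHS.index(month) + 1, int(day),
                        tzinfo=timezone.utc)
        # The body runs from the end of this date line to the next date line.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        entries.append((date, title.strip(), text[m.end():end].strip()))
    return entries


def rss_item(date, title, body):
    """Format one entry as an RSS <item> string with an RFC 822 date."""
    return ("<item>\n"
            "<title>%s</title>\n"
            "<content:encoded><![CDATA[%s]]></content:encoded>\n"
            "<pubDate>%s</pubDate>\n"
            "</item>\n") % (escape(title), body,
                            format_datetime(date, usegmt=True))
```

Calling split_entries on the journal text and joining rss_item(*entry) for each result would give the item section of the feed. A body containing "]]>" would still break the CDATA block, so that case needs handling before a real import.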