Retro Posts Processed

It took a while, but I managed to process my previous
journals into a WordPress database.
My program could only do so much; I still had to
go back through all the old files, pick them apart, and
rewrite the dates in places.

Basically, I had to debug my journal.

Retro post processing yielded about 1321 posts,
and that's up to December 28, 2007.

Combined with 2008 and 2009 so far, which come to
approximately 283,

I'd say I've written nearly
1604 posts since the summer of 1999.

And now, for those who are curious: I wrote and used this
program to pull the posts out of my simple text format.
(It consists of writing out the date before each entry.)
I can then import the result into WordPress via the RSS importer.

 
"""
Journal Rescue
 
Rough program I used to reform my old journals into a wordpress database
About as rough as one could ever imagine
Chris B Stones
 
As it happens, in the predawn of the blogging age
people kept journals in proto-electronic forms, and
sometimes they get the crazy notion that they want to
resurrect those old writings in the publishing forms of
the day.
 
April 30, 2009
 
1 Splitting code
  I borrowed some old code I wrote for the Aimee Parsing Script project
2 Recombining code
  After some research I found WordPress can import RSS feed items as
  entries... All I have to do is reformat the old entries into the right
  XML type and va la.
 
It's easier to va than la, apparently.
 
This code treats every entry's time as just the day (midnight GMT).
Temporarily change your WordPress timezone to UTC ONLY for the import, then
change it back; I think that is needed for this to work.
 
This project took more than a day. I had to keep
coming back to the thing... argh. But I finally figured out
how to zip the stuff back together and made it all work.
"""
 
# damn the month of May it pops up in my poems and things so it breaks my parser
 
name = "jCollection.txt"
s = open(name,'r').read()
 
# Clean out weird characters
s = s.replace("\t"," ")
s = s.replace("\r\n","\n") # Windows line endings
s = s.replace("\r","\n")   # old Mac line endings
s = s.replace("1999A","1999")
 
# returns the month number for a full or abbreviated month name
def monthtonumber(monthstr):
	conv = {'January':1,'February':2,'March':3,'April':4,'May':5,'June':6,
		'July':7,'August':8,'September':9,'October':10,'November':11,'December':12,
		'Jan':1,'Feb':2,'Mar':3,'Apr':4,'Jun':6,'Jul':7,'Aug':8,'Sep':9,
		'Oct':10,'Nov':11,'Dec':12}
	return conv[monthstr]
 
import re
from datetime import datetime
 
# debugging function: prints each character, then each character code
def show(word):
	for c in word:
		print c + " ",
	print
	for c in word:
		print str(ord(c))+" ",
	print
 
pstr = r'^January|^February|^March|^April|^May |^June |^July |^August|^September|^October|^November|^December'
r = re.compile(pstr,re.M)
 
# collect the month names that start each dateline
prefixs = r.findall(s)
# collect bodies
bodys  = r.split(s)
bodys.pop(0) # split() also returns whatever text precedes the first match
posts = []
postdict = {}
for a,b in zip(prefixs,bodys):
	posts.append(a+b)
	dateline = (a+b).split("\n")[0] #dateline
	dateparts = dateline.split(" ")
	# Month
	month = dateparts[0].replace(',','')
	# Day
	day = dateparts[1].replace(',','')
	if not day.isdigit():
		continue # prob didn't mean for a date string so SKIP THIS
	#Year
	year = dateparts[2]
 
	# remainder is often a title of some sort, or a time and place
	title = ' '.join(dateparts[3:]) # join the rest, often a time and a title
	#print year,month,day
	#print a+b
	print "======================= parse ====="
	print "month:",month
	print "day:",day
	print "year:",year
	print "----"
	# show() prints its own output and returns nothing, so call it directly
	show(str(month))
	show(str(day))
	show(str(year))
	print "======================="
	d = datetime(int(year),monthtonumber(month),int(day))
	properdate = d.strftime("%a, %d %b %Y 00:00:00 GMT")
	print properdate
	postdict[properdate] = a+b # note: a second entry on the same date replaces the first
 
# dictionary of posts TO CHANNEL
 
# hard code channel data for now
rss_head = """<?xml version="1.0" encoding="UTF-8"?>
<!-- http://cyber.law.harvard.edu/rss/rss.html -->
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
<title>Imported Posts</title>
<link>http://welcometochrisworld.com</link>
<description>The Past journal</description>
<lastBuildDate>Thu, 30 Apr 2009 08:00:20 +0000</lastBuildDate>
<docs>http://backend.userland.com/rss092</docs>
<language>en</language>
"""
 
# return an item string
def item(title,entry,properdate):
	out = "<item>\n"
	out += "<title>"+title+"</title>\n"
	#out += "<description><![CDATA["+entry+"]]></description>\n"   <content:encoded>
	out += "<content:encoded><![CDATA["+entry+"]]></content:encoded>\n"
	out += "<pubDate>"+properdate+"</pubDate>\n"
	#<guid>uniquestringforitem2</guid>
	out += "</item>\n\n"
	return out
 
rss_tail = """</channel></rss>"""
 
for datestamp,entry in postdict.items():
	# the first line of each entry (its dateline) doubles as the title
	rss_head += item(entry.split("\n")[0],entry,datestamp)
 
rss_head += rss_tail
 
output = open("collection.rss",'w')
output.write(rss_head)
output.close()
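
The month of May problem comes from matching on the month name alone. Here is a rough sketch of a stricter dateline pattern that also requires a day and a four-digit year, so a poem line starting with "May" stays inside its entry. It is written for Python 3 (the script above is Python 2), it only handles full month names, and the names `MONTHS`, `DATELINE`, `split_entries` and the sample text are made up for illustration, not taken from the real journal files.

```python
import re
from datetime import datetime

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# A dateline must look like "May 4, 2003" at the start of a line.
# A bare "May ..." with no day and year does not match.
DATELINE = re.compile(
    r"^(?P<month>%s) +(?P<day>\d{1,2}),? +(?P<year>\d{4})\b" % "|".join(MONTHS),
    re.M,
)

def split_entries(text):
    """Return a list of (datetime, entry_text) pairs, one per dateline."""
    matches = list(DATELINE.finditer(text))
    entries = []
    for i, m in enumerate(matches):
        # An entry runs from its dateline up to the next dateline (or the end).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        date = datetime(int(m.group("year")),
                        MONTHS.index(m.group("month")) + 1,
                        int(m.group("day")))
        entries.append((date, text[m.start():end].strip()))
    return entries

# Hypothetical sample in the "date before each entry" format.
sample = """May 4, 2003 A walk
Went to the park.
May the road rise to meet you.

June 12, 2003
Rain all day.
"""

for date, body in split_entries(sample):
    # Same pubDate layout the script above uses.
    print(date.strftime("%a, %d %b %Y 00:00:00 GMT"), "|", body.split("\n")[0])
```

Returning a list of pairs instead of a dictionary keyed on the date also means two entries written on the same day would both survive.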
 
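
One more thing the script above does not guard against: a title containing `&` or `<`, or an entry containing `]]>`, would produce broken XML. Below is a small Python 3 sketch of an item builder that escapes both. The helper names `cdata` and `rss_item` are hypothetical, and whether the WordPress RSS importer cares about strict XML is not something this sketch verifies.

```python
from xml.sax.saxutils import escape

def cdata(text):
    # "]]>" would end a CDATA section early, so split it across two sections.
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

def rss_item(title, entry, pubdate):
    """Build one <item> string with an escaped title and a safe CDATA body."""
    return (
        "<item>\n"
        "<title>" + escape(title) + "</title>\n"
        "<content:encoded>" + cdata(entry) + "</content:encoded>\n"
        "<pubDate>" + pubdate + "</pubDate>\n"
        "</item>\n"
    )

# Hypothetical entry with characters that need protecting.
print(rss_item("Tea & toast <3", "Body with ]]> inside",
               "Sun, 04 May 2003 00:00:00 GMT"))
```

The `content:` prefix only parses as XML if the enclosing `<rss>` element declares the `http://purl.org/rss/1.0/modules/content/` namespace.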
