Disclaimer

The words and opinions expressed here are those of each article's respective author, and do not necessarily represent the views of CapTech Ventures.


Groupthink and the Agile Architect

“Need uber-guru types who are willing to challenge the existing groupthink on design and architecture, especially on TDD and emergent design and pair programming anti-pattern” – job post at Monster.com 2/9/2010

I stumbled upon that quote while following links on the role of the architect on an agile project. Maybe one important role of the architect is to help the team avoid groupthink.

On DW federation, whac-a-mole, and integrating business data

Information Management recently sent around its picks for the best IM blog articles of 2009. Among them was the reaction of Forrester’s James Kobielus to Bill Inmon’s “incineration of a straw man concept that he refers to as ‘virtual data warehousing (DW).’”

Hampton Roads .NET User Group November 2009 Presentation

I presented a talk called "Enterprise Data Validation" at the Hampton Roads .NET User Group this evening. The premise was simple: data validation needs to happen in all the tiers of a modern application, but the validation rules should be defined only once to avoid synchronization errors. In this talk, I showed how to use SQL Server extended properties to store regular expressions for data validation as column metadata. I also showed how to add a regular expression matcher to SQL Server using the SQL CLR and how to add check constraints that invoke the matcher. Then I built a WCF service that queries the validation metadata to make it available in other application tiers. Finally, I quickly assembled a WCF service host and client and showed how you could bring all of the elements together to create a working Enterprise Validation solution.

Download the SQL Scripts (20.06 kb)
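
The "define the rules once" idea can be sketched outside of SQL Server and WCF. The snippet below is only an illustration in Python, not the code from the talk: the VALIDATION_RULES dictionary is a hypothetical stand-in for the column metadata, and the table names, column names, and patterns are all made up for the example. The point is that every tier calls the same functions against the same rule store.

```python
import re

# Hypothetical rule store. In the talk the patterns live in SQL Server
# extended properties; here a dict keyed by (table, column) stands in for them.
VALIDATION_RULES = {
    ("Customer", "ZipCode"): r"\d{5}(-\d{4})?",
    ("Customer", "Email"): r"[^@\s]+@[^@\s]+\.[^@\s]+",
}


def is_valid(table, column, value):
    """Return True if value fully matches the rule for (table, column).

    Columns without a rule are treated as unconstrained.
    """
    pattern = VALIDATION_RULES.get((table, column))
    if pattern is None:
        return True
    return re.fullmatch(pattern, value) is not None


def validate_row(table, row):
    """Return the names of the columns in row that fail their rule."""
    return [col for col, val in row.items() if not is_valid(table, col, val)]
```

For example, validate_row("Customer", {"ZipCode": "23230", "Email": "not-an-email"}) returns ["Email"], whether the call is made from a service layer or from a client.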

SQL UNIQUEIDENTIFIERs are Really Big Integers

I wrote a blog post called How SQL Server Sorts the UNIQUEIDENTIFIER Type and another one called Ordering the SQL UNIQUEIDENTIFIER Type Numerically Correct for Reporting a while back. As a result, I get a lot of e-mails from people struggling with UNIQUEIDENTIFIER values in Microsoft SQL Server. That's cool because I like helping other developers. The mistake that most people make when working with this data type is treating its values like strings. However, UNIQUEIDENTIFIERs are absurd-looking integers, really big ones. We show them in hexadecimal format to make them more compact, which adds to their absurdness, I suppose.
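
To make the "really big integer" point concrete, here is a small Python sketch using the standard uuid module. The GUID value is an arbitrary example chosen for illustration. The snippet only shows the integer reading of the hex digits as they are written; it makes no claim about how SQL Server stores or sorts the type.

```python
import uuid

# An arbitrary example value, chosen only for illustration.
text = "6f9619ff-8b86-d011-b42d-00c04fc964ff"
guid = uuid.UUID(text)

# The same 128 bits, viewed as a single integer.
as_int = guid.int

# Stripping the hyphens and parsing the hex digits gives the identical number.
assert as_int == int(text.replace("-", ""), 16)

# The value always fits in 128 bits, that is, 16 bytes.
assert as_int.bit_length() <= 128
assert len(guid.bytes) == 16

# The integer converts back to the familiar hyphenated hex form.
assert str(uuid.UUID(int=as_int)) == text
```

Seen this way, the hyphenated string is just a display format for a 128-bit number.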

Google Search Appliance (GSA) Sorting in Portal

At several of our clients, we have integrated the Google Search Appliance into a portal. To accomplish this integration, we could take one of two approaches:

1. Utilize GSA’s built-in ability to format the presentation logic via XSLT.

2. Utilize GSA’s ability to return straight XML.

Both approaches work well and can suit the needs of a portal. Option 1, though, will not work if you need to sort the entire result set prior to displaying it to the users. The reasons for this are as follows:

PyTip: Avoid Using range() for Large Sequences

When iterating over a sequence of numbers in Python, the range() function is commonly used. However, in Python 2.x, range() builds a complete list, instantiating every element in the sequence before the iteration begins. This is costly in both memory and CPU time when the desired range of numbers is large. Consider using the xrange() function instead, which produces each number in the sequence lazily, as it is needed. Using xrange() instead of range() for large iterations can have a big, positive impact on your code. For example, in an application I was working on recently, replacing range() calls with xrange() boosted my performance from ~900,000 transactions per second to over 3,000,000. In Python 3.x, the range() function is supposed to be lazy in the same way, but I haven't tested that to be true yet. Let me know if you have.
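
For anyone who wants to check the Python 3.x behavior, here is a minimal sketch. It assumes a Python 3 interpreter (xrange() does not exist there) and uses only the standard library. It shows that range() returns a small lazy sequence object rather than a list, and that the object is a reusable sequence rather than a one-shot generator.

```python
import sys
from collections.abc import Generator, Sequence

n = 10_000_000
lazy = range(n)

# A range object does not hold its elements, so even a ten-million-element
# range is smaller in memory than a list of just one thousand numbers.
assert sys.getsizeof(lazy) < sys.getsizeof(list(range(1000)))

# It is a sequence, not a generator: it has a length, supports indexing,
# and answers membership tests without walking every element.
assert isinstance(lazy, Sequence)
assert not isinstance(lazy, Generator)
assert len(lazy) == n
assert lazy[-1] == n - 1
assert (n - 1) in lazy

# Unlike a generator, it can be iterated more than once.
small = range(5)
first_pass = sum(small)
second_pass = sum(small)
assert first_pass == second_pass == 10
```

So in Python 3 a plain range() call already avoids building the full list up front.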

Dynamic Language Runtime Performance Demos

I spoke at the Charlottesville .NET User Group this week and at the Raleigh Code Camp. I cheated and gave the same presentation to both groups. Call me lazy but, in the middle of planning our own Code Camp in Richmond, I really didn't want to prepare two separate talks. I gave a talk on a similar topic at CodeStock in June 2009, but it has evolved a lot since then based on my own growth and understanding. You can find the code and slides below.

Alfresco Integration with GSA

To provide searching within the portal, we had to define a strategy for integrating Alfresco with GSA. There were two approaches considered:

1. Utilize the traditional approach and have GSA crawl Alfresco through either a webscript mechanism or via CIFS.

2. Utilize the GSA feed-based approach.

After careful review, we decided upon the feed-based approach for the following reasons:

1. Metadata: To support faceted searching, we need a way to attach metadata to each content item. Given that our HTML code is just snippets and does not contain a header with this information, and that we are also indexing documents, the only way to reliably accomplish this was via the feed.