Author Archives: Eyasu Kifle
WordPress REST Authentication with Node

No matter what type of automation you are trying to add to your WordPress instance, it always starts with authentication. In my research, I found a few ways to accomplish this within Node.js. Authenticating with the WordPress REST API was my goal, and using JWT seemed the logical way to do it. The wpapi npm package worked well as the client. You will also need to enable the JWT plugin on the WordPress instance (see Step 2). It adds robust WordPress REST routes and works like a charm.
Step 1: Enable HTTP Authorization Headers in .htaccess
Add the following to the .htaccess file between the IfModule tags:
RewriteEngine on
RewriteCond %{HTTP:Authorization} ^(.*)
RewriteRule ^(.*) - [E=HTTP_AUTHORIZATION:%1]
Note: If you are using something like WP Engine, you will have some additional configuration.
Step 2: Install the JWT Plugin and Activate it
Go to your WordPress admin screen, then Plugins. A quick search for JWT should bring it up as the first choice: JWT Authentication for WP REST API. Enrique Chavez is the creator.
Step 3: Create a Reusable JS WordPress REST Authentication Library
You will need an HTTP agent such as Axios, SuperAgent, or Request. I used Axios here.
// WP.js
// imports
const axios = require('axios');
const WPAPI = require('wpapi');
// constants
const BASE_URL = 'https://somewordpresssite.com';
const WP_USER = 'wp-admin';
const WP_PASS = 'password';
// init
const init = async () => {
  try {
    // REST routes live under /wp-json on a default install
    const authURL = `${BASE_URL}/wp-json/jwt-auth/v1/token`;
    const wp = new WPAPI({ endpoint: `${BASE_URL}/wp-json` });
    // make a post request to get a new token
    const result = await axios.post(authURL, {
      username: WP_USER,
      password: WP_PASS,
    });
    // destructure the token from the result
    const { data: { token } } = result;
    // set the Authorization header with the bearer token
    wp.setHeaders('Authorization', `Bearer ${token}`);
    // request the current user to confirm the token is accepted
    await wp.users().me();
    // return the authenticated wp object
    return wp;
  } catch (e) {
    // some basic error handling
    console.error('Unable to authenticate with WordPress', e.message);
    throw e;
  }
};
module.exports = { init };
Step 4: Authenticate with WordPress REST and Fetch Some Data
const { init } = require('./WP');
(async () => {
  const wp = await init();
  const allCats = await wp.categories();
  console.log(allCats);
})();
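If the wp object is kept around for a long-running job, it can help to know when the token needs to be refreshed. The following is a minimal sketch, assuming the token is a standard three-segment JWT that carries an exp claim in seconds. The helper names decodeJwtPayload and isTokenExpired are hypothetical and are not part of wpapi or the plugin. The payload is only decoded, not verified, so this is a convenience check rather than a security check.

```javascript
// Hypothetical helpers (not part of wpapi or the JWT plugin).
// Decodes the payload segment of a JWT WITHOUT verifying its signature.
function decodeJwtPayload(token) {
  const parts = String(token).split('.');
  if (parts.length !== 3) {
    throw new Error('Not a JWT: expected three dot-separated segments');
  }
  // JWT segments are base64url encoded; map them to plain base64 first.
  const base64 = parts[1].replace(/-/g, '+').replace(/_/g, '/');
  return JSON.parse(Buffer.from(base64, 'base64').toString('utf8'));
}

// Returns true when the token has no numeric `exp` claim or it has passed.
// Assumption: `exp` is expressed in seconds since the Unix epoch.
function isTokenExpired(token, nowSeconds = Math.floor(Date.now() / 1000)) {
  const { exp } = decodeJwtPayload(token);
  return typeof exp !== 'number' || exp <= nowSeconds;
}
```

A caller could check isTokenExpired(token) before each batch and call init() again when it returns true.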
Tutorial: IP Whitelisting for Docker Containers
You can use iptables to restrict network access to an individual container without altering the host’s rules or introducing external firewalls.
Why?
Some potential use cases:
- I have an app that’s not ready for production but needs to be tested on a production server.
- I have multi-tenancy in my SaaS and the customers want to restrict usage of a container just to their IP range.
- I host services (databases, proxies, cache…) that rely on IP-based whitelist authorization
How?
Install iptables in a Docker container and use the docker exec command to alter the rules.
Here’s an example of IP-based restriction for an Nginx container.
Requirements
- A VPS with Docker (DigitalOcean or Vultr work)
- 2 different public IP addresses to test from. You can toggle a VPN or SSH into another machine you have.
Getting Started
I have a staging container on a shared VPS that should be accessible only from a range of IPs (say, an office VPN).
We will call the machines and IPs as follows:
A) 222.100.100.100: the VPS
B) 222.200.200.200: the trusted source IP
Step 1. Prepare your VPS
Install Docker and confirm that your firewall allows incoming traffic. Some providers, like EC2, require that you manually adjust Security Groups to allow external incoming traffic.
Launch a basic Nginx container that listens on all IP addresses (0.0.0.0) on port 8080.
Using iptables within a container requires additional network capabilities, for reasons stated here. So we also add the NET_ADMIN and NET_RAW capabilities.
docker pull nginx
docker run --cap-add=NET_ADMIN --cap-add=NET_RAW --name app1 -d -p 0.0.0.0:8080:80 nginx
Test that it works
curl http://222.100.100.100:8080
should print the Nginx welcome screen on all machines.
Step 2. Install IP Tables
Many Docker images come with iptables pre-installed. Just in case, install it via exec:
docker exec app1 apt update
docker exec app1 apt install iptables -y
Verify it works:
docker exec app1 iptables -S
It should output:
-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT
Step 3. Block All Traffic
First, we block all traffic to the port that is bound in the container. Here, it is port 80, not 8080.
docker exec app1 iptables -A INPUT -p tcp --dport 80 -j DROP
Verify that it’s blocked
From an external machine, curl http://222.100.100.100:8080 should not work. Because the rule drops packets silently, the request will hang until it times out.
Step 4. Whitelist IPs
Then, we whitelist the trusted source IPs:
docker exec app1 iptables -I INPUT -p tcp --dport 80 --source 222.200.200.200 -j ACCEPT
Notice the -I flag, which inserts the rule at the top of the chain, so the whitelist entries take precedence over the DROP rule we added in Step 3.
curl http://222.100.100.100:8080
This will work only from the whitelisted IP. If you can access it from another source, something was misconfigured.
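When there are several trusted IPs, generating the commands from Steps 3 and 4 can be less error-prone than typing them. The sketch below is a hypothetical helper (buildWhitelistCommands is not an existing tool) that only builds the command strings. It assumes IPv4 addresses, optionally with a CIDR suffix, and rejects anything else so that arbitrary text cannot end up in a shell command.

```javascript
// Hypothetical helper: builds the `docker exec ... iptables` commands from
// Steps 3 and 4 as strings. It does not run anything.
const IPV4_OR_CIDR = /^(\d{1,3}\.){3}\d{1,3}(\/\d{1,2})?$/;

function buildWhitelistCommands(container, port, trustedIps) {
  const base = `docker exec ${container} iptables`;
  // -A appends the DROP rule to the end of the INPUT chain.
  const drop = `${base} -A INPUT -p tcp --dport ${port} -j DROP`;
  const accepts = trustedIps.map((ip) => {
    if (!IPV4_OR_CIDR.test(ip)) {
      throw new Error(`Refusing to build a rule for invalid IP: ${ip}`);
    }
    // -I inserts at the top of the chain, so ACCEPT is evaluated before DROP.
    return `${base} -I INPUT -p tcp --dport ${port} --source ${ip} -j ACCEPT`;
  });
  return [drop, ...accepts];
}

console.log(buildWhitelistCommands('app1', 80, ['222.200.200.200']).join('\n'));
```

The printed lines match the commands used in this tutorial and can be pasted into a shell or passed to a process runner of your choice.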
Step 5. Remove Whitelist IPs
To remove a whitelist entry, first retrieve a list of all your rules:
docker exec app1 iptables -S
Then copy the rule and pass it to the -D flag, which deletes it:
docker exec app1 iptables -D INPUT -p tcp --dport 80 --source 222.200.200.200 -j ACCEPT
Step 6. Testing
You can use https://www.geoscreenshot.com to test HTTP access from many IPs.
Tutorial: Authenticating with WordPress REST API
I’ve used the WordPress REST API to run batch content migrations and have run into frustrations with authenticating my API requests. By default, read-only operations are publicly accessible, but write operations require authentication. The official docs only show cookie-based authentication, so I will demonstrate two additional methods of authenticating with the REST API.
Pre-requisites
- WordPress installed with Apache
- Access to WP CLI and .htaccess
This tutorial assumes that WordPress is running on http://localhost/
Method 1: Basic Auth Plugin
1. Verify that the REST API is running. This should return a 200 response:
curl -o - http://localhost/wp-json/wp/v2/posts
2. Install the latest version
wp plugin install https://github.com/WP-API/Basic-Auth/archive/master.zip --activate
3. Modify htaccess
4. Test
Method 2: JWT Auth Plugin
1. Install and activate
wp plugin install jwt-authentication-for-wp-rest-api --activate
2. Modify htaccess
3. Test
Integrating with node-wpapi
- Add a helper function
- Store JWT in header
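As a rough illustration of the helper idea, the two methods differ only in the value of the Authorization header. The sketch below assumes Node.js. The function names basicAuthHeader and bearerAuthHeader are hypothetical and do not come from node-wpapi or either plugin.

```javascript
// Hypothetical helpers: build the Authorization header value for each method.

// Method 1 (Basic Auth): "Basic " + base64("username:password").
function basicAuthHeader(username, password) {
  const encoded = Buffer.from(`${username}:${password}`, 'utf8').toString('base64');
  return `Basic ${encoded}`;
}

// Method 2 (JWT): "Bearer " + the token returned by the token endpoint.
function bearerAuthHeader(token) {
  return `Bearer ${token}`;
}

console.log(basicAuthHeader('user', 'pass')); // prints "Basic dXNlcjpwYXNz"
```

With node-wpapi, either value could then be stored on the client with wp.setHeaders('Authorization', value), the same call used in the WP.js example earlier on this page.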
A Case For Hiring Malleable Junior Developers

TL;DR
Some of the risks that are associated with hiring junior developers can be managed by selecting for personality traits. One of the important traits, Adaptability, manifests differently in junior developers. By identifying it in the hiring process, you may be able to mitigate some of your risks.
Intro
It is a tragedy that many junior developers have difficulty securing jobs. I have met many people who have earned their proper credentials and have an excellent grasp of technologies, but who cannot find their first job.
I have interviewed, mentored and hired junior developers with no experience. I have also hired boot camp graduates, computer science interns, and self-taught developers. And along the way, I have encountered a universal apprehension around hiring junior developers based on the assumption that they will break stuff, need hand-holding or leave early. However, I have found that junior developers are not a homogenous group and each developer comes with a unique risk profile based on his or her personality.
My experience has convinced me that junior developers, if given the opportunity, can provide significant business value, and can occasionally outperform senior developers, often in unexpected ways. I speculate that one of the contributing factors is the quality of adaptability. Hiring teams can mitigate the common fears that are associated with hiring juniors by selecting for moderate adaptability in their candidates.
What is a Junior Developer?
In my field, web development, seniority is disproportionately influenced by a single metric: the cumulative number of years of experience in a specific technology or role. The classification is vague, and varies depending on the company, but in general I have observed that people with two years of experience or less are considered junior, while people with seven years or more are considered senior. Everyone else falls somewhere in between the two.
For the purpose of this article, I will define “junior” as anyone with less than two years of professional work experience. This includes self-taught developers, interns, CS graduates, coding boot camp graduates, and those who are just starting out.
Some Concerns With Hiring Juniors
These are some of the common objections raised against hiring junior developers:
Net Negative Outcomes
Juniors can cause more harm than good.
Many companies operate with fragile systems and without proper safeguards. As a result, they fear that a junior developer might accidentally commit a “rookie mistake” that can bring down the production environment and cost many expensive hours of debugging.
But a responsible software company that has solid processes — such as code reviews, automated tests with reasonable code coverage, QA, etc. — would withstand any chaos introduced by inexperienced developers.
Tip: Hiring a junior developer would reveal the weakness of such systems and encourage robust development practices.
Guidance Overhead
Juniors need excessive hand-holding.
In many companies, documentation is lacking, as there is an assumption that code and architecture should be self-documenting. There is a concern that junior developers may steal productive hours from senior members by asking “obvious” questions.
But a responsible software company would consider documentation a requirement and would allocate time and budget for ensuring documentation maintenance.
Tip: Hiring a junior developer could reveal where documentation is lacking.
High Turnover Rate
Juniors will leave after a few years. Given the high demand for talent, a junior developer may leave for a more lucrative offer after the company has invested so much in them.
But a responsible software company has a healthy flow of new and old employees. If your employees outgrow their positions, it means you are doing something right — and more employees will be attracted to you. The average turnover rate in the tech industry is actually lower for juniors than for seniors.
Tip: The average turnover rate in the software industry is very high. For the reasons mentioned in this article, juniors are less employable than more senior engineers. Perhaps, it seems riskier only because of the perceived loss of initial investment in training them.
Many of these problems are not caused by the addition of junior developers to a team. Rather, they reveal pre-existing systemic issues within the processes of a company. A responsible software company should be able to rotate staff without major interruptions to the product.
What is Malleability?
The qualities that are needed for a job depend largely on the size and culture of the company, the company’s short-term and long-term business goals, and the reason for hiring in the first place. For these reasons, it’s challenging to create a broad definition of “quality” for developers that is applicable for all teams and companies.
In my experience, the criteria have varied as follows:
- I have worked at companies that hired junior developers based mostly on their academic credentials, by evaluating either the prestige of their school or their standardized scores.
- I have also worked at companies that completely bypassed the standard HR process and only required that the applicant submit an essay about solving technical problems.
- A recent employer had a policy against hiring archetypal “brilliant jerks,” so we turned down highly technically-qualified candidates because their personality was not compatible with the ethos of the organization
Over the last 10 years, I have interviewed, mentored, and hired junior developers. And I have found that the quality that contributed most to their performance was neither their crystallized technical knowledge nor their intelligence, but rather their adaptability.
Adaptability is the ability to provide consistent performance in a changing environment. Extremely adaptable developers have the most career mobility but are rare and hard to retain. Inadaptable developers stagnate in a comfortable position in their career ladder and become outdated quickly. However, developers with average to above-average adaptability tend to do well in organizations.
I will focus on what I speculate is a major contributor to the general trait of adaptability: an individual’s openness to change.
Openness to Change
“Openness to change refers to an individual’s level of acceptance and conscious awareness of the possibility that change may be needed across a range of situations and scenarios, together with the appetite or drive to enact that change.” (Oxford Review)
Developers who are highly open to change thrive in a rapidly changing environment, and those who are less open to change thrive in a stable and predictable environment.
I am not aware of any personality test that evaluates this quality. The Big 5’s openness to experience may have some correlation with this trait, but I shy away from making an explicit connection. I speculate that this trait is normally distributed in the population. I’ve roughly divided the continuum into 3 broad categories: Dynamic developers who are highly open to change, Malleable developers who are moderately open to change and Static developers who are resistant to change.
Dynamic Developers
There are some really impressive developers who have many mastered technologies under their belt. These outliers generally do not have trouble finding jobs. Their abundance of extra-curricular research, side projects and open-source projects makes them attractive. Many of them also have the innate advantage of temperament: a knack for problem solving, creativity, and strong abstraction that helps them outperform their peers. And they usually progress up the career ladder quickly as a result. They are novelty-seeking, consistently self-educating, and have a genuine love for their work.
A possible disadvantage is that they tend to get bored easily and their productivity may decrease if the work they’re asked to perform is not particularly stimulating. They have high turnover rates, and many venture into consulting or entrepreneurship after a few years. Their archetypal challenge is wearing all hats on a single head; they are jacks-of-all-trades and masters of many. I call them dynamic developers.
Static Developers
There is another group of developers I call static developers. They are generally attracted to job security and are able to work in specific technologies for a very long period of time. They tend to follow a predictable academic and career progression, and learn best through structured processes. Many of them become masters of a specific technology over many years, and establish a safe niche within an organization.
The distinguishing characteristic of this group is rigidity; they perform reliably within a fixed set of environmental parameters (for example, technologies or processes) but are slow to adapt to novel problems and spaces. Teams with many static developers find it easier to outsource projects to third parties or hire additional talent rather than train these developers. These developers are masters of one.
Malleable Developers
And then there is the final group of developers, who are what I call malleable developers. They are somewhere between static developers and dynamic developers in their openness to change. Like dynamic developers, they can improvise in novel problem spaces when needed and provide adequate solutions. But they also possess the same ability to find reward in stable, conscientious output that many static developers possess. Their mercurial nature makes them very desirable for startups and R&D teams, where requirements can shift unexpectedly and quick adaptation is a necessity. They are jacks-of-all-trades.
The Underlying Reward Mechanism
Each of these types also finds reward in different aspects of the job.
Dynamic developers are process-oriented and find reward in novelty and intellectual stimulation. They are content as long as they are presented with a stream of challenging tasks.
Static developers find pleasure in doing things slowly and correctly. They are goal-oriented and strive to acquire deep mastery over a specific technology. They are comfortable with a steady and predictable path. They find intrinsic reward in becoming a domain expert.
Malleable developers enjoy getting things done. They are results-oriented and enjoy solving external problems (be they simple bugs or business needs). They are comfortable performing both routine and challenging tasks to solve problems. They find reward in solving problems for stakeholders.
The Junior Manifestation
Now that I’ve established a general profile of these archetypes, let’s explore how they manifest in junior developers.
Junior + Dynamic
Dynamic junior developers initially present with more breadth than depth.
They are stimulated by learning new frameworks and acquiring new skills. They know the latest tools and frameworks, and even know upcoming features before they are released. They have many side projects and a full portfolio with various stacks and frameworks. In addition, they maintain a queue of articles, languages, challenges, etc… They are always learning during their free time.
It is likely that they are as impressive in-person as on paper. They can provide an intuitive explanation for solutions, despite not knowing the answer. Independence of mind provides the flexibility to make optimizations, unhindered by stiff thinking, while being framework agnostic. This adds a wider experience to draw on when suggesting creative ideas for improvement.
However, once hired, they may be easily bored by the menial tasks of coding.
Sifting through thousands of lines of old code to fix bugs seems like a daunting task. They may feel tempted by the impulse to re-write/re-factor a module the correct way, or even re-write a project using a more modern approach. After a while, they could begin to miss the “fun” of working on exciting public-facing projects, and may find themselves pigeon-holed.
If a dynamic junior developer is fortunate, managers will place them on interesting projects that keep them engaged. If not, they may become disillusioned and use the opportunity only as a stepping stone to a more senior position.
The archetypal vice for this type of dynamic junior developer is impatience.
They will rightly feel that the tasks given to them are beneath them, and may thus have trouble completing some of them. Even though their frustrations may be sound, they lack the relevant concrete experience to prioritize the issue with the needs of the business in mind. In addition, they may neglect to see the value of practices like testing, documentation, error handling, and logging, and instead focus only on the fun parts. As a result, overconfidence can introduce breaking changes, and some decisions may be myopic or too optimistic.
They will make the greatest contribution by suggesting ideas for improvements, introducing new processes, and doing peripheral projects in their own way. Since they naturally enjoy the challenge of learning, they will be content with a stream of challenging projects — as long as there is enough variety and stimulation.
Risk of breaking things: High
Need for hand-holding: Low
Risk of turnover: High
Junior + Static
The static junior developer is probably the most abundant type of junior developer.
They have usually completed some sort of structured training program, and have a typical portfolio. Since they tend to be perfectionists, their portfolios will feature only their best work (per objective standards) and may seem sparse. Even at the junior level, they may show an impressive level of understanding of a specific library or framework, but their breadth will be limited.
They go by the books and are generally technically comfortable with the roles for which they apply. Most impressive about them is their meta-awareness of what they know and what they don’t know. In interviews, they usually either give very lengthy academic answers or honestly admit that they don’t know something.
The static junior developer excels in team hierarchies where there is a chain of command and they know who to ask for help. They are comfortable starting out with fixing small bugs, writing documentation, testing, and other areas where their thoroughness provides value (and where they can experience progressive stages of mastery). They thrive in well-defined problem spaces where they are aware of what is expected of them.
Examples of web development tasks where static junior developers excel include: Fixing cross-browser compatibility issues, writing HTML email templates, adding unit tests, improving code coverage, fixing well-documented bugs, and translating designs into pixel-perfect CSS.
The archetypal challenge for these developers is stagnation. Unlike dynamic developers, they fear change and avoid the unknown. If they are thrown into a chaotic situation or are forced to improvise, they will stall or recuse themselves unless they are confident that they can contribute value.
Once a static junior developer knows that they are valued and sees a clear path of career progression, they become fiercely-loyal employees. As such, their turnover risk is low.
Risk of breaking things: Low
Need for hand-holding: High
Risk of turnover: Low
Junior + Malleable
Malleable junior developers resemble static developers in many ways. Given the limited years entailed by their junior status, they may not have had enough opportunities to exercise their dynamic side, so they may even be indistinguishable from static developers on paper. They may even appear less impressive than their static counterparts because they generally lack thematic coherence of tech skills (thus seeming like dilettantes).
However, they can be very impressive in interviews as they tend to display a remarkable technology agnosticism. Their willingness to bend to fit the problem makes them open to learning or using whatever is necessary to get the job done. This gives them the advantage of being able to separate the abstract problem definition from a particular technology implementation.
Malleable junior developers have the work ethic of static developers and the resourcefulness of dynamic developers. They often gravitate towards cross-functional roles, even early in their careers. They may even pick up non-technical tasks like data entry or customer support just to get things done. Given enough time, they learn new skills at a reasonable pace and can improvise with incomplete requirements.
The tragedy of the malleable junior developers is that they often grow to be the Swiss army knife of a team. Because of their flexibility and general willingness to do any work, they are often handed tasks indiscriminately which they will complete diligently. However, their selfless lack of preference, while beneficial to the team, can make them masters of none. With proper support and guidance, they can become highly-functional employees with an interesting mix of soft and hard skills.
Risk of breaking things: Low-Medium
Need for hand-holding: Low-Medium
Risk of turnover: Low-Medium
Identifying the Types
Once you have decided which type of junior developer your team needs, you have to identify the trait in your candidates. Unlike the Big5, MBTI, DISC… and other instruments, the personality theory I’ve presented lacks an empirical foundation (aside from my anecdotes). Therefore, there is not (yet) an objective tool to discern between these types.
However, there are some speech and thought pattern “tell-tale” signs that can be seen in interviews.
In Interviews
The dynamic developer, owing to a tendency toward lateral thinking, will expand possibilities indefinitely (breadth) if given the time, while the static developer will reduce possibilities to a few certain outcomes (depth). The malleable developer will reconcile these opposing tendencies by making heavy use of abstractions.
Here’s an example:
Question: For project ACME, how would you persist the data given the following constraints? [Constraints]
Dynamic: Depending on [factors], I would use either this set of NoSQL databases [a long impressive list] or one of these RDBMS [another long impressive list] because they offer XYZ. However, I’m only familiar with Mongo… although I’ve also dabbled a bit in CouchDB and Redis. I’m really comfortable using any database. Did you hear about the new Neo4J feature that lets you…
Static: I would write a MySQL query, first making sure the data is sanitized and ensuring the syntax is compatible with MySQL 5.7. I would use ABC ORM and do it as [such and such]. I am familiar with most of the SQL syntax of MySQL, as well as the recent changes in 5.7. I don’t really know about other databases. I plan to take a course on other databases in a few months.
Malleable: Depending on the [factors], I would use your supported database engine with the needed features and work with your existing ORM to persist the data. I am familiar with some RDBMS and NoSQL databases, but I am willing to learn other types of databases if needed.
The Case for the Malleable Junior Developer
A malleable junior developer can provide the immediate value that a dynamic developer provides, without the high risk of turnover.
A malleable junior developer can also provide the productivity that a static developer provides, without the need for hand-holding, stagnation or becoming outdated.
For these reasons, I argue that hiring a junior developer with moderate openness to change provides a balanced long-term outcome.
The Case for the Other Types
Your immediate needs and the nature of your company should determine which type of developer you hire, as there is a specific niche where each type excels.
For example,
If your team requires highly specialized knowledge in a specific well-defined area, a static developer can provide strong business value given that you allocate time to train them.
If you work in a highly volatile environment or have a lot of quick ephemeral projects (prototypes, MVPs, internal tools), a dynamic developer will get you there given that you give them creative freedom and autonomy.
Research: The Web Proxy Authentication Problem
One of the security concerns of deploying private proxies is ensuring that anonymous crawlers and port scanners do not gain access to them. Many proxy providers require users to manually whitelist their IPs to get access to the proxy, but few users have stable static IPs. Other providers support username/password authentication, but many clients, such as Google Chrome, do not natively support this scheme.
I researched a few open-source solutions for bridging this gap. Here are my findings:
Problems
- I am trying to use paid proxy X, but it requires either IP-based or HTTP Proxy-Auth based authentication. I want a proxy with no auth requirements that works well with browsers.
Projects Explored
Glider
This was my first choice. It’s lightweight, supports round-robin rotation between upstream proxies and supports many protocols.
Doesn’t support Auth proxies yet https://github.com/nadoo/glider/issues/15
Go Proxy
I have this running in production and it works well. Very easy to install and seems to support many protocols. Unfortunately, the documentation isn’t clear on upstream proxies.
Filed an issue here https://github.com/snail007/goproxy/issues/112
MITM Proxy
This is the only proxy I got to work with an HTTP upstream. Unfortunately, it injects its own certs and doesn’t support non-intercepting proxying for upstreams.
Squid
Squid is heavy, taking ~180MB of memory on my local machine, so it probably will not scale for me. Officially, it supports proxy chaining and others have reported success with it. My attempts haven’t worked: it ignores the cache_peer rules and connects directly.
Privoxy
Also claims to support authenticated proxies officially. I haven’t gotten it past DNS resolution for a non-authenticated upstream proxy.
3Proxy
Claims to support proxy chaining with username/password authentication out of the box. The documentation isn’t clear on credential config. Issues here: https://github.com/z3APA3A/3proxy/issues/165 https://github.com/z3APA3A/3proxy/issues/102 https://github.com/z3APA3A/3proxy/issues/20
Tutorial: Importing Content to WordPress

I have had to migrate content from static platforms (static content in code, older CMS…) to WordPress. Some of the annoying issues were:
- Unsanitary HTML, extracting content from shell
- Adding template tags and breadcrumbs
- Manual copy/paste labor
The Solution
- Use JavaScript to sanitize the DOM
- Write RegExp rules
- Scrape the pages via Node.js and import directly into WP using the JSON API
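To give a feel for the RegExp rules mentioned above, here is a minimal sketch. The function name sanitizeHtml and the specific rules are assumptions chosen for illustration, not the rules used in the actual migration. Regex-based cleanup is fragile and is not a defense against hostile markup, so treat it as a rough first pass for content you already trust.

```javascript
// Hypothetical cleanup pass for scraped markup. Not a security sanitizer.
function sanitizeHtml(html) {
  return html
    // Drop HTML comments.
    .replace(/<!--[\s\S]*?-->/g, '')
    // Drop <script> and <style> elements together with their content.
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, '')
    // Drop presentational and event-handler attributes (style, class, id, on*).
    .replace(/\s(?:style|class|id|on\w+)\s*=\s*("[^"]*"|'[^']*'|[^\s>]+)/gi, '')
    .trim();
}

const dirty = '<div class="x" style="color:red"><script>alert(1)</script>'
  + '<p onclick="a()">Hi</p><!-- c --></div>';
console.log(sanitizeHtml(dirty)); // prints "<div><p>Hi</p></div>"
```

For anything beyond simple pages, doing the cleanup on a parsed DOM (as the JSDOM step in this post does) is likely to be more reliable than regular expressions.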
The Stack
- Node 6+ for cleaner JS
- JSDOM for Headless Scraping -npm module
- WordPress JSON API -wp plugin
- WordPress Client -npm module
Steps
1. List the URLs of the existing pages you want to migrate. If there is significant variance in the DOM structure between these pages, group them by template so you can process them more easily. If you would like to add additional data (a page title override, custom fields…), include it.
Export the list to a CSV. You should have a source.csv with something like:
url, title, description, template, category, meta1...
https://en.wikipedia.org/wiki/Baldwin_Street, Baldwin Street\, Dunedin, A short suburban road in Dunedin, New Zealand, reputedly the world's steepest street., Asia and Oceania
...
2. Get WP Ready
- On your WP installation, install and activate this plugin [WP REST API](https://wordpress.org/plugins/rest-api/)
- Upload and unzip the Basic Auth plugin. It is not in the plugin repo at the time of this writing. https://github.com/eventespresso/Basic-Auth
- Since we use Basic Auth, create a temporary user/pass that can be discarded after import.
- Test that the API works: navigate to {baseurl}/wp-json. You should see a JSON response with your site’s info.
- Add the following to .htaccess to enable Basic Auth:
RewriteRule ^index\.php$ - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L]
Verify with cURL:
curl -u username:password -i -H 'Accept:application/json' {baseurl}/wp-json/wp/v2/users/me
It should display your user information. If this doesn’t work, check Apache global config for restrictions on overrides. And if that doesn’t work, there are other methods of authentication here: https://developer.wordpress.org/rest-api/using-the-rest-api/authentication/
The App Logic
- Connect with WP and Authenticate/Authorize
- Parse the CSV into a JS form
{url:..., title:..., description:...}
- For each URL, scrape with JSDOM and extract custom fields
- Extract the body, sanitize HTML
- Insert new post/page via the WP REST API
The Code
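The parsing step in the list above turns CSV rows into objects keyed by the header row. A minimal sketch of that conversion follows, assuming the rows have already been split into arrays of strings and that the first row is the header. The name rowsToRecords is hypothetical.

```javascript
// Hypothetical helper: converts [[header...], [row...], ...] into an array of
// objects such as { url: ..., title: ..., description: ... }.
function rowsToRecords(rows) {
  const [header, ...body] = rows;
  const keys = header.map((h) => h.trim());
  return body
    // Skip rows that are entirely empty (for example, a trailing newline).
    .filter((row) => row.some((field) => field.trim() !== ''))
    // Missing fields become empty strings so every record has the same keys.
    .map((row) => Object.fromEntries(
      keys.map((key, i) => [key, (row[i] || '').trim()])
    ));
}

console.log(rowsToRecords([
  ['url', ' title'],
  ['https://example.com/a', ' First page'],
  [''],
]));
```

Each record can then be passed along to the scraping and import steps.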
Example
For this example, I will scrape the Bhagavad Gita from Wikisource and create a page for each chapter.
env.js — Environment Config
For the sake of simplicity, I am not using OAuth but Basic HTTP Authentication. This method demands that credentials are transmitted in plain text over the wire. Use a temporary account and ensure HTTPS is enabled. DO NOT check this into source control with actual WP credentials.
// This data shouldn't be checked in.
module.exports = {
  'WP_URL': 'https://domain.com',
  'WP_USERNAME': 'test',
  'WP_PASSWORD': 'test'
};
data/source.csv — List of URLs
I will use a single column CSV, you can pass metadata by adding more columns.
url
https://en.wikisource.org/wiki/The_Bhagavad_Gita_(Arnold_translation)/Chapter_1
https://en.wikisource.org/wiki/The_Bhagavad_Gita_(Arnold_translation)/Chapter_2
https://en.wikisource.org/wiki/The_Bhagavad_Gita_(Arnold_translation)/Chapter_3
https://en.wikisource.org/wiki/The_Bhagavad_Gita_(Arnold_translation)/Chapter_4
https://en.wikisource.org/wiki/The_Bhagavad_Gita_(Arnold_translation)/Chapter_5
https://en.wikisource.org/wiki/The_Bhagavad_Gita_(Arnold_translation)/Chapter_6
https://en.wikisource.org/wiki/The_Bhagavad_Gita_(Arnold_translation)/Chapter_7
https://en.wikisource.org/wiki/The_Bhagavad_Gita_(Arnold_translation)/Chapter_8
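If you do add metadata columns, the 2D array that comes out of a CSV parser is easier to work with once each row is keyed by its header. A minimal sketch, assuming the first row is the header and using the {url, title, description} shape from the app logic; rowsToObjects is a hypothetical helper.

```javascript
// Turn [[header...], [row...], ...] into an array of objects keyed by header.
// Missing trailing fields become empty strings.
function rowsToObjects(rows) {
  const [header, ...records] = rows;
  return records.map((fields) =>
    Object.fromEntries(header.map((name, i) => [name, fields[i] ?? '']))
  );
}

// Made-up sample data for illustration.
const sample = [
  ['url', 'title', 'description'],
  ['https://example.org/a', 'Page A', 'First page'],
  ['https://example.org/b', 'Page B'],
];
console.log(rowsToObjects(sample));
```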
lib/read.js — Reads CSV Files
For most use cases, this crude CSV parser would suffice:
const fs = require('fs');
try {
  let raw = fs.readFileSync('./data/source.csv', 'utf8')
  let parsed = raw.split("\n")  // Rows
    .map(r => r.split(",")      // Fields
      .map(f => f.trim()))      // Trailing chars such as \r
} catch (e){
  console.error(e);
}
But reading the whole file synchronously does not scale well (say, to 1,000,000 rows), so we’ll use streams, which are more robust. The fast-csv module has built-in support for Node streams. The following code is a starter for a scalable solution:
const csv = require('fast-csv'),
  fs = require('fs');
class List {
  constructor(filePath, limit = 500) {
    this.filePath = filePath || null;
    this.limit = limit;
    this.data = [];
    this.stream = null;
  }
  read() {
    return new Promise((resolve, reject) => {
      if (!(this.filePath && fs.existsSync(this.filePath))) {
        return reject('File does not exist');
      }
      // TODO: implement scalable streaming.
      this.stream = fs.createReadStream(this.filePath);
      this.stream.on("error", reject);
      // csv() is the fast-csv 2.x call style; later major versions expose csv.parse() instead
      this.stream.pipe(csv()).on("data", (raw) => {
        if (this.data.length > this.limit) {
          console.log("Read", "Limit exceeded");
          // destroy() stops the stream without emitting "end", so resolve here
          this.stream.destroy();
          return resolve(this.data);
        }
        this.data.push(raw);
      }).on("end", () => {
        resolve(this.data)
      });
    })
  }
}
module.exports = {
  List
};
testRead.js — Checkpoint: Verify the CSV is read
const { List } = require ('./lib/read');
let file = new List('./data/source.csv');
file.read().then(console.log, console.error);
Run testRead.js; you should see your CSV as a 2D array.
lib/api.js — WP API Wrapper
This file wraps the wpapi npm module to handle authentication and expose only the functions we need: new post and new page.
/*
 * Wrapper around WP-API
 */
const env = require('../env');
const WPAPI = require('wpapi');
class API {
  constructor () {
    this.wp = null;
    this.user = null;
  }
  addPost(title, content, category, meta, type='posts', status='draft') {
    return new Promise((resolve, reject) => {
      this.wp.posts().create({
        title,
        content,
        status
      }).then(function( response ) {
        resolve(response.id);
      }, reject);
    });
  }
  addPage(title, content, category, meta, type='posts', status='draft') {
    return new Promise((resolve, reject) => {
      this.wp.pages().create({
        title,
        content,
        status
      }).then(function( response ) {
        resolve(response.id);
      }, reject);
    });
  }
  initialize() {
    return new Promise((resolve, reject) => {
      if (!this.wp) {
        let config = {
          endpoint: `${env.WP_URL}/wp-json`,
          username: env.WP_USERNAME,
          password: env.WP_PASSWORD,
          auth: true
        }
        this.wp = new WPAPI(config)
        // Verify that it authenticated
        this.wp.users().me().then((user) => {
          this.user = user;
          console.log('API', 'authenticated as', user.name);
          resolve(user);
        }, (error) => reject(error))
      } else {
        reject ("API already initialized");
      }
    });
  }
}
module.exports = { API };
testAPI.js — Checkpoint: Verify the WP connection
const { API } = require ('./lib/api');
let api = new API();
api.initialize().then(console.log, console.error);
Run testAPI.js; you should see a JSON object with your user details.
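Note that addPost and addPage accept category and meta but only send title, content and status. If you want those extra fields to reach WordPress, one option is to assemble the payload in a small helper first. This is a sketch under assumptions: buildPayload is a hypothetical name, category is assumed to be a numeric term ID (the REST API's categories field on posts takes an array of IDs), and meta keys are assumed to be registered for REST on the WordPress side.

```javascript
// Illustrative helper: assemble the object passed to wp.posts().create().
// Assumes `category` is a numeric term ID and `meta` is a plain object whose
// keys are already registered for the REST API on the WordPress side.
function buildPayload(title, content, { category, meta, status = 'draft' } = {}) {
  const payload = { title, content, status };
  if (Number.isInteger(category)) {
    payload.categories = [category]; // an array of term IDs
  }
  if (meta && Object.keys(meta).length > 0) {
    payload.meta = meta;
  }
  return payload;
}

console.log(buildPayload('Chapter 1', '<p>...</p>', { category: 7 }));
```

Inside addPost, the object literal passed to create() could then be replaced by the result of this helper.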
lib/scrape.js — Headless Webpage Scraper
This module wraps JSDOM for extensibility. You can swap it with other libraries (e.g. Cheerio, X-Ray, Phantom…). The fnProcess argument to the constructor expects a function that takes a window object as input and returns a plain object with the extracted fields. We are including jQuery for convenience.
const jsdom = require('jsdom');
class Scrape {
  constructor(url, fnProcess = null, libs = []) {
    this.url = url || null;
    // jQuery first, then any extra scripts the caller passed in
    this.libs = ["http://code.jquery.com/jquery.js", ...libs];
    this.fnProcess = (typeof fnProcess === 'function') ? fnProcess : function(window) {
      return window.document.body.innerHTML;
    }
    this.output = null;
  }
  scrape() {
    return new Promise((resolve, reject) => {
      // jsdom.env() is the legacy API, available in jsdom 9.x and earlier
      jsdom.env(this.url, this.libs, (err, window) => {
        if (err) {
          return reject(err);
        }
        this.output = this.fnProcess(window);
        resolve(this.output);
      });
    });
  }
}
module.exports = {
  Scrape
}
testScrape.js — Checkpoint: it should scrape example.org
const { Scrape } = require ('./lib/scrape');
let page = new Scrape('http://example.org/', function (window) {
  return {
    title: window.document.title,
    body: window.jQuery('p').text()
  }
})
page.scrape().then(console.log, console.error);
Run testScrape.js; you should see a JSON object with the page title and the text of its paragraphs.
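Because fnProcess is just a function from a window to a plain object, you can write and check one without loading a real page. The sketch below is illustrative only: extractLinks is a hypothetical fnProcess that uses standard DOM calls instead of jQuery, and fakeWindow is a hand-rolled stand-in with made-up data, not a real JSDOM window.

```javascript
// Example fnProcess: collect the page title and every link on the page.
// It relies only on standard DOM APIs, so it does not need jQuery.
function extractLinks(window) {
  const anchors = Array.from(window.document.querySelectorAll('a[href]'));
  return {
    title: window.document.title,
    links: anchors.map((a) => ({
      text: a.textContent.trim(),
      href: a.getAttribute('href'),
    })),
  };
}

// A minimal stand-in for a window, just enough to exercise the function.
const fakeWindow = {
  document: {
    title: 'Example Domain',
    querySelectorAll: () => [
      { textContent: ' Next page ', getAttribute: () => 'https://example.org/next' },
    ],
  },
};
console.log(extractLinks(fakeWindow));
```

The same function can be passed as the second argument to new Scrape(url, extractLinks).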
index.js — The Glue
Now that we’ve tested these components individually, it is time to glue them together. Async is a popular library for managing control flow in Node applications; here it caps how many pages are processed in parallel. This is the code version of the logic mentioned above.
The Scrape Function
Scrape the fields we want from WikiSource:
// Scrape function to be executed in DOM
const fnScrape = function(window) {
  // From
  // The Bhagavad Gita (Arnold translation)/Chapter 1
  // To
  // Chapter 1
  let $ = window.jQuery;
  let title = $('#header_section_text').text().replace(/["()]/g, ""),
    body = $('.poem').text()
  return {
    title,
    body
  };
}
I tested / fine-tuned this in Chrome DevTools. You should run this test against your source URLs to make sure you account for page variations. This is run in the fake browser context.
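The comments in fnScrape describe going from the full page path to just the chapter name. Whether you need an explicit step for that depends on what your header element actually contains, which I can't vouch for on every page. If it turns out to hold the full "Book/Chapter" string, a small string helper like the hypothetical chapterTitle below would cover it, and it can be tested without a browser.

```javascript
// Keep only the part after the last "/" and strip quotes and parentheses.
// `chapterTitle` is an illustrative helper, not part of the scraper above.
function chapterTitle(raw) {
  return raw
    .split('/')
    .pop()
    .replace(/["()]/g, '')
    .trim();
}

console.log(chapterTitle('The Bhagavad Gita (Arnold translation)/Chapter 1'));
```

A string with no slash passes through with only the quotes and parentheses removed.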
The entire file:
const async = require('async');
const { List } = require('./lib/read'),
  { Scrape } = require('./lib/scrape'),
  { API } = require('./lib/api');
const csvFilePath = './data/source.csv',
  LIMIT_PARALLEL = 5;
// Step 1 - Init WP
let api = new API();
// Step 2 - Read CSV
const readTheFile = function() {
  let file = new List(csvFilePath);
  console.log('Reading file...');
  return file.read();
};
// Step 3 - Process multiple URLs
const processPages = function(data) {
  data.shift(); // CSV header
  console.log('Processing', data.length, 'pages');
  async.forEachLimit(data, LIMIT_PARALLEL, processSingle, (err) => {
    if (err) {
      return console.error(err);
    }
    console.log("Done!");
  });
};
// Step 4 - Get a JSON version of a URL
const scrapePage = function(url) {
  return new Promise((resolve, reject) => {
    if (url.indexOf('http') !== 0) {
      // return so we don't go on to scrape an invalid URL
      return reject('Invalid URL');
    }
    let page = new Scrape(url, fnScrape);
    page.scrape().then((data) => {
      console.log(">> >> Scraped data", data.body.length);
      resolve(data);
    }, reject);
  });
};
// Scrape function to be executed in DOM
const fnScrape = function(window) {
  // From
  // The Bhagavad Gita (Arnold translation)/Chapter 1
  // To
  // Chapter 1
  let $ = window.jQuery;
  let title = $('#header_section_text').text().replace(/["()]/g, ""),
    body = $('.poem').text()
  return {
    title,
    body
  };
}
// Process a single CSV row (called by async.forEachLimit)
const processSingle = function(data, cb) {
  let [url] = data;
  console.log(">> Processing ", url);
  scrapePage(url).then((data) => {
    // Step 5 - Add page to WordPress
    api.addPage(data.title, data.body).then((wpId) => {
      console.log(">> Processed ", wpId);
      cb();
    }, cb)
  }, cb);
}
// Kick start the process
console.log('WP Auth...');
api.initialize()
  .then(readTheFile)
  .then(processPages)
  .catch(console.error); // a failure at any step stops the chain here
Output
...
>> Processed 140
>> Processing https://en.wikisource.org/wiki/The_Bhagavad_Gita_(Arnold_translation)/Chapter_12
>> >> Scraped data 12634
>> Processed 141
>> Processing https://en.wikisource.org/wiki/The_Bhagavad_Gita_(Arnold_translation)/Chapter_13
>> Processed 142
>> Processing https://en.wikisource.org/wiki/The_Bhagavad_Gita_(Arnold_translation)/Chapter_14
>> Processed 143
>> Processing https://en.wikisource.org/wiki/The_Bhagavad_Gita_(Arnold_translation)/Chapter_15
>> >> Scraped data 3005
>> >> Scraped data 3706
>> >> Scraped data 5297
>> >> Scraped data 4039
>> Processed 144
>> Processing https://en.wikisource.org/wiki/The_Bhagavad_Gita_(Arnold_translation)/Chapter_16
>> >> Scraped data 3835
>> Processed 145
>> Processing https://en.wikisource.org/wiki/The_Bhagavad_Gita_(Arnold_translation)/Chapter_17
>> >> Scraped data 3781
>> Processed 146
>> Processing https://en.wikisource.org/wiki/The_Bhagavad_Gita_(Arnold_translation)/Chapter_18
>> >> Scraped data 11816
>> Processed 147
>> Processed 148
>> Processed 149
>> Processed 150
>> Processed 151
Done!
Check your WP for the new content.
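The interleaved output above comes from async.forEachLimit keeping up to LIMIT_PARALLEL pages in flight at once. If you would rather not pull in the async library, the same idea can be sketched with plain promises. This is my own dependency-free variant rather than the async library's implementation; the name and signature below are hypothetical, and the worker is expected to return a promise instead of taking a callback.

```javascript
// Run `worker` over `items` with at most `limit` tasks in flight at a time.
// Rejects as soon as any worker rejects.
async function forEachLimit(items, limit, worker) {
  let next = 0;
  // Each runner keeps pulling the next unclaimed index until none are left.
  const runner = async () => {
    while (next < items.length) {
      const index = next++;
      await worker(items[index], index);
    }
  };
  const count = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: count }, () => runner()));
}

// Usage sketch with a fake "page" task that just waits a few milliseconds.
const urls = ['a', 'b', 'c', 'd', 'e', 'f'];
forEachLimit(urls, 2, async (url) => {
  await new Promise((resolve) => setTimeout(resolve, 5));
  console.log('>> Processed', url);
}).then(() => console.log('Done!'), console.error);
```

Because JavaScript runs these callbacks on a single thread, the shared next counter needs no locking; each runner claims an index before it awaits.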