Useful Bioinformatics One-liners
Bioinformatics work often involves small file transformations: counting reads, checking sequence lengths, converting between FASTQ and FASTA, joining tables, filtering BLAST output, or extracting a region from a sequence.
This post is a practical collection of shell one-liners for those jobs.
Most examples use awk, sed, grep, sort, cut, paste, xargs, and a few common bioinformatics tools such as samtools. Some commands require GNU awk (gawk). On Ubuntu, install it with:
sudo apt install gawk
A few rules before using these commands:
- For serious analysis, prefer dedicated tools. Use
seqkit,bioawk,samtools,bcftools,seqtk,csvkit, or small Python/R scripts when correctness matters more than convenience. - Many FASTA commands assume single-line FASTA. Convert multi-line FASTA to single-line first.
- Test on a tiny file before running on a large dataset. One-liners are easy to mistype.
- Use
zcatfor compressed files andcatfor uncompressed files. Replace as needed. - Watch memory usage. Commands that store sequences in arrays are not suitable for huge files.
Quick Setup
Use these helpers mentally throughout the post:
# compressed FASTQ
zcat reads.fq.gz
# uncompressed FASTQ
cat reads.fq
# compressed FASTA
zcat contigs.fa.gz
# uncompressed FASTA
cat contigs.fa
For reproducible command behavior, especially with sorting:
export LC_ALL=C
FASTQ
FASTQ records have four lines:
@read_id
ACGT...
+
IIII...
Most FASTQ one-liners use NR % 4:
NR % 4 == 1: headerNR % 4 == 2: sequenceNR % 4 == 3: plus lineNR % 4 == 0: quality line
Count reads
zcat reads.fq.gz | awk 'END {print NR/4}'
Count bases and reads
zcat reads.fq.gz | awk 'NR%4==2 {bp+=length($0); reads++} END {printf "reads\t%d\nbases\t%d\nGb\t%.3f\n", reads, bp, bp/1e9}'
Read length distribution
zcat reads.fq.gz | awk 'NR%4==2 {len[length($0)]++} END {for (l in len) print l, len[l]}' | sort -n
Filter reads by minimum length
zcat reads.fq.gz | awk -v minlen=100 'NR%4==1 {h=$0} NR%4==2 {s=$0} NR%4==3 {p=$0} NR%4==0 {q=$0; if (length(s)>=minlen) print h"\n"s"\n"p"\n"q}' > filtered.fq
Convert FASTQ to FASTA
zcat reads.fq.gz | awk 'NR%4==1 {sub(/^@/, ">"); print} NR%4==2 {print}' > reads.fa
Interleave paired-end reads
Input:
R1.fq.gz
R2.fq.gz
Output order:
R1_record_1
R2_record_1
R1_record_2
R2_record_2
...
paste <(zcat R1.fq.gz) <(zcat R2.fq.gz) \
| paste - - - - \
| awk 'BEGIN {FS="\t"; OFS="\n"} {print $1,$3,$5,$7,$2,$4,$6,$8}' \
> interleaved.fq
Deinterleave paired-end reads
paste - - - - - - - - < interleaved.fq \
| tee >(cut -f1-4 | tr '\t' '\n' > R1.fq) \
| cut -f5-8 | tr '\t' '\n' > R2.fq
Deduplicate single-end reads by sequence
This keeps the first occurrence of each sequence. It does not compare quality strings.
zcat reads.fq.gz | awk 'NR%4==1 {h=$0} NR%4==2 {s=$0} NR%4==3 {p=$0} NR%4==0 {q=$0; if (!seen[s]++) print h"\n"s"\n"p"\n"q}' > dedup.fq
Deduplicate paired-end reads by read-pair sequence
paste <(zcat R1.fq.gz) <(zcat R2.fq.gz) \
| paste - - - - \
| awk 'BEGIN {FS="\t"; OFS="\n"}
{
key=$2"\t"$6
if (!seen[key]++) {
print $1,$2,$3,$4 > "dedup_R1.fq"
print $5,$6,$7,$8 > "dedup_R2.fq"
}
}'
Split FASTQ into N round-robin files
This distributes complete records across part_1.fq, part_2.fq, etc.
awk -v n=4 'NR%4==1 {rec=$0; for(i=2;i<=4;i++){getline; rec=rec"\n"$0}; file="part_" ((++c-1)%n+1) ".fq"; print rec >> file}' reads.fq
Shuffle FASTQ
Suitable for small to medium files. It stores records in memory.
awk -v seed=1 'BEGIN{srand(seed)} NR%4==1 {h=$0; getline s; getline p; getline q; rec[++n]=h"\n"s"\n"p"\n"q} END {for(i=1;i<=n;i++) idx[i]=i; for(i=n;i>1;i--){j=int(rand()*i)+1; t=idx[i]; idx[i]=idx[j]; idx[j]=t}; for(i=1;i<=n;i++) print rec[idx[i]]}' reads.fq > shuffled.fq
Subsample single-end FASTQ using reservoir sampling
Keeps k reads without loading the whole file into memory.
awk -v k=10000 -v seed=3 'BEGIN{srand(seed)} NR%4==1 {h=$0; getline s; getline p; getline q; rec=h"\n"s"\n"p"\n"q; i++; if(i<=k){a[i]=rec} else {j=int(rand()*i)+1; if(j<=k) a[j]=rec}} END {for(i in a) print a[i]}' reads.fq > subsampled.fq
Subsample paired-end FASTQ using reservoir sampling
This assumes both files have matching read order.
paste <(cat R1.fq) <(cat R2.fq) \
| paste - - - - \
| awk -v k=10000 -v seed=3 'BEGIN{srand(seed); FS="\t"}
{
rec1=$1"\n"$2"\n"$3"\n"$4
rec2=$5"\n"$6"\n"$7"\n"$8
i++
if(i<=k){a[i]=rec1; b[i]=rec2}
else {j=int(rand()*i)+1; if(j<=k){a[j]=rec1; b[j]=rec2}}
}
END {for(i in a){print a[i] > "subsampled_R1.fq"; print b[i] > "subsampled_R2.fq"}}'
Convert FASTQ to unmapped BAM
Use this only when you need a quick unmapped BAM from raw reads.
zcat reads.fq.gz \
| awk 'NR%4==1 {h=$0; sub(/^@/, "", h)} NR%4==2 {s=$0} NR%4==0 {q=$0; print h"\t4\t*\t0\t0\t*\t*\t0\t0\t"s"\t"q}' \
| samtools view -bS - > reads.unmapped.bam
FASTA
Many FASTA commands below assume one sequence per line:
>seq1
ACGTACGTACGT
>seq2
TTTTCCCCAAAA
Convert multi-line FASTA to single-line FASTA
awk '/^>/ {if (seq) print seq; print; seq=""; next} {seq=seq toupper($0)} END {if (seq) print seq}' input.fa > singleline.fa
Remove Windows carriage returns
Useful when sequences were copied from a browser or edited on Windows.
sed 's/\r//g' input.fa > clean.fa
Convert single-line FASTA to wrapped FASTA
Set width to the desired line length.
awk -v width=60 '/^>/ {if(seq){while(length(seq)>width){print substr(seq,1,width); seq=substr(seq,width+1)}; print seq}; print; seq=""; next} {seq=seq $0} END {if(seq){while(length(seq)>width){print substr(seq,1,width); seq=substr(seq,width+1)}; print seq}}' singleline.fa > wrapped.fa
Count sequences
grep -c '^>' file.fa
Get sequence lengths
awk '/^>/ {id=$0; sub(/^>/,"",id); getline seq; print id"\t"length(seq)}' file.fa
Get total assembly size and number of sequences
awk '/^>/ {getline seq; sum+=length(seq); n++} END {printf "file\t%s\nsequences\t%d\nbases\t%d\nMb\t%.3f\n", FILENAME, n, sum, sum/1e6}' file.fa
Filter sequences by minimum length
awk -v minlen=1000 '/^>/ {h=$0; getline seq; if(length(seq)>=minlen) print h"\n"seq}' file.fa > filtered.fa
Sort sequences by length, longest first
Requires GNU awk for PROCINFO["sorted_in"].
awk 'BEGIN{PROCINFO["sorted_in"]="@ind_num_desc"} /^>/ {h=$0; getline seq; a[length(seq)][h"\n"seq]} END {for(l in a) for(r in a[l]) print r}' file.fa > sorted.fa
Find shortest, longest, and mean sequence length
awk '/^>/ {getline seq; len=length(seq); n++; sum+=len; if(n==1 || len<min) min=len; if(len>max) max=len} END {printf "min\t%d\nmax\t%d\nmean\t%.2f\n", min, max, sum/n}' file.fa
Get GC content for the whole file
awk '/^>/ {getline seq; total+=length(seq); gc+=gsub(/[GgCc]/,"",seq)} END {printf "GC_percent\t%.3f\n", 100*gc/total}' file.fa
Get GC content per sequence
awk '/^>/ {id=$0; sub(/^>/,"",id); getline seq; len=length(seq); tmp=seq; gc=gsub(/[GgCc]/,"",tmp); printf "%s\t%.3f\n", id, 100*gc/len}' file.fa
Reverse complement each sequence
awk 'function revcomp(s, i,b,o){for(i=length(s);i>0;i--){b=substr(s,i,1); o=o ((b=="A")?"T":(b=="T")?"A":(b=="G")?"C":(b=="C")?"G":(b=="a")?"t":(b=="t")?"a":(b=="g")?"c":(b=="c")?"g":"N")} return o} /^>/ {h=$0; getline seq; print h"\n"revcomp(seq)}' file.fa > rc.fa
Extract a 1-based region from one sequence
Set id, start, and end.
awk -v id="seq1" -v start=100 -v end=200 '/^>/ {h=$0; clean=h; sub(/^>/,"",clean); getline seq; if(clean==id) print h":"start"-"end"\n"substr(seq,start,end-start+1)}' file.fa
Extract and reverse-complement a region
awk -v id="seq1" -v start=100 -v end=200 '
function revcomp(s, i,b,o){for(i=length(s);i>0;i--){b=substr(s,i,1); o=o ((b=="A")?"T":(b=="T")?"A":(b=="G")?"C":(b=="C")?"G":"N")} return o}
/^>/ {h=$0; clean=h; sub(/^>/,"",clean); getline seq; if(clean==id){subseq=substr(seq,start,end-start+1); print h":"start"-"end"_rc\n"revcomp(subseq)}}' file.fa
Deduplicate sequences by header
awk '/^>/ {keep=!seen[$0]++} keep' file.fa > dedup_by_header.fa
Deduplicate sequences by sequence content
awk '/^>/ {h=$0; getline seq; if(!seen[seq]++) print h"\n"seq}' file.fa > dedup_by_sequence.fa
Remove singleton sequences
Keeps sequences that appear more than once.
awk '/^>/ {h=$0; getline seq; rec[seq]=rec[seq] h"\n"seq"\n"; count[seq]++} END {for(seq in count) if(count[seq]>1) printf "%s", rec[seq]}' file.fa > non_singletons.fa
Rename FASTA headers and create an index table
awk '/^>/ {h=$0; getline seq; old=h; sub(/^>/,"",old); id=sprintf("seq_%06d", ++n); print ">"id"\n"seq > "renamed.fa"; print id"\t"old > "renamed_index.tsv"}' file.fa
Compute N50, L50, and auN
Requires GNU awk for asort.
awk '/^>/ {getline seq; len[++n]=length(seq); total+=len[n]; sumsq+=len[n]^2}
END {
asort(len)
target=total/2
for(i=n;i>=1;i--){cum+=len[i]; if(cum>=target){n50=len[i]; l50=n-i+1; break}}
printf "N50\t%d\nL50\t%d\nauN\t%.2f\n", n50, l50, sumsq/total
}' file.fa
Convert GenBank to FASTA
This is useful for quick inspection. For production parsing, prefer Biopython.
awk '/ACCESSION/ {id=$2} /ORIGIN/ {seq=""; while((getline)>0){if($0~/\/\//) break; gsub(/[0-9 ]/,"",$0); seq=seq toupper($0)}} END {print ">"id"\n"seq}' file.gb > file.fa
K-mers and Sequence Patterns
Count k-mers in a FASTA file
Set k as needed.
awk -v k=11 '/^>/ {getline seq; for(i=1;i<=length(seq)-k+1;i++) count[substr(seq,i,k)]++} END {for(kmer in count) print kmer"\t"count[kmer]}' file.fa | sort -k2,2nr
Count k-mers per sequence
awk -v k=11 '/^>/ {id=$0; sub(/^>/,"",id); getline seq; delete count; total=0; for(i=1;i<=length(seq)-k+1;i++){count[substr(seq,i,k)]++; total++}; for(kmer in count) print id"\t"kmer"\t"count[kmer]"\t"count[kmer]/total}' file.fa
Find k-mers unique to one sequence
awk -v k=11 '/^>/ {id=$0; sub(/^>/,"",id); getline seq; for(i=1;i<=length(seq)-k+1;i++) seen[substr(seq,i,k)][id]=1} END {for(kmer in seen) if(length(seen[kmer])==1) for(id in seen[kmer]) print id"\t"kmer}' file.fa
Find reverse-complement palindromic k-mers
Use an even k.
awk -v k=6 '
function rc(s, i,b,o){for(i=length(s);i>0;i--){b=substr(s,i,1); o=o ((b=="A")?"T":(b=="T")?"A":(b=="G")?"C":(b=="C")?"G":"N")} return o}
/^>/ {getline seq; for(i=1;i<=length(seq)-k+1;i++){x=substr(seq,i,k); if(x==rc(x)) count[x]++}}
END {for(x in count) print x"\t"count[x]}' file.fa
Find longest homopolymer stretch per sequence
awk '
function longest_run(s, i,curr,max,char,start,best_char,best_start){curr=1; max=1; best_char=substr(s,1,1); best_start=1; for(i=2;i<=length(s);i++){if(substr(s,i,1)==substr(s,i-1,1)){curr++} else {if(curr>max){max=curr; best_char=substr(s,i-1,1); best_start=i-curr}; curr=1}} if(curr>max){max=curr; best_char=substr(s,length(s),1); best_start=length(s)-curr+1}; return best_char"\t"best_start"\t"max}
/^>/ {id=$0; sub(/^>/,"",id); getline seq; print id"\t"longest_run(seq)}' file.fa
Detect CpG islands using a simple sliding-window rule
This follows the common Gardiner-Garden and Frommer-style idea: window size, GC percentage, and observed/expected CpG ratio.
awk -v window=200 -v oe=0.6 -v gcmin=50 '
function scan(id,s, i,w,subseq,c,g,cg,gc,ratio){
s=toupper(s)
for(i=1;i<=length(s)-window+1;i++){
subseq=substr(s,i,window)
c=gsub(/C/,"",subseq)
subseq=substr(s,i,window)
g=gsub(/G/,"",subseq)
subseq=substr(s,i,window)
cg=gsub(/CG/,"",subseq)
gc=100*(c+g)/window
ratio=(c*g>0 ? cg/(c*g/window) : 0)
if(gc>=gcmin && ratio>=oe) print id"\t"i"\t"i+window-1"\t"gc"\t"ratio
}
}
/^>/ {id=$0; sub(/^>/,"",id); getline seq; scan(id,seq)}' file.fa
Tables and Text Files
These examples assume tab-separated files unless noted otherwise.
CSV to TSV
For simple CSV only. Quoted CSV is harder than it looks; use csvkit or Python for complex CSV files.
awk 'BEGIN{FPAT="([^,]*)|(\"([^\"]|\"\")*\")"; OFS="\t"} {for(i=1;i<=NF;i++){if($i~/^\"/){$i=substr($i,2,length($i)-2); gsub(/\"\"/,"\"",$i)}}; print}' file.csv > file.tsv
TSV to CSV
awk 'BEGIN{FS="\t"; OFS=","} {for(i=1;i<=NF;i++){if($i~/[,"]/){gsub(/"/,""""",$i); $i="\""$i"\""}}; print}' file.tsv > file.csv
Remove one column
Remove column 2:
awk -v col=2 'BEGIN{FS=OFS="\t"} {for(i=1;i<=NF;i++) if(i!=col) printf "%s%s", $i, (i<NF?OFS:ORS)}' table.tsv
Swap two columns
awk -v a=1 -v b=2 'BEGIN{FS=OFS="\t"} {tmp=$a; $a=$b; $b=tmp; print}' table.tsv
Transpose a table
Suitable for small to medium tables.
awk 'BEGIN{FS=OFS="\t"} {rows=NR; if(NF>cols) cols=NF; for(i=1;i<=NF;i++) a[NR,i]=$i} END {for(i=1;i<=cols;i++){for(j=1;j<=rows;j++) printf "%s%s", a[j,i], (j<rows?OFS:ORS)}}' table.tsv
Sum rows, excluding the first column
awk 'BEGIN{FS=OFS="\t"} NR>1 {sum=0; for(i=2;i<=NF;i++) sum+=$i; print $1,sum}' table.tsv
Sum columns, excluding the first row and first column
awk 'BEGIN{FS=OFS="\t"} NR==1{next} {for(i=2;i<=NF;i++) sum[i]+=$i} END {for(i=2;i<=length(sum)+1;i++) printf "%s%s", sum[i], (i<length(sum)+1?OFS:ORS)}' table.tsv
Collapse duplicate IDs by summing values
awk 'BEGIN{FS=OFS="\t"} NR==1{header=$0; next} {ids[$1]=1; for(i=2;i<=NF;i++) sum[$1,i]+=$i; nf=NF} END {print header; for(id in ids){printf "%s", id; for(i=2;i<=nf;i++) printf "%s%s", OFS, sum[id,i]; printf ORS}}' table.tsv
Outer join multiple two-column tables
Missing values become 0.
awk 'BEGIN{FS=OFS="\t"} {value[$1][ARGIND]=$2; keys[$1]=1; files[ARGIND]=FILENAME} END {printf "ID"; for(i=1;i<=ARGIND;i++) printf OFS files[i]; print ""; for(k in keys){printf k; for(i=1;i<=ARGIND;i++) printf OFS ((i in value[k])?value[k][i]:0); print ""}}' table*.tsv
Right join two tables by first column
Appends column 2 from left.tsv to right.tsv.
awk 'BEGIN{FS=OFS="\t"} FNR==NR {a[$1]=$2; next} {print $0, (($1 in a)?a[$1]:0)}' left.tsv right.tsv
Count values within groups
awk 'BEGIN{FS=OFS="\t"} {count[$1,$2]++} END {for(k in count){split(k,a,SUBSEP); print a[1],a[2],count[k]}}' table.tsv
Most frequent value per group
awk 'BEGIN{FS=OFS="\t"} {count[$1,$2]++; if(count[$1,$2]>max[$1]) max[$1]=count[$1,$2]} END {for(k in count){split(k,a,SUBSEP); if(count[k]==max[a[1]]) print a[1],a[2],count[k]}}' table.tsv
Print a file in reverse order
awk '{a[NR]=$0} END {for(i=NR;i>=1;i--) print a[i]}' file.txt
Join all lines with commas
paste -sd, file.txt
Split a file into N roughly equal parts
awk -v n=4 '{line[NR]=$0} END {chunk=int((NR+n-1)/n); for(i=1;i<=NR;i++){part=int((i-1)/chunk)+1; print line[i] > "part_"part".txt"}}' file.txt
BLAST Output
These examples assume BLAST tabular output with fields similar to:
qseqid sseqid pident length qlen slen evalue bitscore mismatch gapopen qstart qend sstart send
Adjust column numbers if your outfmt 6 is different.
Keep best hit per query by percent identity and query coverage
awk 'function abs(x){return x<0?-x:x} BEGIN{FS=OFS="\t"} {qcov=abs($12-$11)+1; qcov=qcov*100/$5; score=$3"\t"qcov; if($3>best_p[$1] || ($3==best_p[$1] && qcov>best_q[$1])){best_p[$1]=$3; best_q[$1]=qcov; row[$1]=$0}} END {for(q in row) print row[q]}' blast.tsv
Filter hits by percent identity and query coverage
awk -v pident=95 -v qcov=90 'function abs(x){return x<0?-x:x} BEGIN{FS=OFS="\t"} {cov=(abs($12-$11)+1)*100/$5; if($3>=pident && cov>=qcov) print}' blast.tsv
Get query-subject pairs passing thresholds
awk -v pident=95 -v qcov=90 'function abs(x){return x<0?-x:x} BEGIN{FS=OFS="\t"} {cov=(abs($12-$11)+1)*100/$5; if($3>=pident && cov>=qcov) print $1,$2}' blast.tsv
Group homolog-like pairs into connected components
This is a quick graph clustering trick. For large files, use Python, R, or a graph library.
awk 'BEGIN{FS=OFS="\t"} {parent[$1]=$1; parent[$2]=$2; edge[++n]=$1 SUBSEP $2}
function find(x){while(parent[x]!=x)x=parent[x]; return x}
function unite(a,b){ra=find(a); rb=find(b); if(ra!=rb) parent[rb]=ra}
END{for(i=1;i<=n;i++){split(edge[i],e,SUBSEP); unite(e[1],e[2])}; for(x in parent){root=find(x); group[root]=group[root]" "x}; c=0; for(r in group) print ++c, group[r]}' pairs.tsv
GFF, GTF, and GenBank
Quick GFF to rough GTF conversion
This is only a rough conversion. Real GFF/GTF conversion can be messy. Prefer gffread when possible.
gffread input.gff -T -o output.gtf
If you only need a quick AWK-based conversion for simple files:
awk 'BEGIN{FS=OFS="\t"} /^#/ {next} {id=""; name=""; split($9,a,";"); for(i in a){if(a[i]~/^ID=/){id=a[i]; sub(/^ID=/,"",id)}; if(a[i]~/^Name=/){name=a[i]; sub(/^Name=/,"",name)}}; $9="gene_id \""id"\"; transcript_id \""id"\"; gene_name \""name"\";"; print}' input.gff > rough.gtf
Extract CDS translations from GenBank
For robust parsing, use Biopython. This AWK version is for quick inspection.
awk '/\/protein_id=/ {protein=$0; gsub(/.*protein_id="|"/,"",protein)} /\/translation=/ {seq=$0; gsub(/.*translation="|"/,"",seq); while(seq !~ /"/ && getline){seq=seq $0}; gsub(/"| /,"",seq); if(protein) print ">"protein"\n"seq}' file.gb > proteins.fa
Parallel Processing
Run a command across many files with xargs
Example: count reads in many compressed FASTQ files using four parallel workers.
find . -name "*.fq.gz" -print0 \
| xargs -0 -P 4 -I {} bash -c 'printf "%s\t" "{}"; zcat "{}" | awk "END{print NR/4}"'
Process paired-end files by sample prefix
Assumes files are named sample_R1.fq.gz and sample_R2.fq.gz.
find . -name "*_R1.fq.gz" -print0 \
| xargs -0 -P 4 -I {} bash -c '
r1="{}"
r2="${r1/_R1.fq.gz/_R2.fq.gz}"
sample=$(basename "$r1" _R1.fq.gz)
paste <(zcat "$r1") <(zcat "$r2") | paste - - - - \
| awk "BEGIN{FS=\"\\t\"; OFS=\"\\n\"} {print \$1,\$3,\$5,\$7,\$2,\$4,\$6,\$8}" \
> "${sample}.interleaved.fq"
'
Random Data Generation
Generate random nucleotide FASTA
awk -v seed=3 -v n=100 -v len=1000 'BEGIN{srand(seed); split("A C G T",base," "); for(i=1;i<=n;i++){seq=""; for(j=1;j<=len;j++) seq=seq base[int(rand()*4)+1]; printf ">seq_%06d\n%s\n", i, seq}}' > random.fa
Generate random amino-acid FASTA
awk -v seed=3 -v n=100 -v len=300 'BEGIN{srand(seed); split("A C D E F G H I K L M N P Q R S T V W Y",aa," "); for(i=1;i<=n;i++){seq="M"; for(j=2;j<=len;j++) seq=seq aa[int(rand()*20)+1]; printf ">protein_%06d\n%s\n", i, seq}}' > random_proteins.fa
Generate random FASTQ
Requires GNU awk with chr() support.
awk -v seed=3 -v n=100 -v len=150 'BEGIN{srand(seed); split("A C G T",base," "); for(i=1;i<=n;i++){seq=""; qual=""; for(j=1;j<=len;j++){seq=seq base[int(rand()*4)+1]; qual=qual sprintf("%c", int(rand()*40)+33)}; printf "@read_%06d\n%s\n+\n%s\n", i, seq, qual}}' > random.fq
Practical Recommendations
For everyday bioinformatics command-line work, I now prefer this order:
- Use
seqkitorseqtkfor FASTA/FASTQ operations. - Use
samtools,bcftools, andbedtoolsfor standard genomics formats. - Use
awkfor quick table operations and lightweight inspection. - Use Python/R when the logic needs names, tests, or reuse.
- Save complicated one-liners into scripts.
A good one-liner should be easy to read, easy to test, and easy to throw away.