Package: RecordLinkage 0.4-12.6

RecordLinkage: Record Linkage Functions for Linking and Deduplicating Data Sets

Provides functions for linking and deduplicating data sets. Methods based on a stochastic approach are implemented as well as classification algorithms from the machine learning domain. For details, see our paper "The RecordLinkage Package: Detecting Errors in Data" Sariyar M / Borg A (2010) <doi:10.32614/RJ-2010-017>.

Authors: Murat Sariyar [aut, cre], Andreas Borg [aut]

RecordLinkage_0.4-12.6.tar.gz
RecordLinkage_0.4-12.6.zip (r-4.7), RecordLinkage_0.4-12.6.zip (r-4.6), RecordLinkage_0.4-12.6.zip (r-4.5)
RecordLinkage_0.4-12.6.tgz (r-4.6-x86_64), RecordLinkage_0.4-12.6.tgz (r-4.6-arm64), RecordLinkage_0.4-12.6.tgz (r-4.5-x86_64), RecordLinkage_0.4-12.6.tgz (r-4.5-arm64)
RecordLinkage_0.4-12.6.tar.gz (r-4.7-arm64), RecordLinkage_0.4-12.6.tar.gz (r-4.7-x86_64), RecordLinkage_0.4-12.6.tar.gz (r-4.6-arm64), RecordLinkage_0.4-12.6.tar.gz (r-4.6-x86_64)
RecordLinkage_0.4-12.6.tgz (r-4.6-emscripten)
manual.pdf | manual.html
DESCRIPTION | NEWS
card.svg | card.png
RecordLinkage/json (API)

# Install 'RecordLinkage' in R:
install.packages('RecordLinkage', repos = c('https://sym33.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/sym33/recordlinkage/issues

8.60 score, 8 packages, 482 scripts, 2.1k downloads, 11 mentions, 73 exports, 56 dependencies

Last updated from: fb8c03fae4. Checks: 4 ERROR, 2 OK, 7 NOTE. Indexed: yes.

Target | Result | Time
linux-devel-arm64 | ERROR | 197
linux-devel-x86_64 | ERROR | 167
source / vignettes | OK | 205
linux-release-arm64 | ERROR | 174
linux-release-x86_64 | ERROR | 140
macos-release-arm64 | NOTE | 150
macos-release-x86_64 | NOTE | 328
macos-oldrel-arm64 | NOTE | 170
macos-oldrel-x86_64 | NOTE | 264
windows-devel | NOTE | 135
windows-release | NOTE | 143
windows-oldrel | NOTE | 126
wasm-release | OK | 150

Exports: [.RecLinkData, [.RecLinkResult, [.RLBigData, [.RLResult, %append%, begin, blockfldfun, classifySupv, classifyUnsup, clear, clone, compare.dedup, compare.linkage, countpattern, deleteNULLs, editMatch, emClassify, emWeights, epiClassify, epiWeights, errorMeasures, fsClassify, fsWeights, genSamples, getColumnNames, getErrorMeasures, getExpectedSize, getFalse, getFalseNeg, getFalsePos, getFrequencies, getMatchCount, getMinimalTrain, getNACount, getNonMatchCount, getPairs, getPairsBackend, getParetoThreshold, getPatternCounts, getSQLStatement, getTable, getThresholds, gpdEst, hasWeights, init_sqlite_extensions, isFALSE, jarowinkler, levenshteinDist, levenshteinSim, loadRLObject, makeBlockingPairs, mrl, mygllm, nextPairs, optimalThreshold, plotMRL, print.summaryRLBigDataDedup, print.summaryRLBigDataLinkage, print.summaryRLResult, resample, RLBigDataDedup, RLBigDataLinkage, saveRLObject, soundex, splitData, summary.RecLinkData, summary.RecLinkResult, summary.RLBigDataDedup, summary.RLBigDataLinkage, summary.RLResult, texSummary, trainSupv, unorderedPairs

Dependencies: bit, bit64, blob, cachem, class, cli, codetools, cpp11, data.table, DBI, diagram, digest, e1071, evd, farver, fastmap, ff, future, future.apply, ggplot2, globals, glue, gtable, ipred, isoband, KernSmooth, labeling, lattice, lava, lifecycle, listenv, MASS, Matrix, memoise, nnet, numDeriv, parallelly, pkgconfig, prodlim, progressr, proxy, R6, RColorBrewer, Rcpp, rlang, rpart, RSQLite, S7, scales, shape, SQUAREM, survival, vctrs, viridisLite, withr, xtable

Classes for record linkage of big data sets
Defining data and comparison parameters | Supervised classification | Weight-based classification | Evaluation and results

Last update: 2026-01-24
Started: 2026-01-24
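
The four topics of this vignette (defining data and comparison parameters, classification, evaluation) can be sketched roughly as follows. This is a minimal sketch, not the vignette's own code: it assumes the bundled RLdata500 test data with its identity.RLdata500 vector, and both the blocking fields and the 0.5 threshold are illustrative choices.

```r
library(RecordLinkage)

# Bundled test data: 500 records plus the true identity of each record.
data(RLdata500)

# Define data and comparison parameters. Blocking on first name (column 1),
# last name (column 3) or the full birth date (columns 5 to 7) is an
# illustrative choice, not a recommendation.
rpairs <- RLBigDataDedup(RLdata500,
                         identity = identity.RLdata500,
                         blockfld = list(1, 3, 5:7))

# Weight-based classification with EpiLink weights.
# The threshold 0.5 is an assumed value for demonstration only.
rpairs <- epiWeights(rpairs)
result <- epiClassify(rpairs, 0.5)

# Evaluation: contingency table of true versus predicted matching status.
getTable(result)
```

Because the true identities are supplied, getErrorMeasures(result) could be used in the same place to obtain error rates instead of raw counts.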

Classifying record pairs by means of Extreme Value Theory

Last update: 2026-01-24
Started: 2026-01-24
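
A hedged sketch of threshold estimation with the exported getParetoThreshold function is given here. The interval passed to it is an assumption for the RLdata500 test data; in practice it would normally be read off the mean residual life plot produced by plotMRL, and leaving it out triggers an interactive selection.

```r
library(RecordLinkage)

data(RLdata500)

# Build comparison patterns for deduplication; the blocking fields are illustrative.
rpairs <- compare.dedup(RLdata500,
                        identity = identity.RLdata500,
                        blockfld = list(1, 3, 5, 6, 7))

# EpiLink weights lie in [0, 1], which makes an interval easy to state.
rpairs <- epiWeights(rpairs)

# Assumed interval of weights over which the generalized Pareto
# distribution is fitted; inspect plotMRL(rpairs) to choose your own.
threshold <- getParetoThreshold(rpairs, interval = c(0.68, 0.79))

# Classify with the estimated threshold and summarize the result.
result <- epiClassify(rpairs, threshold)
summary(result)
```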

Example Session for Supervised Classification
Generating comparison patterns | Training | Classification | Results | Rpart | Bagging | SVM

Last update: 2026-01-24
Started: 2026-01-24
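
The steps named above (generating comparison patterns, training, classification, results) might look like the following for the rpart method. This is a sketch under assumptions: the training sample sizes and the seed are arbitrary, and the same pattern should apply to other methods accepted by trainSupv, such as "bagging" or "svm".

```r
library(RecordLinkage)

data(RLdata500)
data(RLdata10000)

set.seed(1)  # the training pairs are sampled, so fix the seed for repeatability

# Generating comparison patterns: a sampled training set from the large
# data set (500 matches and 500 non-matches, an arbitrary choice) and an
# evaluation set containing every pair of the small data set.
train_pairs <- compare.dedup(RLdata10000,
                             identity = identity.RLdata10000,
                             n_match = 500, n_non_match = 500)
eval_pairs <- compare.dedup(RLdata500, identity = identity.RLdata500)

# Training: fit a classification tree on the labeled training pairs.
model_rpart <- trainSupv(train_pairs, method = "rpart")

# Classification: predict the matching status of the evaluation pairs.
result_rpart <- classifySupv(model_rpart, eval_pairs)

# Results: error measures and a cross table of true versus predicted status.
summary(result_rpart)
```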

Example session for Weight-based deduplication
Generating record pairs | Weight calculation | Classification

Last update: 2026-01-24
Started: 2026-01-24
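
The three steps of this session (generating record pairs, weight calculation, classification) can be sketched as shown here with EM-based weights. The blocking fields and the threshold of 0 are illustrative assumptions; a real application would choose the threshold by inspecting the weighted pairs, for example with getPairs.

```r
library(RecordLinkage)

data(RLdata500)

# Generating record pairs: compare only records that agree on at least
# one of the listed columns (an illustrative blocking choice).
rpairs <- compare.dedup(RLdata500,
                        identity = identity.RLdata500,
                        blockfld = list(1, 3, 5, 6, 7))

# Weight calculation: estimate matching weights with the EM algorithm.
rpairs <- emWeights(rpairs)

# Classification: pairs with a weight of at least 0 are declared links.
# The value 0 is an assumed threshold for demonstration only.
result <- emClassify(rpairs, threshold.upper = 0)

summary(result)
```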

Readme and manuals

Help Manual

Help page | Topics
Concatenate comparison patterns or classification results | %append% %append%,RecLinkData,RecLinkData-method %append%,RecLinkResult,RecLinkResult-method %append%-methods
Supervised Classification | classifySupv classifySupv,RecLinkClassif,RecLinkData-method classifySupv,RecLinkClassif,RLBigData-method classifySupv-methods
Unsupervised Classification | classifyUnsup
Serialization of record linkage object. | clone clone,RLBigData-method clone,RLResult-method clone-methods loadRLObject saveRLObject saveRLObject,RLBigData-method saveRLObject,RLResult-method saveRLObject-methods
Compare Records | compare.dedup compare.linkage
Remove NULL Values | deleteNULLs
Edit Matching Status | editMatch editMatch,RecLinkData-method editMatch,RLBigData-method editMatch-methods
Weight-based Classification of Data Pairs | emClassify emClassify,RecLinkData,ANY,ANY-method emClassify,RecLinkData,missing,missing-method emClassify,RLBigData,ANY,ANY-method emClassify,RLBigData,missing,missing-method emClassify,RLBigData-method
Calculate weights | emWeights emWeights,RecLinkData-method emWeights,RLBigData-method emWeights-methods
Classify record pairs with EpiLink weights | epiClassify epiClassify,RecLinkData-method epiClassify,RLBigData-method epiClassify-methods
Calculate EpiLink weights | epiWeights epiWeights,RecLinkData-method epiWeights,RLBigData-method epiWeights-methods
Class '"ff_vector"' | ff_vector-class
Class '"ffdf"' | ffdf-class
Generate Training Set | genSamples
Calculate Error Measures | errorMeasures getErrorMeasures getErrorMeasures,RecLinkResult-method getErrorMeasures,RLResult-method getErrorMeasures-methods
Estimate number of record pairs. | getExpectedSize getExpectedSize,data.frame-method getExpectedSize,RLBigDataDedup-method getExpectedSize,RLBigDataLinkage-method getExpectedSize-methods
Get attribute frequencies | getFrequencies getFrequencies,RLBigData-method getFrequencies-methods
Create a minimal training set | getMinimalTrain getMinimalTrain,RecLinkData-method getMinimalTrain,RLBigData-method getMinimalTrain-methods
Extract Record Pairs | getFalse getFalseNeg getFalsePos getPairs getPairs,RecLinkData-method getPairs,RecLinkResult-method getPairs,RLBigData-method getPairs,RLResult-method getPairs-methods
Estimate Threshold from Pareto Distribution | getParetoThreshold getParetoThreshold,RecLinkData-method getParetoThreshold,RLBigData-method getParetoThreshold-methods
Build contingency table | getTable getTable,RecLinkResult-method getTable,RLResult-method getTable-methods
Estimate Threshold from Pareto Distribution | gpdEst
Check for FALSE | isFALSE
Generalized Log-Linear Fitting | mygllm
Optimal Threshold for Record Linkage | optimalThreshold optimalThreshold,RecLinkData-method optimalThreshold,RLBigData-method optimalThreshold-methods
Phonetic Code | phonetics soundex
Class "RecLinkClassif" | RecLinkClassif RecLinkClassif-class
Class "RecLinkData" | RecLinkData-class
Record Linkage Data Object | RecLinkData RecLinkData.object
Class "RecLinkResult" | RecLinkResult-class
Record Linkage Result Object | RecLinkResult RecLinkResult
Safe Sampling | resample
Class "RLBigData" | RLBigData-class
Constructors for big data objects. | RLBigDataDedup RLBigDataLinkage
Class "RLBigDataDedup" | RLBigDataDedup-class
Class "RLBigDataLinkage" | RLBigDataLinkage-class
Test data for Record Linkage | identity.RLdata10000 identity.RLdata500 RLdata10000 RLdata500
Class "RLResult" | RLResult-class
Show a RLBigData object | show show,RLBigData-method
Split Data | splitData
Stochastic record linkage. | fsClassify fsClassify,RecLinkData-method fsClassify,RLBigData-method fsClassify-methods fsWeights fsWeights,RecLinkData-method fsWeights,RLBigData-method fsWeights-methods
String Metrics | jaro jarowinkler levenshtein levenshteinDist levenshteinSim strcmp winkler
Subset operator for record linkage objects | [.RecLinkData [.RecLinkResult [.RLBigData [.RLResult
Print Summary of Record Linkage Data | summary.RecLinkData summary.RecLinkResult
summary methods for '"RLBigData"' objects. | print.summaryRLBigDataDedup print.summaryRLBigDataLinkage summary.RLBigData summary.RLBigDataDedup summary.RLBigDataLinkage
Summary method for '"RLResult"' objects. | print.summaryRLResult summary,RLResult-method summary.RLResult
Train a Classifier | trainSupv
Create Unordered Pairs | unorderedPairs
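
The string metrics and the phonetic code listed in the help manual can also be called directly on character vectors, outside of a full linkage run. The following is a small sketch with made-up input strings; it assumes the default parameters of jarowinkler.

```r
library(RecordLinkage)

# Edit distance: number of insertions, deletions and substitutions.
d <- levenshteinDist("kitten", "sitting")

# Similarity derived from the edit distance, scaled to [0, 1]
# by the length of the longer string.
s <- levenshteinSim("kitten", "sitting")

# Jaro-Winkler similarity in [0, 1]; higher means more similar.
jw <- jarowinkler("MARTHA", "MARHTA")

# Soundex codes; names that sound alike should map to the same code.
codes <- soundex(c("ROBERT", "RUPERT"))

print(list(levenshteinDist = d, levenshteinSim = s,
           jarowinkler = jw, soundex = codes))
```

These are the kinds of functions that compare.dedup and compare.linkage accept through their strcmpfun and phonfun arguments when string comparison or phonetic coding is switched on.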