Introduction

SnpTracker is a Java-based tool that, given rs IDs from any dbSNP version, extracts the latest rs ID and genomic coordinates of each SNP. It relies on three dbSNP resources: the SNP merge history RsMergeArch (Version: b151; hg19/hg38), the coordinate data SNPChrPosOnRef (hg19/hg38) and the deletion history SNPHistory (hg19/hg38).
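The tracking idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not SnpTracker's actual implementation: the function name track_rsid, the in-memory dictionaries standing in for RsMergeArch, SNPHistory and SNPChrPosOnRef, and all rs numbers and positions are made up for the example.

```python
from typing import Dict, Optional, Set, Tuple

# Toy stand-ins for the three dbSNP resources (all values are hypothetical).
MERGED_INTO: Dict[int, int] = {101: 202, 202: 303}        # RsMergeArch: old rs -> newer rs
DELETED: Set[int] = {404}                                 # SNPHistory: withdrawn rs IDs
POSITIONS: Dict[int, Tuple[str, int]] = {303: ("1", 12345)}  # SNPChrPosOnRef: rs -> (chr, pos)


def track_rsid(
    rsid: int,
    merged_into: Dict[int, int],
    deleted: Set[int],
    positions: Dict[int, Tuple[str, int]],
) -> Tuple[Optional[int], Optional[Tuple[str, int]], str]:
    """Follow merge records to the latest rs ID, then look up its coordinates."""
    seen = {rsid}
    current = rsid
    # An rs ID may have been merged more than once, so walk the chain to its end.
    while current in merged_into:
        current = merged_into[current]
        if current in seen:
            raise ValueError(f"cyclic merge record involving rs{current}")
        seen.add(current)
    if current in deleted:
        return None, None, "Deleted"
    if current not in positions:
        return current, None, "NoChrPosOnRef"
    return current, positions[current], "OK"
```

With the toy data, track_rsid(101, MERGED_INTO, DELETED, POSITIONS) follows 101 to 202 to 303 and returns the position stored for 303. The status labels "Deleted" and "NoChrPosOnRef" are borrowed from the error categories listed under Outputs.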

Cite me

Deng JE, Sham PC, Li MX. SNPTracker: A Swift Tool for Comprehensive Tracking and Unifying dbSNP rs IDs and Genomic Coordinates of Massive Sequence Variants. G3 (Bethesda). 2015 Nov 19;6(1):205-7.

Snptracker Application

 Type                       File                                          Version
 MS Windows / Mac / Linux   snptracker.zip                                v0.1
 Example data               four example datasets inside snptracker.zip
 Source code                src.zip                                       v0.1

 Update:
  21/08/2015
  • Added automatic conversion of hg17 and hg18 coordinates to hg19.
  06/04/2015
  • Added options to set resource files.

 GitHub:
  https://github.com/limx54/SNPTracker

    Installation

    System requirement

    The Java Runtime Environment (JRE) version 6.0 is required for snptracker. It can be downloaded from the Java web site. Details of the installation on Windows, Linux or Mac can be found in Java Help.

    Installing snptracker

    Simply decompress the archive and run the following command:

     java -Xmx4g -jar ./snptracker.jar [arguments (See Examples below)]

    Input files

    The only input is a text file containing SNP information. Each SNP occupies one row identified by its rs ID (e.g., 'rs123'), and fields are delimited by tabs or spaces. The text file can be compressed in gzip format.

    An input example:

    SNP P-value1 P-value2 P-value3
    rs1513559 0.02301 0.8605 0.007688
    rs294755 0.4384 0.9575 0.006112
    rs835316 0.002688 0.007688 0.4893
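
    To check that a file matches the input format described above before running the tool, a small reader such as the following may help. It is a minimal sketch, not part of SnpTracker: the function name read_snp_rows is hypothetical, and skipping any row whose ID field does not look like rs followed by digits (such as the header row) is an assumption about how a header should be treated.

```python
import gzip
import re
from typing import Iterator, List, Tuple

RSID_PATTERN = re.compile(r"rs\d+")


def read_snp_rows(path: str, rsid_column: int = 1) -> Iterator[Tuple[str, List[str]]]:
    """Yield (rsID, fields) for each SNP row of a plain or gzip-compressed file.

    rsid_column is 1-based, mirroring the documented --rsid option.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            fields = line.split()  # splits on tabs or spaces
            if len(fields) < rsid_column:
                continue  # blank or too-short line
            rsid = fields[rsid_column - 1]
            if RSID_PATTERN.fullmatch(rsid) is None:
                continue  # assumed to be a header or a non-SNP row
            yield rsid, fields
```

    Applied to the example above, the reader skips the header row and yields one entry for each of the three SNPs, each carrying its rs ID and the full list of fields.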

    Outputs

    There are two output files. 1) *.result.txt contains the original input SNP information plus the newly tracked genomic coordinates and rs IDs. 2) *.error.txt contains the SNPs that could not be uniquely mapped in dbSNP, for one of the following reasons:

  •   I. Deleted: variants that were withdrawn at the submitter's request;
  •  II. Invalid: variants that are reported as an invalid snp_id in the dbSNP database;
  • III. DuplicatedRs: variants that were merged into an RS ID that already exists in the input file;
  •  IV. Variants with special position information, including:
       chr_AltOnly:   variants that map to a non-reference (alternative) assembly
       chr_Multi:     variants that map to multiple contigs
       chr_NotOn:     variants that did not map to any current chromosome
       chr_Un:        mapped variants that are on unplaced chromosomes
  •   V. NoChrPosOnRef: variants without position information in the SNPChrPosOnRef.bcp file.

    Options

    Options      Description
    --in         General format input file containing the RS IDs or position information of variants.
    --rsid       Track by RS ID: the column that holds the variants' RS IDs in the file specified by --in. The default setting is --rsid 1.
    --chr        Track by coordinates: the column that holds the chromosome in the file specified by --in. The default setting is --chr 1.
    --pos        Track by coordinates: the column that holds the position in the file specified by --in. The default setting is --pos 2.
    --ref        Sets the version of the human reference genome, hg19 or hg38. If hg17 or hg18 coordinates are input, SNPTracker automatically converts them to hg19. The default setting is --ref hg19.
    --bim-file   Plink Bim format input file.
    --map-file   Plink Map format input file.
    --by-id      Track variants by the RS ID information in the file specified by --bim-file or --map-file.
    --by-pos     Track variants by the position information in the file specified by --bim-file or --map-file.
    --out        Sets the prefix of the output files. The default setting is --out snptracker.

    Specify the local resource datasets

    Options       Description
    --merge-file  The absolute path of the RsMergeArch.bcp.gz file, which contains the merge records of variants.
                  The default setting is:
                   1) hg19: --merge-file ./resource/b142_GRCh19.RsMergeArch.bcp.gz
                   2) hg38: --merge-file ./resource/b142_GRCh38.RsMergeArch.bcp.gz
                  The file can be downloaded from the dbSNP FTP website.
    --coor-file   The absolute path of the SNPChrPosOnRef.bcp.gz file, which contains the coordinates of variants.
                  The default setting is:
                   1) hg19: --coor-file ./resource/b142_SNPChrPosOnRef_GRCh19p105.bcp.gz
                   2) hg38: --coor-file ./resource/b142_SNPChrPosOnRef_GRCh38p106.bcp.gz
                  The file can be downloaded from the dbSNP FTP website.
    --hist-file   The absolute path of the SNPHistory.bcp.gz file, which contains the deletion records of variants.
                  The default setting is:
                   1) hg19: --hist-file ./resource/b142_GRCh19.SNPHistory.bcp.gz
                   2) hg38: --hist-file ./resource/b142_GRCh38.SNPHistory.bcp.gz
                  The file can be downloaded from the dbSNP FTP website.
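
    Before an offline run it can be useful to check which of the default resource files listed above are present. The following is a minimal sketch under stated assumptions: the helper name missing_resources and the base_dir parameter are hypothetical, and the paths are simply the documented defaults resolved against base_dir.

```python
from pathlib import Path
from typing import Dict

# Default resource paths per reference genome, copied from the option table.
DEFAULT_RESOURCES: Dict[str, Dict[str, str]] = {
    "hg19": {
        "--merge-file": "./resource/b142_GRCh19.RsMergeArch.bcp.gz",
        "--coor-file": "./resource/b142_SNPChrPosOnRef_GRCh19p105.bcp.gz",
        "--hist-file": "./resource/b142_GRCh19.SNPHistory.bcp.gz",
    },
    "hg38": {
        "--merge-file": "./resource/b142_GRCh38.RsMergeArch.bcp.gz",
        "--coor-file": "./resource/b142_SNPChrPosOnRef_GRCh38p106.bcp.gz",
        "--hist-file": "./resource/b142_GRCh38.SNPHistory.bcp.gz",
    },
}


def missing_resources(ref: str, base_dir: str = ".") -> Dict[str, Path]:
    """Return {option: path} for default resource files not found under base_dir."""
    missing: Dict[str, Path] = {}
    for option, relative in DEFAULT_RESOURCES[ref].items():
        candidate = Path(base_dir) / relative
        if not candidate.is_file():
            missing[option] = candidate
    return missing
```

    Any option reported as missing can then be pointed at a local copy with the corresponding --merge-file, --coor-file or --hist-file option.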

    Contact

    If you have any questions about SnpTracker, please email Dr Miaoxin LI (limx54@yahoo.com) or Miss Jiaen DENG (silviakt@hku.hk).

    Examples

    I. Options without double dashes, for a general format file (.txt or .gz)

     java -Xmx4g -jar ./snptracker.jar input.txt
     java -Xmx4g -jar ./snptracker.jar input.txt output.prefix
     java -Xmx4g -jar ./snptracker.jar input.txt 1 output.prefix
     java -Xmx4g -jar ./snptracker.jar input.txt hg38 output.prefix
     java -Xmx4g -jar ./snptracker.jar input.txt 1 hg38 output.prefix

    *Note: If you set options without double dashes (i.e. without option names such as --in), please give them in the following order.
        (a) input file   < REQUIRED >
        (b) rsid column  < default: 1 >
        (c) reference    < default: hg19 >
        (d) output       < default: snptracker >
        *The resource datasets use their default settings.
         If you want to change the resource datasets, please set options with double dashes.

    II. Options with double dashes

    (I). Tracking by RS IDs
     java -Xmx4g -jar ./snptracker.jar --in input.txt --rsid 1 --ref hg38 --out output.prefix

    (II). Tracking by coordinates
     java -Xmx4g -jar ./snptracker.jar --in input.txt --chr 1 --pos 2 --ref hg38 --out output.prefix

    (III). Tracking by a map/bim file
     java -Xmx4g -jar ./snptracker.jar --map-file input.map --by-id --ref hg38 --out output.prefix
     java -Xmx4g -jar ./snptracker.jar --bim-file input.bim --by-pos --ref hg38 --out output.prefix

    (IV). Specify resource datasets for I, II, III
     ... --merge-file /path/to/resource/RsMergeArch.bcp.gz \
         --coor-file /path/to/resource/SNPChrPosOnRef.bcp.gz \
         --hist-file /path/to/resource/SNPHistory.bcp.gz

    *Note:
  • When running the software for the first time, do not set the --no-web option.
       All resource files will then be downloaded automatically and saved in the ./resources folder.
  • To switch off the automatic download of resources from the web server, use the --no-web option.
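
    When SnpTracker is called from a script, the named options shown in the examples above can be assembled programmatically. The helper below is a hedged sketch rather than an official wrapper: the function name build_command and its parameter names are hypothetical, it only covers tracking by RS ID, and it uses only the options documented on this page.

```python
from typing import List, Optional


def build_command(
    in_file: str,
    rsid_column: int = 1,
    ref: str = "hg19",
    out_prefix: str = "snptracker",
    jar: str = "./snptracker.jar",
    max_heap: str = "4g",
    merge_file: Optional[str] = None,
    coor_file: Optional[str] = None,
    hist_file: Optional[str] = None,
) -> List[str]:
    """Build the argument list for tracking by RS ID with named options."""
    command = [
        "java", f"-Xmx{max_heap}", "-jar", jar,
        "--in", in_file,
        "--rsid", str(rsid_column),
        "--ref", ref,
        "--out", out_prefix,
    ]
    # Resource options are appended only when a local file is given explicitly.
    for option, value in (
        ("--merge-file", merge_file),
        ("--coor-file", coor_file),
        ("--hist-file", hist_file),
    ):
        if value is not None:
            command += [option, value]
    return command
```

    The returned list can be passed to subprocess.run(command, check=True) from the directory that contains snptracker.jar. For example, build_command("input.txt", ref="hg38", out_prefix="output.prefix") reproduces the command shown in example (I).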

    Miao-xin Li, Jia-En Deng, Center for Genomic Sciences & Department of Psychiatry, The University of Hong Kong. All rights reserved.